Abstract

The general objective of this study is to help universities identify the most influential factors that cause students to drop out. The specific objective is to identify the most accurate algorithm for predicting student dropout when the class distribution is balanced. The dataset was obtained from the academic information system of a university in East Java, Indonesia. The data, collected between 2009 and 2015, consist of 425 records with 32 attributes and 2 classes. The attributes are nominal and numerical. The results show that the most influential factors causing students to drop out are lecture programme; number of courses; credit amount in semester 3; credit amount in semester 6; credit amount in semester 9; Grade Point Average (GPA) in semester 2; GPA in semester 3; GPA in semester 4; and GPA in semester 6. The Random Forest algorithm with the gain ratio criterion and shuffled sampling has the best performance: 99.29%, 99.47%, 99.09%, 99.28%, 0.71%, and 0.999 for accuracy, precision, recall, f-measure, classification error, and Area Under Curve (AUC), respectively. The worst-performing algorithm is Decision Tree with linear sampling and the information gain criterion: 83.19%, 83.47%, 86.32%, 84.87%, 16.81%, and 0.3 for accuracy, precision, recall, f-measure, classification error, and AUC, respectively.
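The reported metrics are linked by standard definitions: the f-measure is the harmonic mean of precision and recall, and classification error is 100% minus accuracy. The following minimal Python sketch illustrates those relations using figures quoted in the abstract. The function names `f_measure` and `recall_from` are hypothetical helpers introduced here for illustration; they are not taken from the study or from any particular tool.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (both given in percent)."""
    return 2 * precision * recall / (precision + recall)


def recall_from(precision, f):
    """Invert the harmonic mean: the recall implied by precision and f-measure."""
    return f * precision / (2 * precision - f)


# Decision Tree precision and recall quoted in the abstract (percent).
dt_f = f_measure(83.47, 86.32)

# Random Forest: recall implied by the quoted precision and f-measure.
rf_recall = recall_from(99.47, 99.28)

# Classification error is the complement of accuracy.
dt_error = 100 - 83.19
rf_error = 100 - 99.29

print(f"Decision Tree f-measure: {dt_f:.2f}")
print(f"Random Forest implied recall: {rf_recall:.2f}")
print(f"Classification errors: {dt_error:.2f}, {rf_error:.2f}")
```

For the Decision Tree figures, the computed f-measure rounds to 84.87, which agrees with the value reported in the abstract.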

Original language: English
Pages: 1764-1770
Number of pages: 7
Volume: 81
No. 11-12
Specialist publication: Test Engineering and Management
Publication status: Published - 2019

Keywords

  • Balanced class distribution
  • Classification
  • Decision tree
  • Dropout
  • Educational data mining
  • Random forest

Fingerprint

Dive into the research topics of 'A critical assessment of balanced class distribution problems: The case of predict student dropout'. Together they form a unique fingerprint.
