TY - GEN
T1 - A critical assessment of balanced class distribution problems
T2 - The case of predict student dropout
AU - Mutrofin, Siti
AU - Ginardi, Raden Venantius Hari
AU - Fatichah, Chastine
AU - Kurniawardhani, Arrie
N1 - Publisher Copyright: © 2019 Mattingley Publishing. All rights reserved.
PY - 2019
Y1 - 2019
N2 - The general objective of this study is to help universities identify the most influential factors that cause students to drop out. The specific objective is to find the best-performing algorithm for predicting student dropout under a balanced class distribution. The dataset was obtained from the academic information system of a university in East Java, Indonesia. The data, collected between 2009 and 2015, consist of 32 attributes, 425 records, and 2 classes. The attributes are nominal and numerical. The results indicate that the most influential factors in student dropout are lecture programme; number of courses; credit amount in semester 3; credit amount in semester 6; credit amount in semester 9; Grade Point Average (GPA) in semester 2; GPA in semester 3; GPA in semester 4; and GPA in semester 6. The Random Forest algorithm with the gain ratio criterion and the shuffled sampling method performs best, with 99.29%, 99.47%, 9.09%, 99.28%, 0.71%, and 0.999 for accuracy, precision, recall, f-measure, classification error, and Area Under Curve (AUC), respectively. The worst-performing algorithm is Decision Tree with the linear sampling method and the information gain criterion, with 83.19%, 83.47%, 86.32%, 84.87%, 16.81%, and 0.3 for accuracy, precision, recall, f-measure, classification error, and AUC, respectively.
KW - Balanced class distribution
KW - Classification
KW - Decision tree
KW - Dropout
KW - Educational data mining
KW - Random forest
UR - http://www.scopus.com/inward/record.url?scp=85077213246&partnerID=8YFLogxK
M3 - Article
AN - SCOPUS:85077213246
SN - 0193-4120
VL - 81
SP - 1764
EP - 1770
JO - Test Engineering and Management
JF - Test Engineering and Management
ER -