TY - GEN
T1 - Detection of Potentially Students Drop out of College in Case of Missing Value Using C4.5
AU - Mutrofin, Siti
AU - Khalimi, Abdul Muiz
AU - Kurniawan, Eddy
AU - Ginardi, Raden Venantius Hari
AU - Fatichah, Chastine
AU - Sari, Yuita Arum
N1 - Publisher Copyright: © 2019 IEEE.
PY - 2019/8
Y1 - 2019/8
AB - The reputation of a university can be determined by the number of students who drop out. Many universities in Indonesia experience this problem. Many researchers have studied it; however, the data acquisition and the attributes were not well explained. This study aims to project the reasons behind student dropout using a machine learning technique. The challenges in preprocessing the primary datasets are missing values, balanced class distribution, and a variety of data types. Two classes are used: dropout and graduate students. Analyzing the missing-value data can reveal why students drop out and which students have the potential to drop out. To address the problem of balanced class distribution, a Decision Tree algorithm is used, while C4.5 is used to handle the variety of data types. The results show that the dataset with 20 attributes and stratified sampling performs best among all datasets and experiments, with average AUC, accuracy, precision, and recall values of 0.98, 96.87, 98.75, and 97.84, respectively. This indicates that the proposed method is suitable for predicting student dropout with a balanced class distribution, despite the missing-value problem.
KW - Balanced Class Distribution
KW - C4.5
KW - Drop out
KW - Missing value
UR - http://www.scopus.com/inward/record.url?scp=85075936894&partnerID=8YFLogxK
U2 - 10.1109/ICSECC.2019.8907014
DO - 10.1109/ICSECC.2019.8907014
M3 - Conference contribution
AN - SCOPUS:85075936894
T3 - ICSECC 2019 - International Conference on Sustainable Engineering and Creative Computing: New Idea, New Innovation, Proceedings
SP - 349
EP - 354
BT - ICSECC 2019 - International Conference on Sustainable Engineering and Creative Computing
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2019 International Conference on Sustainable Engineering and Creative Computing, ICSECC 2019
Y2 - 20 August 2019 through 22 August 2019
ER -