TY - JOUR
T1 - Educational Data Mining Clustering Approach
T2 - Case Study of Undergraduate Student Thesis Topic
AU - Andre
AU - Suciati, Nanik
AU - Fabroyir, Hadziq
AU - Pardede, Eric
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2023
Y1 - 2023
N2 - This study investigates the potential of educational data mining (EDM) to address delayed completion of undergraduate thesis courses, a common problem that affects both students and higher education institutions. Clustering analysis was used to group thesis topics. The research model was constructed using expert labeling to assign each thesis title to a computer science ontology standard. Cross-referencing was then used to associate supporting courses with each thesis title, resulting in a labeled dataset with three supporting courses for each thesis title. Five clustering algorithms (K-Means, DBSCAN, BIRCH, Gaussian Mixture, and Mean Shift) were compared to identify the best approach for analyzing undergraduate thesis data. The results showed that K-Means clustering is the most efficient method, generating five distinct clusters with unique characteristics. The study also examined how GPA and the average grades of the courses that support a thesis title correlate with the duration of thesis completion. Both measures showed a moderate correlation with the time to complete the thesis, with higher academic performance associated with shorter completion times. Because the correlation is only moderate, further studies are needed to explore additional factors, beyond GPA and the average grades of thesis-supporting courses, that contribute to delays in thesis completion. This study contributes to the understanding and evaluation of educational outcomes within study programs, as defined in the curriculum, particularly concerning the design and implementation of thesis topics. Additionally, the clustering results serve as a foundation for future research and offer insights into the potential of EDM techniques to assist in selecting appropriate thesis topics, thereby reducing the risk of delayed completion.
KW - Computing classification system
KW - clustering analysis
KW - k-means
KW - ontology
KW - undergraduate thesis
UR - http://www.scopus.com/inward/record.url?scp=85177042561&partnerID=8YFLogxK
U2 - 10.1109/ACCESS.2023.3332818
DO - 10.1109/ACCESS.2023.3332818
M3 - Article
AN - SCOPUS:85177042561
SN - 2169-3536
VL - 11
SP - 130072
EP - 130088
JO - IEEE Access
JF - IEEE Access
ER -