TY - GEN
T1 - Software Defect Prediction Using a Combination of Oversampling and Undersampling Methods
AU - Iswafaza, Aizul Faiz
AU - Rochimah, Siti
N1 - Publisher Copyright: © 2022 IEEE.
PY - 2022
Y1 - 2022
AB - Software testing improves software quality, but the resources it requires grow with the number of features developed; software defect prediction (SDP) was introduced to address this. Various machine learning methods are used to build SDP models. However, several problems arise in SDP, namely data redundancy, class imbalance, and feature redundancy. This study proposes a combined oversampling and under-sampling (COU) model to address data redundancy and class imbalance. The oversampling method is RSMOTE and the under-sampling method is edited nearest neighbors (ENN). Applying the combined model yields a new dataset that is more balanced and freer of ambiguous, noisy, and duplicated data. Deep learning is then applied to the resulting data as the prediction model, and performance is evaluated using the F-measure. The results indicate that the COU model improves the quality of SDP: compared with the average F-measure obtained by the RSMOTE model, the COU model increases the F-measure by 11%, with an average value of 0.876.
KW - AEEEM
KW - RSMOTE
KW - combined oversampling and under-sampling
KW - edited nearest neighbors
KW - software defect prediction
UR - http://www.scopus.com/inward/record.url?scp=85150468951&partnerID=8YFLogxK
U2 - 10.1109/ICITISEE57756.2022.10057798
DO - 10.1109/ICITISEE57756.2022.10057798
M3 - Conference contribution
AN - SCOPUS:85150468951
T3 - Proceeding - 6th International Conference on Information Technology, Information Systems and Electrical Engineering: Applying Data Sciences and Artificial Intelligence Technologies for Environmental Sustainability, ICITISEE 2022
SP - 127
EP - 132
BT - Proceeding - 6th International Conference on Information Technology, Information Systems and Electrical Engineering
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 6th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2022
Y2 - 13 December 2022 through 14 December 2022
ER -