TY - GEN
T1 - Entropy-Based Fuzzy Weighted Logistic Regression for Classifying Imbalanced Data
AU - Harumeka, Ajiwasesa
AU - Purnami, Santi Wulan
AU - Rahayu, Santi Puteri
N1 - Publisher Copyright: © 2021, Springer Nature Singapore Pte Ltd.
PY - 2021
Y1 - 2021
AB - Logistic regression is a popular classification method that has disadvantages when applied to large data sets. Truncated Regularized Iteratively Reweighted Least Squares (TR-IRLS) is a method that overcomes this problem. TR-IRLS resembles the Support Vector Machine (SVM): both have similar loss functions and parameters that adjust the bias and variance. Both methods were designed under the assumption of balanced data, so neither is well suited to imbalanced data. Each has therefore been extended to handle imbalanced data: TR-IRLS was developed into Rare Event Weighted Logistic Regression (RE-WLR), and SVM into Fuzzy Support Vector Machine (FSVM). Both RE-WLR and FSVM use weights based on class differences; as a result, RE-WLR performed better than TR-IRLS on imbalanced data, and FSVM performed better than SVM. Entropy-based Fuzzy Support Vector Machine (EFSVM) was then developed, with weights based not only on class differences but also on entropy. EFSVM gave the minority class more importance in imbalanced data than SVM and even FSVM. Building on the success of Entropy-based Fuzzy Membership (EF) as a weight in SVM, this study proposes Entropy-based Fuzzy Weighted Logistic Regression (EFWLR), which applies EF as the weight in Weighted Logistic Regression for binary classification. Experiments on 20 simulated data sets and 5 benchmark data sets with various rarity schemes showed that EFWLR outperformed TR-IRLS and RE-WLR based on AUC. EFWLR had more efficient AUC than RE-WLR on imbalanced data.
KW - Binary classification
KW - Entropy-based Fuzzy
KW - Imbalanced Data
KW - Weighted Logistic Regression
UR - http://www.scopus.com/inward/record.url?scp=85119419070&partnerID=8YFLogxK
U2 - 10.1007/978-981-16-7334-4_23
DO - 10.1007/978-981-16-7334-4_23
M3 - Conference contribution
AN - SCOPUS:85119419070
SN - 9789811673337
T3 - Communications in Computer and Information Science
SP - 312
EP - 327
BT - Soft Computing in Data Science - 6th International Conference, SCDS 2021, Proceedings
A2 - Mohamed, Azlinah
A2 - Yap, Bee Wah
A2 - Zain, Jasni Mohamad
A2 - Berry, Michael W.
PB - Springer Science and Business Media Deutschland GmbH
T2 - 6th International Conference on Soft Computing in Data Science, SCDS 2021
Y2 - 2 November 2021 through 3 November 2021
ER -