TY - GEN
T1 - Fraud Detection in Indonesian Administrative Health Records using Cluster-Based Oversampling Methods
AU - Priambodo, Tegar Ganang Satrio
AU - Rachmadi, Hilmi Zharfan
AU - Radam, Fajra Hanifa Nuridi
AU - Simanihuruk, Laurensia
AU - Purwitasari, Diana
N1 - Publisher Copyright: © 2025 IEEE.
PY - 2025
Y1 - 2025
AB - This study aims to improve healthcare fraud detection in BPJS Kesehatan's claims verification, where mismatches between billed amounts and INACBGs rates often cause underpayment and financial strain on providers. A key challenge is class imbalance in fraud datasets, which limits conventional detection methods. Prior work relied on oversampling methods such as SMOTE and ROS, which often generate noisy samples; this study instead introduces a cluster-based oversampling framework that preserves the distribution of the claims data. It combines six cluster-guided techniques (AgglomerativeROS, AgglomerativeSMOTE, DBSCANROS, DBSCANSMOTE, KMeansROS, KMeansSMOTE) with ensemble learning (Decision Tree, Random Forest, Balanced Random Forest, Gradient Boosting, CatBoost). The CatBoost model with KMeansROS achieved strong results (AUC-PRC: 0.93924, precision: 0.85714, recall: 0.92308), improving recall by 19.3%, which benefits both fraud detection and financing sustainability.
KW - cluster-based oversampling
KW - ensemble learning
KW - fraud detection
KW - health insurance
UR - https://www.scopus.com/pages/publications/105012761957
U2 - 10.1109/SIML65326.2025.11081135
DO - 10.1109/SIML65326.2025.11081135
M3 - Conference contribution
AN - SCOPUS:105012761957
T3 - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
BT - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2025 International Conference on Smart Computing, IoT and Machine Learning, SIML 2025
Y2 - 3 June 2025 through 4 June 2025
ER -