TY - GEN
T1 - Performance Analysis of Random Forest with Sampling for River Water Quality Classification
AU - Fadhilah, Rahmi
AU - Kuswanto, Heri
AU - Prastyo, Dedy Dwi
N1 - Publisher Copyright: © 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - The availability of quality water is essential for the survival of organisms on Earth, and monitoring and managing the environment surrounding water sources is crucial to ensuring that water is safe for consumption. Classification methods are commonly used to assess water quality, but challenges arise when the data are imbalanced between classes. Data resampling techniques, applied individually or in combination, are often used to address this imbalance. This study evaluates the performance of the Random Forest algorithm in classifying river water quality after resampling the data with RACOG, RUS, and combined RACOG-RUS, both with and without feature selection. Data obtained from the Provincial Environment Office were used, and the models were evaluated using various performance metrics. The findings show that the Random Forest model with individual RACOG resampling exhibits the most promising performance, while the RUS model performs suboptimally. The combined RACOG-RUS approach effectively addresses the class imbalance problem. Feature selection improves performance in both the RUS and RACOG-RUS models, while the RACOG model's performance remains consistent. Notably, the RACOG-RUS model consistently excels in prediction accuracy and stability. This research supports adopting a combination approach to address data imbalance in river water quality classification.
KW - Classification
KW - Imbalance
KW - RACOG
KW - RACOG-RUS
KW - RUS
KW - Random Forest
UR - http://www.scopus.com/inward/record.url?scp=85202868088&partnerID=8YFLogxK
U2 - 10.1109/ICICoS62600.2024.10636858
DO - 10.1109/ICICoS62600.2024.10636858
M3 - Conference contribution
AN - SCOPUS:85202868088
T3 - Proceedings - International Conference on Informatics and Computational Sciences
SP - 456
EP - 461
BT - 2024 7th International Conference on Informatics and Computational Sciences, ICICoS 2024
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Informatics and Computational Sciences, ICICoS 2024
Y2 - 17 July 2024 through 18 July 2024
ER -