TY - GEN
T1 - IndoBERT-Based Ensemble Learning for Multi-Level Multi-Label Hate Speech Detection in Indonesian Social Media
AU - Rokhim, Imam Fadhkur
AU - Sarno, Riyanarto
AU - Septiyanto, Abdullah Faqih
AU - Haryono, Agus Tri
AU - Sabilla, Shoffi Izza
N1 - Publisher Copyright:
© 2024 IEEE.
PY - 2024
Y1 - 2024
N2 - Hate speech on social media platforms has become a pressing issue, with harmful content often leading to social tensions and emotional harm. In Indonesia, the complex linguistic and cultural context of online discourse presents additional challenges for effective hate speech detection. This study addresses these challenges by presenting an ensemble learning approach for hate speech detection in Indonesian social media. Leveraging IndoBERT for language understanding and combining it with Bi-LSTM and Bi-GRU models for sequence processing, we developed a robust multi-model architecture that effectively captures linguistic patterns and contextual nuances unique to Indonesian. The proposed ensemble framework was tested on a comprehensive dataset with multiple hate speech labels, including categories such as Religion, Race, Gender, and Severity. Experimental results demonstrate that the ensemble model achieved an accuracy of 86% and an F1-score of 63%, significantly outperforming individual models across most categories. This approach highlights the potential of ensemble learning for automated content moderation in Indonesian social media, providing a promising solution for managing diverse forms of online hate speech.
AB - Hate speech on social media platforms has become a pressing issue, with harmful content often leading to social tensions and emotional harm. In Indonesia, the complex linguistic and cultural context of online discourse presents additional challenges for effective hate speech detection. This study addresses these challenges by presenting an ensemble learning approach for hate speech detection in Indonesian social media. Leveraging IndoBERT for language understanding and combining it with Bi-LSTM and Bi-GRU models for sequence processing, we developed a robust multi-model architecture that effectively captures linguistic patterns and contextual nuances unique to Indonesian. The proposed ensemble framework was tested on a comprehensive dataset with multiple hate speech labels, including categories such as Religion, Race, Gender, and Severity. Experimental results demonstrate that the ensemble model achieved an accuracy of 86% and an F1-score of 63%, significantly outperforming individual models across most categories. This approach highlights the potential of ensemble learning for automated content moderation in Indonesian social media, providing a promising solution for managing diverse forms of online hate speech.
KW - Bi-GRU
KW - Bi-LSTM
KW - Hate speech detection
KW - IndoBERT
KW - ensemble learning
KW - multi-level multi-label classification
UR - https://www.scopus.com/pages/publications/105003160410
U2 - 10.1109/BTS-I2C63534.2024.10942204
DO - 10.1109/BTS-I2C63534.2024.10942204
M3 - Conference contribution
AN - SCOPUS:105003160410
T3 - 2024 Beyond Technology Summit on Informatics International Conference, BTS-I2C 2024
SP - 456
EP - 461
BT - 2024 Beyond Technology Summit on Informatics International Conference, BTS-I2C 2024
A2 - Wibowo, Ferry Wahyu
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2024 Beyond Technology Summit on Informatics International Conference, BTS-I2C 2024
Y2 - 19 December 2024
ER -