TY - GEN
T1 - Comparison of Deep Learning Methods in Detecting Hate Speech in Indonesian Tweets
AU - Brata, Dwija Wisnu
AU - Djunaidy, Arif
AU - Siahaan, Daniel Oranova
N1 - Publisher Copyright: © 2023 ACM.
PY - 2023/10/24
Y1 - 2023/10/24
N2 - Hate speech has negative effects on both the targeted victims and the listeners. The dissemination of hate speech can occur not only physically or verbally, but also in writing on social media. Hate speech on social media platforms can be difficult to identify in written communication. Currently, hate speech detection relies on machine learning. This study generates vector representations of words using three pre-trained word embedding models: Global Vectors (GloVe), FastText, and Bidirectional Encoder Representations from Transformers (BERT). Synthetic Minority Oversampling Technique (SMOTE) and Random Over Sampling (ROS) were used as balancing methods to rectify data imbalance between classes. In addition, three distinct deep learning architectures were used to identify sentence-level hate speech in Indonesian tweets: Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). The dataset was collected by crawling the data via the Twitter API. After the data underwent preprocessing, features were extracted. Based on experimental results, classifiers employing RNN and BERT embeddings and utilizing SMOTE produced the most accurate results (95.5%).
AB - Hate speech has negative effects on both the targeted victims and the listeners. The dissemination of hate speech can occur not only physically or verbally, but also in writing on social media. Hate speech on social media platforms can be difficult to identify in written communication. Currently, hate speech detection relies on machine learning. This study generates vector representations of words using three pre-trained word embedding models: Global Vectors (GloVe), FastText, and Bidirectional Encoder Representations from Transformers (BERT). Synthetic Minority Oversampling Technique (SMOTE) and Random Over Sampling (ROS) were used as balancing methods to rectify data imbalance between classes. In addition, three distinct deep learning architectures were used to identify sentence-level hate speech in Indonesian tweets: Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Network (CNN), and Recurrent Neural Network (RNN). The dataset was collected by crawling the data via the Twitter API. After the data underwent preprocessing, features were extracted. Based on experimental results, classifiers employing RNN and BERT embeddings and utilizing SMOTE produced the most accurate results (95.5%).
KW - Deep Learning
KW - Hate Speech
KW - Imbalance Data
KW - Indonesia Tweets
KW - Word Embedding
UR - http://www.scopus.com/inward/record.url?scp=85182402602&partnerID=8YFLogxK
U2 - 10.1145/3626641.3626925
DO - 10.1145/3626641.3626925
M3 - Conference contribution
AN - SCOPUS:85182402602
T3 - ACM International Conference Proceeding Series
SP - 58
EP - 63
BT - SIET 2023 - Proceedings of the 8th International Conference on Sustainable Information Engineering and Technology
PB - Association for Computing Machinery
T2 - 8th International Conference on Sustainable Information Engineering and Technology, SIET 2023
Y2 - 24 October 2023 through 25 October 2023
ER -