TY - JOUR
T1 - An Improved Toxic Speech Detection on Multimodal Scam Confrontation Data Using LSTM-Based Deep Learning
AU - Gumelar, Agustinus Bimo
AU - Yuniarno, Eko Mulyanto
AU - Nugroho, Arif
AU - Adi, Derry Pramono
AU - Sugiarto, Indar
AU - Purnomo, Mauridhi Hery
N1 - Publisher Copyright: © 2024, Intelligent Network and Systems Society. All rights reserved.
PY - 2024
Y1 - 2024
N2 - Toxic speech has gained substantial attention for its detrimental effects and prevalence across online platforms. Such speech often exhibits discernible pronunciation patterns analogous to those of emotions such as happiness or anger. This aspect has been relatively underexplored in prior studies, which predominantly addressed offensive language, hate speech, and sarcasm without considering their emotional properties. Social media platforms have emerged as spaces where individuals share personal encounters with toxic speech that affects their well-being. To address this challenge, our study introduces a novel approach that combines speech and text data within a Long Short-Term Memory (LSTM) framework. Unlike existing methods that primarily focus on text analysis, our approach integrates both speech and text, thereby enhancing the model's ability to accurately detect toxic content. This multimodal strategy provides a more comprehensive solution to the problem of toxic speech detection. Our collected dataset comprises two-way conversations in the Indonesian language from online fraud reports and confrontations related to loan scams uploaded on YouTube. Because the videos lack subtitles, homonyms can be ambiguous, so the audio content had to be transcribed to text. We relied on native speakers to ensure that the Indonesian transcription was correct in the toxic context. In addition, speech features, such as pitch, intensity, and speaking rate, were used alongside text features, including Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Validation yielded F1-scores of 92.73% for text data and 89.09% for speech data. Our proposed approach provided a substantial improvement of approximately 12%-30% over previous LSTM models. The performance comparison confirmed that our proposed approach can enhance the accuracy of toxic speech detection.
KW - Bag-of-words
KW - Long short-term memory
KW - Speech intensity
KW - Speech pitch
KW - Term frequency-inverse document frequency
KW - Toxic speech detection
UR - http://www.scopus.com/inward/record.url?scp=85208104949&partnerID=8YFLogxK
U2 - 10.22266/ijies2024.1231.67
DO - 10.22266/ijies2024.1231.67
M3 - Article
AN - SCOPUS:85208104949
SN - 2185-310X
VL - 17
SP - 880
EP - 904
JO - International Journal of Intelligent Engineering and Systems
JF - International Journal of Intelligent Engineering and Systems
IS - 6
ER -