An Improved Toxic Speech Detection on Multimodal Scam Confrontation Data Using LSTM-Based Deep Learning

Agustinus Bimo Gumelar, Eko Mulyanto Yuniarno, Arif Nugroho, Derry Pramono Adi, Indar Sugiarto, Mauridhi Hery Purnomo*

*Corresponding author for this work

Research output: Contribution to journal > Article > peer-review

Abstract

Toxic speech has gained substantial attention because of its detrimental effects and its prevalence across online platforms. It often exhibits discernible patterns in pronunciation, analogous to those of emotions such as happiness or anger. These emotional properties have been relatively underexplored in prior studies, which predominantly addressed offensive language, hate speech, and sarcasm. Social media platforms have emerged as spaces where individuals share personal encounters with toxic speech that affects their well-being. To address this challenge, our study introduces a novel approach that combines speech and text data within a Long Short-Term Memory (LSTM) framework. Unlike existing methods that primarily focus on text analysis, our approach integrates both speech and text, thereby enhancing the model's ability to detect toxic content accurately. This multimodal strategy provides a more comprehensive solution to the problem of toxic speech detection. Our collected dataset comprises two-way conversations in the Indonesian language, taken from online fraud reports and confrontations related to loan scams uploaded to YouTube. The absence of subtitles can give rise to ambiguity among homonyms, so the audio content had to be transcribed to text. Native speakers performed the transcription to ensure that it was correct for Indonesian in the toxic context. In addition, speech features such as pitch, intensity, and speaking rate were used alongside text features, including Bag-of-Words (BoW) and Term Frequency-Inverse Document Frequency (TF-IDF). Validation by F1-score yielded 92.73% for text data and 89.09% for speech data. The proposed approach improved on previous LSTM models by approximately 12%-30%. The performance comparison confirmed that the proposed approach can enhance the accuracy of toxic speech detection.
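
The text features and the evaluation metric named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy tokens, and the toy labels are hypothetical, and the TF-IDF variant shown (term count divided by document length, times log(N / document frequency)) is only one common formulation, since the abstract does not state which one was used.

```python
import math
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Count how often each vocabulary term occurs in one tokenized utterance."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def tf_idf(documents):
    """Return (vocabulary, rows), with one TF-IDF vector per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df).
    """
    vocabulary = sorted({term for doc in documents for term in doc})
    n_docs = len(documents)
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    rows = []
    for doc in documents:
        counts = Counter(doc)
        rows.append([
            (counts[term] / len(doc)) * math.log(n_docs / doc_freq[term])
            if doc else 0.0
            for term in vocabulary
        ])
    return vocabulary, rows


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy utterances (made up for illustration, not from the dataset).
toy_docs = [
    ["bayar", "utang", "sekarang"],
    ["terima", "kasih"],
    ["bayar", "sekarang", "bayar"],
]
vocab, tfidf_rows = tf_idf(toy_docs)
bow_row = bag_of_words(toy_docs[2], vocab)

# Hypothetical labels: 1 = toxic, 0 = non-toxic.
toy_f1 = f1_score([1, 0, 1, 1], [1, 0, 0, 1])
```

In this sketch each utterance becomes a fixed-length vector over the shared vocabulary. Vectors of this kind, or the speech features listed in the abstract, could then be fed to a classifier whose predictions are scored with `f1_score`.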

Original language: English
Pages (from-to): 880-904
Number of pages: 25
Journal: International Journal of Intelligent Engineering and Systems
Volume: 17
Issue number: 6
Publication status: Published - 2024

Keywords

  • Bag-of-words
  • Long short-term memory
  • Speech intensity
  • Speech pitch
  • Term frequency-inverse document frequency
  • Toxic speech detection
