Improving Sarcasm Detection in Mash-Up Language Through Hybrid Pretrained Word Embedding

Mochamad Alfan Rosid, Daniel Siahaan, Ahmad Saikhu

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution > peer-review

Abstract

Sarcasm detection is an important task in natural language processing, and it becomes considerably harder in mash-up languages, in which multiple distinct languages are mixed within the same text. Focusing on the Indonesian-English mash-up, this study introduces a Hybrid Pretrained Word Embedding approach to improve sarcasm detection. The aim is to improve the precision of sarcasm detection in mash-up languages by assigning each term the word embedding suited to its language. The approach combines two widely used pretrained word embeddings, GloVe and fastText: GloVe supplies semantic context vectors for English words, while fastText supplies them for Indonesian words. Classification is performed with a deep learning model, the Bidirectional Gated Recurrent Unit (BiGRU). To evaluate the approach, an extensive dataset of sarcastic and non-sarcastic tweets written in a mix of Indonesian and English was collected from Twitter. The results show that the Hybrid Pretrained Word Embedding approach significantly improves sarcasm detection in mash-up languages, achieving a classification accuracy of 93.57% and an F-measure of 97.94%. By offering an effective method for identifying sarcasm in mash-up languages, the study contributes to the field of natural language processing.
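The hybrid lookup described in the abstract might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how a word is assigned to a language, so routing by vocabulary membership (English table first, then Indonesian) is an assumption, as are the function name `build_hybrid_matrix`, the zero-vector fallback for unknown tokens, the shared vector dimension, and the toy three-dimensional vectors standing in for real GloVe and fastText tables.

```python
from typing import Dict, List

Vector = List[float]


def build_hybrid_matrix(
    vocab: List[str],
    english_vectors: Dict[str, Vector],
    indonesian_vectors: Dict[str, Vector],
    dim: int,
) -> List[Vector]:
    """Build one embedding row per vocabulary word.

    Assumption: a word found in the English table is treated as English,
    otherwise the Indonesian table is consulted, otherwise a zero vector
    is used. Both tables are assumed to share the same dimension `dim`.
    """
    matrix: List[Vector] = []
    for word in vocab:
        if word in english_vectors:
            matrix.append(list(english_vectors[word]))
        elif word in indonesian_vectors:
            matrix.append(list(indonesian_vectors[word]))
        else:
            matrix.append([0.0] * dim)  # out-of-vocabulary fallback
    return matrix


# Hypothetical toy tables standing in for pretrained GloVe (English)
# and fastText (Indonesian) vectors.
toy_glove = {"great": [0.1, 0.2, 0.3], "job": [0.4, 0.5, 0.6]}
toy_fasttext = {"banget": [0.7, 0.8, 0.9], "macet": [0.2, 0.1, 0.0]}

vocab = ["great", "job", "macet", "banget", "wkwk"]
hybrid = build_hybrid_matrix(vocab, toy_glove, toy_fasttext, dim=3)

for word, row in zip(vocab, hybrid):
    print(word, row)
```

In a full pipeline, a matrix like `hybrid` would typically initialize the embedding layer placed in front of the BiGRU classifier; that step is omitted here because the abstract gives no architectural details.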

Original language: English
Title of host publication: 8th International Conference on Software Engineering and Computer Systems, ICSECS 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 58-63
Number of pages: 6
ISBN (Electronic): 9798350310931
Publication status: Published - 2023
Event: 8th IEEE International Conference on Software Engineering and Computer Systems, ICSECS 2023 - Penang, Malaysia
Duration: 25 Aug 2023 to 27 Aug 2023

Keywords

  • hybrid pretrained word embedding
  • mash-up languages
  • natural language processing
  • sarcasm detection
