TY - GEN
T1 - Improving Sarcasm Detection in Mash-Up Language Through Hybrid Pretrained Word Embedding
AU - Rosid, Mochamad Alfan
AU - Siahaan, Daniel
AU - Saikhu, Ahmad
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Sarcasm detection is an imperative undertaking within the realm of natural language processing, albeit one that poses considerable challenges when confronted with mash-up languages, characterized by the amalgamation of multiple distinct languages. In response to the intricacies of sarcasm detection in mash-up languages, with a specific focus on the Indonesian-English language mash-up, this study introduces the Hybrid Pretrained Word Embedding approach as a means to enhance sarcasm detection. The primary objective of this research is to augment the precision of sarcasm detection in mash-up languages by amalgamating suitable word embeddings tailored to the employed terms. The present study combines two prevalent pretrained word embeddings, i.e., GloVe and fastText, wherein GloVe is utilized to extract semantic context vectors for English words, while fastText is employed to extract semantic context vectors for Indonesian words. The classification process in this research leverages the deep learning methodology known as Bidirectional Gated Recurrent Unit (BiGRU). To assess the efficacy of the proposed approach, an extensive dataset comprising sarcastic and non-sarcastic tweets, written in a hybrid language of Indonesian and English, is acquired from the Twitter platform. The results unequivocally demonstrate that the Hybrid Pretrained Word Embedding approach significantly enhances sarcasm detection in mash-up languages, attaining a commendable classification accuracy of 93.57% and an F-measure of 97.94%. By offering an effective methodology to identify sarcasm in mash-up languages, this study yields a substantive contribution to the field of natural language processing.
AB - Sarcasm detection is an imperative undertaking within the realm of natural language processing, albeit one that poses considerable challenges when confronted with mash-up languages, characterized by the amalgamation of multiple distinct languages. In response to the intricacies of sarcasm detection in mash-up languages, with a specific focus on the Indonesian-English language mash-up, this study introduces the Hybrid Pretrained Word Embedding approach as a means to enhance sarcasm detection. The primary objective of this research is to augment the precision of sarcasm detection in mash-up languages by amalgamating suitable word embeddings tailored to the employed terms. The present study combines two prevalent pretrained word embeddings, i.e., GloVe and fastText, wherein GloVe is utilized to extract semantic context vectors for English words, while fastText is employed to extract semantic context vectors for Indonesian words. The classification process in this research leverages the deep learning methodology known as Bidirectional Gated Recurrent Unit (BiGRU). To assess the efficacy of the proposed approach, an extensive dataset comprising sarcastic and non-sarcastic tweets, written in a hybrid language of Indonesian and English, is acquired from the Twitter platform. The results unequivocally demonstrate that the Hybrid Pretrained Word Embedding approach significantly enhances sarcasm detection in mash-up languages, attaining a commendable classification accuracy of 93.57% and an F-measure of 97.94%. By offering an effective methodology to identify sarcasm in mash-up languages, this study yields a substantive contribution to the field of natural language processing.
KW - hybrid pretrained word embedding
KW - mash-up languages
KW - natural language processing
KW - sarcasm detection
UR - http://www.scopus.com/inward/record.url?scp=85175462255&partnerID=8YFLogxK
U2 - 10.1109/ICSECS58457.2023.10256422
DO - 10.1109/ICSECS58457.2023.10256422
M3 - Conference contribution
AN - SCOPUS:85175462255
T3 - 8th International Conference on Software Engineering and Computer Systems, ICSECS 2023
SP - 58
EP - 63
BT - 8th International Conference on Software Engineering and Computer Systems, ICSECS 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 8th IEEE International Conference on Software Engineering and Computer Systems, ICSECS 2023
Y2 - 25 August 2023 through 27 August 2023
ER -