TY - JOUR
T1 - Combination of BERT and Hybrid CNN-LSTM Models for Indonesia Dengue Tweets Classification
AU - Anggraeni, Wiwik
AU - Kusuma, Moch Farrel Arrizal
AU - Riksakomara, Edwin
AU - Wibowo, Radityo P.
AU - Pujiadi,
AU - Sumpeno, Surya
N1 - Publisher Copyright: © 2024 Intelligent Network and Systems Society. All Rights Reserved.
PY - 2024
Y1 - 2024
AB - In the era of social media and online communication, real-time surveillance and classification of disease-related information is crucial. Dengue-related Twitter data are rarely used for classification, especially text-based classification in Indonesian, even though classifying dengue-related news tweets can serve a variety of purposes. This study presents a novel approach to classifying dengue fever-related tweets in the Indonesian context, drawing on advanced language models and hybrid neural networks. The proposed method combines the advantages of two deep learning architectures: Indonesian-based Bidirectional Encoder Representations from Transformers (Indo-BERT) and a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model, which is still rarely used in this context. Indo-BERT, a pre-trained language model, extracts complex semantic and contextual information from text, enabling a deeper comprehension of dengue-related tweets. The hybrid CNN-LSTM model processes the textual data by extracting features through convolutional layers while the LSTM layers preserve temporal dependencies. To ensure the model's effectiveness in the Indonesian context, a dataset of Indonesian dengue fever-related tweets was created and labeled for supervised learning. Experiments across multiple scenarios showed that combining Indo-BERT with the hybrid CNN-LSTM model achieved superior performance in classifying dengue-related tweets into five labels: infected, awareness, informative, news, and others. The best model achieved an accuracy of 0.91, an F1-score of 0.90, a precision of 0.91, and a recall of 0.89. The hybrid model outperforms previous approaches by an average of 0.25 in accuracy, 0.27 in precision, and 0.26 in recall. The Health Service is expected to use the results of this classification as initial data to improve the dengue surveillance system.
KW - BERT
KW - Classification
KW - Convolutional neural network
KW - Long short-term memory
KW - Social media
KW - Twitter
UR - http://www.scopus.com/inward/record.url?scp=85184160182&partnerID=8YFLogxK
U2 - 10.22266/ijies2024.0229.68
DO - 10.22266/ijies2024.0229.68
M3 - Article
AN - SCOPUS:85184160182
SN - 2185-310X
VL - 17
SP - 813
EP - 826
JO - International Journal of Intelligent Engineering and Systems
JF - International Journal of Intelligent Engineering and Systems
IS - 1
ER -