Abstract

In the era of social media and online communication, real-time surveillance and classification of disease-related information is crucial. Dengue-related Twitter data are rarely used for classification, especially text-based classification in Indonesian, even though classified dengue-related tweets can serve a variety of purposes. This study presents a novel approach to classifying dengue fever-related tweets in the Indonesian context, drawing on advanced language models and hybrid neural networks. The proposed method combines the advantages of two deep learning architectures: an Indonesian-language Bidirectional Encoder Representations from Transformers model (Indo-BERT) and a hybrid convolutional neural network-long short-term memory (CNN-LSTM) model, a combination that is still rarely used in this context. Indo-BERT, a pre-trained language model, extracts complex semantic and contextual information from text, enabling a deeper comprehension of dengue-related tweets. The hybrid CNN-LSTM model processes the textual data in tandem, extracting features via convolutional layers while maintaining temporal dependencies. To ensure the model's effectiveness in the Indonesian context, a collection of dengue fever-related tweets in Indonesian was created and labeled for supervised learning. Experiments were conducted in multiple scenarios, and the results demonstrated that the combination of the Indo-BERT and hybrid CNN-LSTM models had superior performance in classifying dengue-related tweets into five labels: infected, awareness, informative, news, and others. The best models achieve an accuracy of 0.91, an F1-score of 0.90, a precision of 0.91, and a recall of 0.89. The hybrid model outperforms previous approaches by an average of 0.25 in accuracy, 0.27 in precision, and 0.26 in recall. The Health Service is expected to be able to use these classification results as initial data for improving the dengue surveillance system.
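
As a rough illustration of how the four reported metrics relate to one another for a five-label task, the sketch below computes accuracy, precision, recall, and F1-score from a list of true and predicted labels. The abstract does not say which averaging scheme was used, so macro averaging is an assumption here. The function name `classification_metrics` is hypothetical, and the toy labels are invented for the example rather than taken from the study's dataset.

```python
from collections import Counter

# The five labels named in the abstract.
LABELS = ["infected", "awareness", "informative", "news", "others"]


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging is an assumption; the abstract does not specify it.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label receives a false positive
            fn[truth] += 1  # true label receives a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (no predictions or no samples) as 0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


if __name__ == "__main__":
    # Toy data for illustration only, not from the study.
    y_true = ["infected", "awareness", "informative", "news", "others", "infected"]
    y_pred = ["infected", "awareness", "news", "news", "others", "others"]
    for name, value in classification_metrics(y_true, y_pred).items():
        print(f"{name}: {value:.2f}")
```

Under macro averaging every label contributes equally, so a rarely predicted class such as "informative" in the toy data pulls the averages down even when overall accuracy is higher.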

Original language: English
Pages (from-to): 813-826
Number of pages: 14
Journal: International Journal of Intelligent Engineering and Systems
Volume: 17
Issue number: 1
Publication status: Published - 2024

Keywords

  • BERT
  • Classification
  • Convolutional neural network
  • Long short-term memory
  • Social media
  • Twitter

Title: Combination of BERT and Hybrid CNN-LSTM Models for Indonesia Dengue Tweets Classification
