TY - JOUR
T1 - Leveraging social media data using latent dirichlet allocation and naïve bayes for mental health sentiment analytics on Covid-19 pandemic
AU - Khalid, Nurzulaikha
AU - Abdul-Rahman, Shuzlina
AU - Wibowo, Wahyu
AU - Abdullah, Nur Atiqah Sia
AU - Mutalib, Sofianita
N1 - Publisher Copyright: © 2023, Universitas Ahmad Dahlan. All rights reserved.
PY - 2023/11
Y1 - 2023/11
AB - In Malaysia, the negative impact of the COVID-19 pandemic on mental health became noticeable during its early stages. The public's psychological and behavioral responses intensified as the COVID-19 outbreak progressed. A high perception of severity, vulnerability, impact, and fear was the element that influenced higher anxiety. Social media data can be used to track Malaysian sentiments in the COVID-19 era. However, such data is often found on the internet as unlabeled text, and manually decoding it is usually complicated. Furthermore, traditional data-gathering approaches, such as survey forms, may not completely capture the sentiments. This study applies a text mining technique called Latent Dirichlet Allocation (LDA) to social media data to discover mental health topics during the COVID-19 pandemic. Then, a model is developed using a hybrid approach that combines a lexicon-based technique with a Naïve Bayes classifier. Accuracy, precision, recall, and F-measure are used to evaluate the sentiment classification. The results show that the best lexicon-based technique is VADER, with 72% accuracy, compared to TextBlob, with 70% accuracy. These sentiment results allow for a better understanding and handling of the pandemic. The top three topics are identified and further classified into positive and negative comments. In conclusion, the developed model can assist healthcare workers and policymakers in making the right decisions in future pandemic outbreaks.
KW - COVID-19
KW - Latent Dirichlet Allocation (LDA)
KW - Lexicon-Based
KW - Mental Health
KW - Naïve Bayes
KW - Social Media
UR - http://www.scopus.com/inward/record.url?scp=85178263223&partnerID=8YFLogxK
U2 - 10.26555/ijain.v9i3.1367
DO - 10.26555/ijain.v9i3.1367
M3 - Article
AN - SCOPUS:85178263223
SN - 2442-6571
VL - 9
SP - 457
EP - 471
JO - International Journal of Advances in Intelligent Informatics
JF - International Journal of Advances in Intelligent Informatics
IS - 3
ER -