TY - GEN
T1 - Topic modeling Twitter data using Latent Dirichlet Allocation and Latent Semantic Analysis
AU - Qomariyah, Siti
AU - Iriawan, Nur
AU - Fithriasari, Kartika
N1 - Publisher Copyright: © 2019 Author(s).
PY - 2019/12/18
Y1 - 2019/12/18
AB - The industrial world has entered the era of the Industrial Revolution 4.0. In this era, there is an urgent need for data from the community to support service policies. For this reason, the Surabaya Government established Media Center Surabaya, a channel that accommodates the aspirations of Surabaya citizens. Citizens can access it through Twitter. The topics discussed on Twitter carry important information that can be used to improve the performance of Surabaya Government services. Twitter data are text data consisting of thousands of variables. Text mining, including topic modeling and sentiment analysis, is frequently used to analyze this kind of data. This study focuses on topic modeling using Latent Dirichlet Allocation (LDA) and Latent Semantic Analysis (LSA). The performance of the algorithms is evaluated using topic coherence. Because Twitter data are unstructured, they need preprocessing before analysis. The preprocessing stages include cleansing, stemming, and stop-word removal. LSA has the advantages of being fast and easy to implement. However, LSA does not consider the relationship between documents in the corpus, whereas LDA does. This study shows that LDA gives a better result than LSA.
UR - http://www.scopus.com/inward/record.url?scp=85077722556&partnerID=8YFLogxK
U2 - 10.1063/1.5139825
DO - 10.1063/1.5139825
M3 - Conference contribution
AN - SCOPUS:85077722556
T3 - AIP Conference Proceedings
BT - 2nd International Conference on Science, Mathematics, Environment, and Education
A2 - Indriyanti, Nurma Yunita
A2 - Ramli, Murni
A2 - Nurhasanah, Farida
PB - American Institute of Physics Inc.
T2 - 2nd International Conference on Science, Mathematics, Environment, and Education, ICoSMEE 2019
Y2 - 26 July 2019 through 28 July 2019
ER -