TY - GEN
T1 - Text Clustering of Tweets Categories on PT. Transportasi Jakarta Official Account
AU - Rachmat, Gabriella Varitie Sentosa
AU - Irhamah
AU - Fithriasari, Kartika
N1 - Publisher Copyright:
© 2023 American Institute of Physics Inc. All rights reserved.
PY - 2023/5/19
Y1 - 2023/5/19
AB - Because of the large number of private cars on the streets of Jakarta, traffic congestion occurs regularly, which prompted the Provincial Government to establish TransJakarta. TransJakarta users often ask questions, file complaints, or offer suggestions to TransJakarta via Twitter. To respond to these tweets more easily and quickly, TransJakarta needs to understand the categories they fall into. To this end, tweet categories were determined using data collected from the Twitter API. The text was first preprocessed, and each word was then weighted using Term Frequency-Inverse Document Frequency (TF-IDF). In addition, a Genetic Algorithm (GA) was proposed for feature selection. K-means and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) were compared based on the silhouette coefficient to determine the tweet categories, which were then visualized using word clouds. The clustering results show that the best method is DBSCAN with GA-based feature selection, because it produces a high silhouette coefficient with less noise than without GA-based feature selection. The clustering yielded four categories of tweets: bus stop/route, bus facilities, bus cleanliness, and TransJakarta's consistency.
UR - http://www.scopus.com/inward/record.url?scp=85161443614&partnerID=8YFLogxK
U2 - 10.1063/5.0136569
DO - 10.1063/5.0136569
M3 - Conference contribution
AN - SCOPUS:85161443614
T3 - AIP Conference Proceedings
BT - Proceedings of the International Conference on Advanced Technology and Multidiscipline, ICATAM 2021
A2 - Widiyanti, Prihartini
A2 - Jiwanti, Prastika Krisma
A2 - Prihandana, Gunawan Setia
A2 - Ningrum, Ratih Ardiati
A2 - Prastio, Rizki Putra
A2 - Setiadi, Herlambang
A2 - Rizki, Intan Nurul
PB - American Institute of Physics Inc.
T2 - 1st International Conference on Advanced Technology and Multidiscipline: Advanced Technology and Multidisciplinary Prospective Towards Bright Future, ICATAM 2021
Y2 - 13 October 2021 through 14 October 2021
ER -