TY - GEN
T1 - A Combination of Term Frequency and Topic Modeling with Public Attention to Detect Hot Topics on Texts of Indonesian Online News
AU - Sierra, Evelyn
AU - Navastara, Dini Adni
AU - Purwitasari, Diana
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - In the age of digitalization, accurately identifying trending topics within online news articles is crucial for understanding public interests and concerns. This study introduces an approach that combines Term Frequency-Inverse Document Frequency (TF-IDF) and topic modeling techniques to pinpoint viral keywords in news content. While TF-IDF conventionally assigns word weights based on document frequency, it often struggles to capture emerging trends effectively. To overcome this limitation, our research presents a modified TF-IDF approach that incorporates temporal sensitivity, enabling us to detect trends in a more timely manner. Our methodology takes into account not only word frequency but also user attention and time, namely how often the keyword appears in each month of a given period, thereby improving the accuracy of identifying the keyword's burst effect. Our results demonstrate that the modified TF-IDF approach outperforms standard TF-IDF by 0.25 in identifying viral content through KMeans clustering and also exhibits accurate topic modeling. This is especially evident when analyzing discussions related to events in 2022, as validated using Google Trends viral keywords. Despite these improvements, our approach remains constrained by the need for periodic data updates and processing, which prevents real-time trend detection. In conclusion, our research seeks to enhance the reliability of digital data for informed decision-making, aligning with sustainability goals. By providing a novel approach to identifying viral keywords in online news, we aim to contribute to a better understanding of public interests and concerns in the digital age.
AB - In the age of digitalization, accurately identifying trending topics within online news articles is crucial for understanding public interests and concerns. This study introduces an approach that combines Term Frequency-Inverse Document Frequency (TF-IDF) and topic modeling techniques to pinpoint viral keywords in news content. While TF-IDF conventionally assigns word weights based on document frequency, it often struggles to capture emerging trends effectively. To overcome this limitation, our research presents a modified TF-IDF approach that incorporates temporal sensitivity, enabling us to detect trends in a more timely manner. Our methodology takes into account not only word frequency but also user attention and time, namely how often the keyword appears in each month of a given period, thereby improving the accuracy of identifying the keyword's burst effect. Our results demonstrate that the modified TF-IDF approach outperforms standard TF-IDF by 0.25 in identifying viral content through KMeans clustering and also exhibits accurate topic modeling. This is especially evident when analyzing discussions related to events in 2022, as validated using Google Trends viral keywords. Despite these improvements, our approach remains constrained by the need for periodic data updates and processing, which prevents real-time trend detection. In conclusion, our research seeks to enhance the reliability of digital data for informed decision-making, aligning with sustainability goals. By providing a novel approach to identifying viral keywords in online news, we aim to contribute to a better understanding of public interests and concerns in the digital age.
KW - Attention
KW - Hot Topics
KW - TF-IDF
KW - Time
KW - Topic Modeling
UR - http://www.scopus.com/inward/record.url?scp=85185553598&partnerID=8YFLogxK
U2 - 10.1109/ICITISEE58992.2023.10404337
DO - 10.1109/ICITISEE58992.2023.10404337
M3 - Conference contribution
AN - SCOPUS:85185553598
T3 - Proceedings - 2023 IEEE 7th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2023
SP - 363
EP - 368
BT - Proceedings - 2023 IEEE 7th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2023
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 7th International Conference on Information Technology, Information Systems and Electrical Engineering, ICITISEE 2023
Y2 - 29 November 2023 through 30 November 2023
ER -