Topic Identification and Categorization of Public Information in Community-Based Social Media

R. P. Kusumawardani, M. H. Basri

Research output: Contribution to journal › Conference article › peer-review

4 Citations (Scopus)

Abstract

This paper presents a semi-supervised method for topic identification and classification of short social media texts, and applies it to tweets containing dialogues among a large community of city residents, written mostly in Indonesian. These dialogues contain a wealth of information about the city, shared in real time. We found that, despite the highly irregular language and the scarcity of suitable linguistic resources, topics could be identified meaningfully by clustering the tweets with the K-Means algorithm. The resulting clusters were robust enough to serve as the basis for classification. On three grouping schemes derived from the clusters, linear SVMs achieved accuracies of 95.52%, 95.51%, and 96.7%, indicating that the method is applicable to topic identification and classification on such data.
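The cluster-then-classify idea in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy tweets are invented (and in English), the TF-IDF features, farthest-first K-Means initialisation, and the subgradient-trained one-vs-rest linear SVM are all choices made here for a dependency-free example, and every function name is hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy tweets (invented for illustration; not the paper's data).
tweets = [
    "traffic jam on main road",
    "heavy traffic jam near bridge",
    "road traffic slow this morning",
    "power outage in east district",
    "power outage reported again tonight",
    "electricity power outage east side",
]

def tokenize(text):
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every term in the corpus."""
    n = len(docs)
    df = Counter(t for d in docs for t in set(d))
    return {t: math.log((1 + n) / (1 + c)) + 1 for t, c in sorted(df.items())}

def vectorize(doc, idf):
    """L2-normalised TF-IDF vector; terms outside the vocabulary are ignored."""
    tf = Counter(doc)
    v = [tf[t] * w for t, w in idf.items()]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vecs, k, iters=20):
    """Lloyd's K-Means with deterministic farthest-first initialisation."""
    centers = [vecs[0]]
    while len(centers) < k:
        centers.append(max(vecs, key=lambda v: min(dist2(v, c) for c in centers)))
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: dist2(v, centers[j])) for v in vecs]
        for j in range(k):
            members = [v for v, l in zip(vecs, labels) if l == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Binary linear SVM (targets +1/-1): subgradient descent on hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            margin = t * (dot(w, x) + b)
            w = [wi * (1 - lr * lam) for wi in w]  # L2 regularisation shrink
            if margin < 1:  # point is inside the margin: push it out
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b

def train_one_vs_rest(X, labels, k):
    """One binary SVM per cluster, using cluster ids as pseudo-labels."""
    return [train_linear_svm(X, [1 if l == j else -1 for l in labels])
            for j in range(k)]

def classify(text):
    """Assign a new text to the cluster whose SVM gives the highest score."""
    x = vectorize(tokenize(text), idf)
    return max(range(len(models)), key=lambda j: dot(models[j][0], x) + models[j][1])

docs = [tokenize(t) for t in tweets]
idf = fit_idf(docs)
X = [vectorize(d, idf) for d in docs]
labels = kmeans(X, k=2)                      # step 1: unsupervised topic clusters
models = train_one_vs_rest(X, labels, k=2)   # step 2: supervised classifier
print(labels)
print(classify("power outage east"), classify("traffic jam road"))
```

The point of the design is that the cluster assignments stand in for manual labels, so the classifier can be trained without an annotated corpus. In practice the clusters would be inspected and grouped into topic schemes before training, as the abstract describes.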

Original language: English
Article number: 012075
Journal: Journal of Physics: Conference Series
Volume: 801
Issue number: 1
Publication status: Published - 27 Mar 2017
Event: 1st International Conference on Computing and Applied Informatics, ICCAI 2016 - Medan, Indonesia
Duration: 14 Dec 2016 - 15 Dec 2016
