TY - GEN
T1 - Mobile app review labeling using LDA similarity and term frequency-inverse cluster frequency (TF-ICF)
AU - Puspaningrum, Alifia
AU - Siahaan, Daniel
AU - Fatichah, Chastine
N1 - Publisher Copyright: © 2018 IEEE.
PY - 2018/11/13
Y1 - 2018/11/13
N2 - User review mining has attracted many researchers to analyze and develop innovative models. These models provide technical recommendations that help software developers make decisions during software maintenance and software evolution. One such recommendation is user review categorization. Several categories are commonly used, namely bug errors, feature requests, and non-informative. Many methods have been proposed to classify user reviews, one of which is Latent Dirichlet Allocation (LDA). LDA is a topic modelling method that is able to map the hidden topics residing in a document. One technique for mapping hidden topics into categories is to calculate the term similarity between each hidden topic and a pre-defined signifier term list. However, the limited signifier term list of each category is a problem. Meanwhile, Term Frequency-Inverse Cluster Frequency (TF-ICF) is able to extract the important terms of a cluster. Therefore, this paper introduces a method that combines TF-ICF with LDA clustering based on similarity (LDAS TF-ICF) to overcome this limitation. The classification results were evaluated using precision, recall, and F1-score. The results show that the method can outperform LDA. The best performance of LDAS TF-ICF occurred when a 75% expanded term list was used, giving precision, recall, and F-measure values of 0.564, 0.507, and 0.491, respectively.
AB - User review mining has attracted many researchers to analyze and develop innovative models. These models provide technical recommendations that help software developers make decisions during software maintenance and software evolution. One such recommendation is user review categorization. Several categories are commonly used, namely bug errors, feature requests, and non-informative. Many methods have been proposed to classify user reviews, one of which is Latent Dirichlet Allocation (LDA). LDA is a topic modelling method that is able to map the hidden topics residing in a document. One technique for mapping hidden topics into categories is to calculate the term similarity between each hidden topic and a pre-defined signifier term list. However, the limited signifier term list of each category is a problem. Meanwhile, Term Frequency-Inverse Cluster Frequency (TF-ICF) is able to extract the important terms of a cluster. Therefore, this paper introduces a method that combines TF-ICF with LDA clustering based on similarity (LDAS TF-ICF) to overcome this limitation. The classification results were evaluated using precision, recall, and F1-score. The results show that the method can outperform LDA. The best performance of LDAS TF-ICF occurred when a 75% expanded term list was used, giving precision, recall, and F-measure values of 0.564, 0.507, and 0.491, respectively.
KW - LDA
KW - Review Semantic Similarity
KW - Software Evolution
KW - Software Maintenance
KW - TF-ICF
UR - http://www.scopus.com/inward/record.url?scp=85058417275&partnerID=8YFLogxK
U2 - 10.1109/ICITEED.2018.8534785
DO - 10.1109/ICITEED.2018.8534785
M3 - Conference contribution
AN - SCOPUS:85058417275
T3 - Proceedings of 2018 10th International Conference on Information Technology and Electrical Engineering: Smart Technology for Better Society, ICITEE 2018
SP - 365
EP - 370
BT - Proceedings of 2018 10th International Conference on Information Technology and Electrical Engineering
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 10th International Conference on Information Technology and Electrical Engineering, ICITEE 2018
Y2 - 24 July 2018 through 26 July 2018
ER -