TY - JOUR
T1 - Comparison of stemming algorithms on Indonesian text processing
AU - Rizki, Afian Syafaadi
AU - Tjahyanto, Aris
AU - Trialih, Rahmat
N1 - Publisher Copyright: © 2019 Universitas Ahmad Dahlan.
PY - 2019/2/1
Y1 - 2019/2/1
N2 - Stemming, the process of converting words into their roots, is one of the stages in extracting information from text. There are indications that using the most accurate stemming algorithm is not the only way to achieve the best performance in information retrieval (IR). In this study, seven Indonesian stemmer algorithms and an English stemmer algorithm are compared: Nazief, Arifin, Fadillah, Asian, Enhanced Confix Stripping (ECS), Arifiyanti, and Porter. The data used are 2,734 tweets collected from the official Twitter account of PLN. The first aim is to analyze the correlation between stemmer accuracy and information retrieval performance on Indonesian-language text. The second is to identify the best algorithm for Indonesian text processing. This research also proposes an improved algorithm for stemming Indonesian text. The results show that the correlation found in previous research does not hold for the Indonesian language. The results also show that the proposed algorithm was the best for Indonesian text processing, with a weighted scoring value of 0.648.
AB - Stemming, the process of converting words into their roots, is one of the stages in extracting information from text. There are indications that using the most accurate stemming algorithm is not the only way to achieve the best performance in information retrieval (IR). In this study, seven Indonesian stemmer algorithms and an English stemmer algorithm are compared: Nazief, Arifin, Fadillah, Asian, Enhanced Confix Stripping (ECS), Arifiyanti, and Porter. The data used are 2,734 tweets collected from the official Twitter account of PLN. The first aim is to analyze the correlation between stemmer accuracy and information retrieval performance on Indonesian-language text. The second is to identify the best algorithm for Indonesian text processing. This research also proposes an improved algorithm for stemming Indonesian text. The results show that the correlation found in previous research does not hold for the Indonesian language. The results also show that the proposed algorithm was the best for Indonesian text processing, with a weighted scoring value of 0.648.
KW - Confix stripping stemmer
KW - Indonesian stemmer
KW - Information retrieval
KW - Text clustering
UR - http://www.scopus.com/inward/record.url?scp=85062300325&partnerID=8YFLogxK
U2 - 10.12928/TELKOMNIKA.v17i1.10183
DO - 10.12928/TELKOMNIKA.v17i1.10183
M3 - Article
AN - SCOPUS:85062300325
SN - 1693-6930
VL - 17
SP - 95
EP - 102
JO - Telkomnika (Telecommunication Computing Electronics and Control)
JF - Telkomnika (Telecommunication Computing Electronics and Control)
IS - 1
ER -