TY - JOUR
T1 - Developing corpora using Word2vec and Wikipedia for word sense disambiguation
AU - Nurifan, Farza
AU - Sarno, Riyanarto
AU - Wahyuni, Cahyaningtyas Sekar
N1 - Publisher Copyright: © 2018 Institute of Advanced Engineering and Science. All rights reserved.
PY - 2018/12
Y1 - 2018/12
N2 - Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence and is often described as AI-hard or AI-complete. Many problems, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference, can be addressed with a word sense disambiguation approach. This research addresses the WSD problem using two small corpora. Word2vec and Wikipedia are proposed to develop the corpora. After the corpora are developed, the similarity between the sentence and each corpus is measured using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, Lesk algorithms and Wu-Palmer similarity are used to handle cases in which no word from the sentence appears in the corpus. The results show an accuracy rate of 85.51%, and the semantic similarity improves the accuracy rate by 8.02% in determining the meaning of ambiguous words.
AB - Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence and is often described as AI-hard or AI-complete. Many problems, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference, can be addressed with a word sense disambiguation approach. This research addresses the WSD problem using two small corpora. Word2vec and Wikipedia are proposed to develop the corpora. After the corpora are developed, the similarity between the sentence and each corpus is measured using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, Lesk algorithms and Wu-Palmer similarity are used to handle cases in which no word from the sentence appears in the corpus. The results show an accuracy rate of 85.51%, and the semantic similarity improves the accuracy rate by 8.02% in determining the meaning of ambiguous words.
KW - Lesk
KW - Wikipedia
KW - Word sense disambiguation
KW - Word2vec
KW - Wu Palmer
UR - http://www.scopus.com/inward/record.url?scp=85057251395&partnerID=8YFLogxK
U2 - 10.11591/ijeecs.v12.i3.pp1239-1246
DO - 10.11591/ijeecs.v12.i3.pp1239-1246
M3 - Article
AN - SCOPUS:85057251395
SN - 2502-4752
VL - 12
SP - 1239
EP - 1246
JO - Indonesian Journal of Electrical Engineering and Computer Science
JF - Indonesian Journal of Electrical Engineering and Computer Science
IS - 3
ER -