Developing corpora using word2vec and wikipedia for word sense disambiguation

Farza Nurifan, Riyanarto Sarno*, Cahyaningtyas Sekar Wahyuni

*Corresponding author for this work

Research output: Contribution to journal, Article, peer-review

10 Citations (Scopus)

Abstract

Word Sense Disambiguation (WSD) is one of the most difficult problems in artificial intelligence and is often described as AI-hard or AI-complete. Many tasks can benefit from a WSD approach, such as sentiment analysis, machine translation, search engine relevance, coherence, anaphora resolution, and inference. This research addresses the WSD problem using two small corpora, which are developed using Word2vec and Wikipedia. After the corpora are developed, the similarity between a sentence and each corpus is measured using cosine similarity to determine the meaning of the ambiguous word. Lastly, to improve accuracy, Lesk algorithms and Wu-Palmer similarity are used to handle cases where no word from the sentence appears in the corpus. The results show an accuracy rate of 85.51%, and the semantic similarity improves the accuracy rate by 8.02% in determining the meaning of ambiguous words.
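The cosine-similarity step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes simple bag-of-words count vectors, and the toy corpora, the sense labels, and the function names `cosine_similarity` and `disambiguate` are hypothetical. In the paper the corpora are built with Word2vec and Wikipedia; here they are hand-written for illustration. When no sentence word appears in either corpus, the sketch returns `None`, which marks the case the abstract hands over to Lesk and Wu-Palmer similarity.

```python
import math
from collections import Counter


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def disambiguate(sentence: str, corpora: dict):
    """Return the sense whose corpus is most similar to the sentence.

    Returns None when the sentence shares no word with any corpus,
    i.e. the case where a fallback method would be needed.
    """
    sentence_vec = Counter(sentence.lower().split())
    scores = {
        sense: cosine_similarity(sentence_vec, Counter(words))
        for sense, words in corpora.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical toy corpora for the ambiguous word "bank" (illustration only).
TOY_CORPORA = {
    "bank/finance": ["money", "loan", "deposit", "account", "cash"],
    "bank/river": ["river", "water", "shore", "fishing", "mud"],
}

if __name__ == "__main__":
    print(disambiguate("she opened an account to deposit money at the bank", TOY_CORPORA))
    print(disambiguate("we went fishing on the river bank", TOY_CORPORA))
    print(disambiguate("the bank was closed", TOY_CORPORA))
```

The third call shares no word with either toy corpus, so every score is zero and no sense is chosen from the corpora alone.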

Original language: English
Pages (from-to): 1239-1246
Number of pages: 8
Journal: Indonesian Journal of Electrical Engineering and Computer Science
Volume: 12
Issue number: 3
Publication status: Published - Dec 2018

Keywords

  • Lesk
  • Wikipedia
  • Word sense disambiguation
  • Word2vec
  • Wu-Palmer
