Abstract

A thesaurus, as a controlled vocabulary, can be an important tool in Natural Language Processing (NLP). However, constructing a thesaurus manually with experts can be time-consuming, and the subjectivity of each expert can affect the structure of the thesaurus. Many methods have already been implemented to build thesauri automatically for languages categorized as resource-rich. For resource-poor languages such as Indonesian, research in this field is still limited. This paper proposes a framework to construct an Indonesian thesaurus using a monolingual corpus. The method uses an Indonesian dictionary and a large monolingual corpus of news articles. Candidate related terms are extracted from each resource, and the two candidate sets are then combined to produce the final thesaurus. The thesaurus is evaluated by using it as a Query Expansion (QE) resource in an Information Retrieval (IR) system. The experimental results show that, with the automatic thesaurus, the system achieves a precision of 54.00% and a recall of 85.42%.
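The abstract does not spell out how candidates are extracted or merged, although the title refers to a co-occurrence technique and a dictionary-based method. The Python sketch that follows is one possible reading under stated assumptions, not the authors' implementation: corpus candidates come from sentence-level co-occurrence scored with the Dice coefficient, the dictionary is assumed to be a mapping from headword to related words, and the two candidate sets are combined by intersection. All function names, thresholds, and the toy data are hypothetical. The last two functions show simple query expansion and the set-based precision and recall used to score retrieval.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_candidates(sentences, min_count=2, min_dice=0.5):
    """Corpus candidates: term pairs that co-occur in the same sentence.

    Assumption: a sentence-level window scored with the Dice coefficient
    2 * n(a, b) / (n(a) + n(b)); the paper's actual measure may differ.
    """
    term_freq = Counter()
    pair_freq = Counter()
    for sentence in sentences:
        terms = sorted(set(sentence.lower().split()))  # naive whitespace tokens
        term_freq.update(terms)
        pair_freq.update(combinations(terms, 2))  # each unordered pair once
    related = {}
    for (a, b), n_ab in pair_freq.items():
        if n_ab < min_count:
            continue
        dice = 2 * n_ab / (term_freq[a] + term_freq[b])
        if dice >= min_dice:
            related.setdefault(a, set()).add(b)
            related.setdefault(b, set()).add(a)
    return related


def dictionary_candidates(dictionary):
    """Dictionary candidates, assuming entries map a headword to related words."""
    related = {}
    for head, words in dictionary.items():
        for word in words:
            related.setdefault(head, set()).add(word)
            related.setdefault(word, set()).add(head)  # keep relations symmetric
    return related


def combine(corpus_related, dict_related):
    """Assumption: keep only relations that both resources agree on."""
    thesaurus = {}
    for term in corpus_related.keys() & dict_related.keys():
        shared = corpus_related[term] & dict_related[term]
        if shared:
            thesaurus[term] = shared
    return thesaurus


def expand_query(query, thesaurus):
    """Append thesaurus entries for each query term (simple QE)."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        for related in sorted(thesaurus.get(term, ())):
            if related not in expanded:
                expanded.append(related)
    return expanded


def precision_recall(retrieved, relevant):
    """Set-based precision and recall for one query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy data for illustration only, not from the paper.
    news = [
        "mobil kendaraan jalan",
        "mobil kendaraan macet",
        "mobil kendaraan parkir",
        "banjir jalan",
    ]
    kamus = {"mobil": ["kendaraan", "otomobil"]}  # hypothetical dictionary entry
    thesaurus = combine(cooccurrence_candidates(news), dictionary_candidates(kamus))
    print(expand_query("mobil macet", thesaurus))  # ['mobil', 'macet', 'kendaraan']
    print(precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d2", "d5"]))
```

Taking the intersection keeps only relations that both resources support; a union is the obvious alternative if coverage matters more than precision.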

Original language: English
Title of host publication: 2017 5th International Conference on Information and Communication Technology, ICoIC7 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509049127
Publication status: Published - 18 Oct 2017
Event: 5th International Conference on Information and Communication Technology, ICoIC7 2017 - Melaka, Malaysia
Duration: 17 May 2017 - 19 May 2017

Publication series

Name: 2017 5th International Conference on Information and Communication Technology, ICoIC7 2017

Conference

Conference: 5th International Conference on Information and Communication Technology, ICoIC7 2017
Country/Territory: Malaysia
City: Melaka
Period: 17/05/17 - 19/05/17

Keywords

  • Indonesian language
  • monolingual corpus
  • query expansion
  • thesaurus

Fingerprint

Research topics of 'Co-occurrence technique and dictionary based method for Indonesian thesaurus construction'.
