Using semantic similarity for identifying relevant page numbers for indexed term of textual book

Daniel Siahaan, Sherly Christina

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

A back-of-book index is one of the navigation tools available to a reader. It lets the reader jump directly to a page that contains relevant information about a specific term, and so retrieve information on a topic of interest without reading the whole book. Indexed terms are usually chosen by the author, based on a subjective judgment of which criteria decide whether a term should be indexed and which pages are relevant. Indexing a book therefore inherits the author's subjectivity. Indexing effort grows with the size of the book, and consistency becomes harder to maintain. As a result, the page numbers in an index do not always refer to relevant pages. This paper proposes an approach for identifying the relevance of a page that contains an indexed term. The approach measures the semantic relation between the indexed term and the respective sentence in the page. To measure the semantic relation, it uses a semantic distance algorithm based on the WordNet thesaurus. We measure the reliability of our system by its degree of agreement with the book indexer, using the kappa statistic. The experimental results show that the proposed approach can be considered as good as the domain expert, with an average kappa value of 0.6034.
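The agreement measure mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which kappa variant was used, so this assumes Cohen's kappa for two raters (the system and the book indexer) with a binary relevant/not-relevant label per page. The function name `cohen_kappa` and the label lists are hypothetical illustrations, not data or code from the paper.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: fraction of items that both raters label identically.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions, summed over labels.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one and the same label everywhere; treat as full agreement.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical labels (1 = page judged relevant for the term, 0 = not relevant).
system = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
indexer = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
print(round(cohen_kappa(system, indexer), 4))  # prints 0.6
```

In this made-up example the two raters agree on 8 of 10 pages, chance agreement is 0.5, and kappa is (0.8 - 0.5) / (1 - 0.5) = 0.6.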

Original language: English
Title of host publication: Intelligence in the Era of Big Data - 4th International Conference on Soft Computing, Intelligent Systems and Information Technology, ICSIIT 2015, Proceedings
Editors: Chi-Hung Chi, Rolly Intan, Henry N. Palit, Leo Willyanto Santoso
Publisher: Springer Verlag
Pages: 183-192
Number of pages: 10
ISBN (Electronic): 9783662467411
Publication status: Published - 2015
Event: 4th International Conference on Soft Computing, Intelligent Systems and Information Technology, ICSIIT 2015 - Bali, Indonesia
Duration: 11 Mar 2015 → 14 Mar 2015

Publication series

Name: Communications in Computer and Information Science
Volume: 516
ISSN (Print): 1865-0929

Conference

Conference: 4th International Conference on Soft Computing, Intelligent Systems and Information Technology, ICSIIT 2015
Country/Territory: Indonesia
City: Bali
Period: 11/03/15 → 14/03/15

Keywords

  • Back-of-book index
  • Book indexing
  • Relevant page number
  • Semantic relation
