Large scale text classification using map reduce and naive bayes algorithm for domain specified ontology building

Joan Santoso, Eko Mulyanto Yuniarno, Mochamad Hariadi

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

10 Citations (Scopus)

Abstract

The Internet covers a vast amount of information and offers an opportunity to obtain knowledge from it. It contains large volumes of unstructured and unorganized data such as text, video, and images. The problem is how to organize this large amount of data and extract useful information from it. Such information can serve as knowledge in an intelligent computer system. Ontology, as one form of knowledge representation, covers a wide range of topics. To construct a domain-specified ontology, we use a large text dataset from the Internet and organize it into specified domains before the ontology building process begins. In our research, we implement a naive Bayes text classifier using the MapReduce programming model to organize our large text dataset. In this experiment, we use animal and plant domain articles from the Wikipedia online encyclopedia as our dataset. Our proposed method achieves a highest accuracy of about 98.8%. This experiment shows that the proposed method provides a robust system with good accuracy for classifying documents into specified domains in the preprocessing phase of domain-specified ontology building.
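The abstract does not show the implementation, so the following is only a minimal single-process sketch of how naive Bayes training counts can be expressed as a map step and a reduce step. The function names (`map_document`, `reduce_counts`, `train`, `classify`), the whitespace tokenizer, and the add-one (Laplace) smoothing are all assumptions made for illustration, not details taken from the paper.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()


def map_document(label, text):
    """Map step: emit one ("doc", label, None) pair per document and one
    ("tok", label, token) pair per token occurrence, each with value 1."""
    pairs = [(("doc", label, None), 1)]
    for token in tokenize(text):
        pairs.append((("tok", label, token), 1))
    return pairs


def reduce_counts(pairs):
    """Reduce step: sum the values of all pairs that share a key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(docs):
    """Run map over every (label, text) document, reduce, and build the model."""
    emitted = [pair for label, text in docs for pair in map_document(label, text)]
    totals = reduce_counts(emitted)
    doc_counts = Counter()
    word_counts = defaultdict(Counter)
    for (kind, label, token), n in totals.items():
        if kind == "doc":
            doc_counts[label] = n
        else:
            word_counts[label][token] = n
    vocab = {tok for counts in word_counts.values() for tok in counts}
    return doc_counts, word_counts, vocab


def classify(text, model):
    """Return the label with the highest log posterior.
    Uses add-one smoothing (an assumption) and skips unseen tokens."""
    doc_counts, word_counts, vocab = model
    n_docs = sum(doc_counts.values())
    tokens = [tok for tok in tokenize(text) if tok in vocab]
    best_label, best_score = None, -math.inf
    for label in doc_counts:
        total = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / n_docs)
        for tok in tokens:
            score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (invented for illustration; not the Wikipedia dataset).
sample_docs = [
    ("animal", "the cat hunts mice"),
    ("animal", "dogs and cats are mammals"),
    ("plant", "the oak tree grows leaves"),
    ("plant", "ferns and moss are plants"),
]
sample_model = train(sample_docs)
```

In a real MapReduce deployment the pairs emitted by `map_document` would be shuffled by key across workers before summation; here a single `Counter` stands in for that shuffle-and-reduce phase.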

Original language: English
Title of host publication: Proceedings - 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 428-432
Number of pages: 5
ISBN (Electronic): 9781479986460
Publication status: Published - 20 Nov 2015
Event: 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2015 - Hangzhou, Zhejiang, China
Duration: 26 Aug 2015 to 27 Aug 2015

Publication series

Name: Proceedings - 2015 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2015
Volume: 1

Conference

Conference: 7th International Conference on Intelligent Human-Machine Systems and Cybernetics, IHMSC 2015
Country/Territory: China
City: Hangzhou, Zhejiang
Period: 26/08/15 to 27/08/15

Keywords

  • Big Data
  • Domain Specified Text Classification
  • Map Reduce
  • Naive Bayes
  • Text Classification
