TY - GEN
T1 - Website Main Content Extraction Using Template-Based Approach and Naïve-Bayes Classification
AU - Rakhmawati, Nur Aini
AU - Kurniawan, Fajara
N1 - Publisher Copyright:
© 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Every web page has main content. The main content is a section, segment, or block that contains text or multimedia on a single web page. Important information about local governance generally lies within the main content, hence the need for a web content extractor to extract that information. To address this problem, this research combines two existing approaches: a template-based approach and a machine learning approach using a Naïve Bayes classifier. Previous research has generally used only one type of approach, either template-based or machine learning. The results show that by combining the two approaches, the model could identify 95% of all nodes that contain the main content.
AB - Every web page has main content. The main content is a section, segment, or block that contains text or multimedia on a single web page. Important information about local governance generally lies within the main content, hence the need for a web content extractor to extract that information. To address this problem, this research combines two existing approaches: a template-based approach and a machine learning approach using a Naïve Bayes classifier. Previous research has generally used only one type of approach, either template-based or machine learning. The results show that by combining the two approaches, the model could identify 95% of all nodes that contain the main content.
KW - naïve Bayes
KW - template-based
KW - web content extractor
UR - http://www.scopus.com/inward/record.url?scp=85187549223&partnerID=8YFLogxK
U2 - 10.1109/ICSGTEIS60500.2023.10424080
DO - 10.1109/ICSGTEIS60500.2023.10424080
M3 - Conference contribution
AN - SCOPUS:85187549223
T3 - Proceedings - International Conference on Smart-Green Technology in Electrical and Information Systems, ICSGTEIS
SP - 41
EP - 46
BT - ICSGTEIS 2023 - 2023 International Conference on Smart-Green Technology in Electrical and Information Systems
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2023 International Conference on Smart-Green Technology in Electrical and Information Systems, ICSGTEIS 2023
Y2 - 2 November 2023 through 4 November 2023
ER -