TY - GEN
T1 - Text Mining in Healthcare for Disease Classification using Machine Learning Algorithm
AU - Buntoro, Ghulam Asrofi
AU - Wibawa, Adhi Dharma
AU - Purnomo, Mauridhi Hery
N1 - Publisher Copyright: © 2021 IEEE.
PY - 2021/9/29
Y1 - 2021/9/29
AB - The development of information technology and smartphones has led to the production of large amounts of data. Every second, millions of new data items are created in the form of text, audio, images, and even video. This environment has in turn driven demand for big data analytics. One kind of big data produced daily is the history of healthcare services in hospitals. Important new information can be retrieved from this large dataset, especially concerning patient symptoms, drug usage, and reports of new diseases. In this study, text processing techniques are applied to patient medical record text from a public hospital, covering 2017 to 2019, to relate patient symptoms to disease classification. Naïve Bayes and Random Forest classifiers are used to classify diseases in the medical record data, with 19 diseases in the preprocessed data. A modified list of Indonesian stop words was used to filter the symptom sentences. The results indicate that the Random Forest classifier achieves the highest accuracy, around 99.9%, outperforming the Naïve Bayes classifier. This experiment shows that the proposed method provides a robust system with good accuracy for classifying medical record data covering many diseases.
KW - disease
KW - healthcare
KW - naïve Bayes classification
KW - random forest
KW - text mining
UR - https://www.scopus.com/pages/publications/85119952361
U2 - 10.1109/IES53407.2021.9593998
DO - 10.1109/IES53407.2021.9593998
M3 - Conference contribution
AN - SCOPUS:85119952361
T3 - International Electronics Symposium 2021: Wireless Technologies and Intelligent Systems for Better Human Lives, IES 2021 - Proceedings
SP - 97
EP - 101
BT - International Electronics Symposium 2021
A2 - Yunanto, Andhik Ampuh
A2 - Kusuma N, Artiarini
A2 - Hermawan, Hendhi
A2 - Putra, Putu Agus Mahadi
A2 - Gamar, Farida
A2 - Ridwan, Mohamad
A2 - Prayogi, Yanuar Risah
A2 - Ruswiansari, Maretha
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 23rd International Electronics Symposium, IES 2021
Y2 - 29 September 2021 through 30 September 2021
ER -