TY - JOUR
T1 - Data Augmentation Technique Using Two Step SMOTE for Electronic-nose Signal in Breath Ketone Level Detection
AU - Firmansyah, Dhiza Wahyu
AU - Sarno, Riyanarto
N1 - Publisher Copyright: © 2023, International Journal of Intelligent Engineering and Systems. All Rights Reserved.
PY - 2023
Y1 - 2023
N2 - Breath acetone concentrations have been found to correlate with blood ketone levels. Based on this evidence, predicting blood ketone levels using breath analysis and machine learning (ML) becomes possible. Nevertheless, a good ML model requires a large amount of training data. Under certain conditions, such as during the COVID-19 pandemic, it is difficult to collect large amounts of data. To overcome this problem, we propose an augmentation technique that increases the amount of training data using a two-step synthetic minority oversampling technique (SMOTE). The first step increases the amount of training data by combining it with synthetic data, while the second step balances the data at each ketone level. Because this study predicts ketone levels as numerical output values and SMOTE is typically used in classification cases, the strategy for using SMOTE with regression is also explained. The proposed method was evaluated by training several ML methods on the data: deep neural network regression (DNN-R), linear regression (ML-R), RANSAC regression (RC-R), K-nearest neighbour regression (KNN-R), decision tree regression (DT-R), random forest regression (RF-R), AdaBoost regression (AB-R), gradient boosting regression (GB-R), and XGBoost regression (XGB-R). Based on the test results, compared with training without the proposed method, accuracy increased on DNN-R, ML-R, RC-R, KNN-R, DT-R, RF-R, AB-R, GB-R, and XGB-R by 0.958%, 9.51%, 35.74%, 18.133%, 8.236%, 11.348%, 9.47%, 5.093%, and 11.264%, respectively.
AB - Breath acetone concentrations have been found to correlate with blood ketone levels. Based on this evidence, predicting blood ketone levels using breath analysis and machine learning (ML) becomes possible. Nevertheless, a good ML model requires a large amount of training data. Under certain conditions, such as during the COVID-19 pandemic, it is difficult to collect large amounts of data. To overcome this problem, we propose an augmentation technique that increases the amount of training data using a two-step synthetic minority oversampling technique (SMOTE). The first step increases the amount of training data by combining it with synthetic data, while the second step balances the data at each ketone level. Because this study predicts ketone levels as numerical output values and SMOTE is typically used in classification cases, the strategy for using SMOTE with regression is also explained. The proposed method was evaluated by training several ML methods on the data: deep neural network regression (DNN-R), linear regression (ML-R), RANSAC regression (RC-R), K-nearest neighbour regression (KNN-R), decision tree regression (DT-R), random forest regression (RF-R), AdaBoost regression (AB-R), gradient boosting regression (GB-R), and XGBoost regression (XGB-R). Based on the test results, compared with training without the proposed method, accuracy increased on DNN-R, ML-R, RC-R, KNN-R, DT-R, RF-R, AB-R, GB-R, and XGB-R by 0.958%, 9.51%, 35.74%, 18.133%, 8.236%, 11.348%, 9.47%, 5.093%, and 11.264%, respectively.
KW - Breath ketone level
KW - Data augmentation
KW - Electronic-nose
KW - Gas sensor
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85164623662&partnerID=8YFLogxK
U2 - 10.22266/ijies2023.0831.42
DO - 10.22266/ijies2023.0831.42
M3 - Article
AN - SCOPUS:85164623662
SN - 2185-310X
VL - 16
SP - 523
EP - 536
JO - International Journal of Intelligent Engineering and Systems
JF - International Journal of Intelligent Engineering and Systems
IS - 4
ER -