TY - JOUR
T1 - Optimization of machine learning algorithms for predicting infected COVID-19 in isolated DNA
AU - Al Kindhi, Berlian
N1 - Publisher Copyright: © 2020, Intelligent Network and Systems Society.
PY - 2020
Y1 - 2020
N2 - The World Health Organization (WHO) declared COVID-19 (Coronavirus Disease 2019) a global pandemic, which led a number of countries to impose lockdowns. Countries such as Italy, Denmark, China, and Ireland locked down to keep the disease from spreading and claiming many lives. COVID-19, SARS (Severe Acute Respiratory Syndrome), and MERS (Middle East Respiratory Syndrome) are viral infections of the respiratory tract that can be fatal. SARS first became an epidemic in China in 2002, while MERS first appeared in the Middle East in 2012. At the end of 2019, a new disease, COVID-19, appeared in China. The three viruses belong to the same family, so their nucleotide sequences are very similar. The COVID-19 primer we tested matched every SARS and MERS DNA isolate tested with a similarity above 70%. Basic local alignment of nucleotide sequences alone is therefore not enough to distinguish MERS, SARS, and COVID-19 DNA samples. We propose optimized machine learning methods to predict COVID-19, with the optimization tailored to each method. For discriminant analysis, we use Wilks' lambda and replace the linear discriminant matrix with a diagonal one. For the decision tree, we optimize the information-gain formulation to minimize entropy and so obtain more information from the result. We optimized K-NN with distance weighting, and for SVM we tried several kernels and optimized the hyperplane using Structural Risk Minimization (SRM) to find the best result. To prepare the input features, we use the Levenshtein edit distance to compute the optimal similarity of each DNA sequence. In our tests, the optimized decision tree and the optimized discriminant analysis each reached an accuracy of 98.3%, while the optimized SVM and K-NN both reached 100%. We also found that, during DNA alignment, wherever the primer has 'R' (the IUPAC code for A or G), the corresponding nucleotide in the COVID-19 samples is always 'A'; this bioinformatic finding can serve as analytical material for the medical field.
KW - COVID-19
KW - DNA
KW - Decision tree
KW - Discriminant analysis
KW - K-NN
KW - SVM
UR - http://www.scopus.com/inward/record.url?scp=85089546394&partnerID=8YFLogxK
U2 - 10.22266/IJIES2020.0831.37
DO - 10.22266/IJIES2020.0831.37
M3 - Article
AN - SCOPUS:85089546394
SN - 2185-310X
VL - 13
SP - 423
EP - 433
JO - International Journal of Intelligent Engineering and Systems
JF - International Journal of Intelligent Engineering and Systems
IS - 4
ER -