Optimization of machine learning algorithms for predicting infected COVID-19 in isolated DNA

Berlian Al Kindhi*

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

7 Citations (Scopus)

Abstract

The World Health Organization's (WHO) declaration of COVID-19 (Coronavirus Disease 2019) as a global pandemic led a number of countries to impose lockdowns. Countries such as Italy, Denmark, China, and Ireland locked down to prevent the disease from spreading and claiming many lives. COVID-19, SARS (Severe Acute Respiratory Syndrome), and MERS (Middle East Respiratory Syndrome) are viral infections of the respiratory tract that can be fatal. SARS first became an epidemic in China in 2002, while MERS first appeared in the Middle East in 2012. At the end of 2019, a new disease called COVID-19 appeared in China. The three viruses belong to the same family, so their nucleotide sequences are very similar: the COVID-19 primer we tested adhered with a similarity of more than 70% to every SARS and MERS DNA isolate tested. Nucleotide alignment with the Basic Local Alignment Search Tool (BLAST) alone is therefore not enough to distinguish MERS, SARS, and COVID-19 DNA samples. We propose optimized machine learning methods to predict COVID-19, with the optimization tailored to each method. For Discriminant Analysis, we use the Wilks' lambda approach and replace the linear discriminant with a diagonal discriminant (diagonal covariance matrix). For the Decision Tree, we formulate the split gain so that it minimizes entropy, which maximizes the information gained from each split. We optimize K-NN by adding distance weighting, and for SVM we try several kernels and optimize the hyperplane with the SRM (Structural Risk Minimization) approach to find the best result. To prepare the input features, we use the Levenshtein edit distance and compute the optimum similarity of each DNA sequence. In our tests, the optimized Decision Tree achieved an accuracy of 98.3%, the optimized Discriminant Analysis 98.3%, and the optimized SVM and K-NN 100% each.
We also found that, in the DNA alignment process, wherever the compared primer has 'R' (the IUPAC ambiguity code for a purine, A or G), the corresponding nucleotide in the COVID-19 sample data is always 'A'. This bioinformatic observation could serve as analytical material for the medical field.
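The abstract builds its input features from the Levenshtein edit distance between DNA sequences. The sketch below shows the standard dynamic-programming edit distance and one possible way to turn it into similarity features. The abstract does not define "optimum similarity", so this assumes similarity = 1 minus the distance divided by the length of the longer sequence, and that each sample is scored against a set of reference sequences; `similarity` and `feature_vector` are hypothetical helpers, not the author's code.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if bases match)
            ))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Hypothetical normalisation (assumption): 1 - distance / longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def feature_vector(sample: str, references: list[str]) -> list[float]:
    """Hypothetical feature layout: one similarity score per reference sequence."""
    return [similarity(sample, ref) for ref in references]
```

For example, `feature_vector("ACGT", ["ACGT", "TTTT"])` gives `[1.0, 0.25]`: an exact match, and a sequence three substitutions away.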
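For the Decision Tree, the abstract describes a gain formulation that minimizes entropy. A minimal sketch of the standard Shannon-entropy information gain (the ID3/C4.5-style criterion), applied to a single numeric feature such as a similarity score, is shown below. Searching thresholds at midpoints between sorted feature values is an assumption for illustration; the paper's exact gain formula is not given in the abstract.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, left, right):
    """Parent entropy minus the size-weighted entropy of the two children."""
    n = len(labels)
    children = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - children


def best_threshold(values, labels):
    """Try each midpoint between distinct sorted values and keep the split with
    the highest gain, i.e. the lowest weighted child entropy."""
    distinct = sorted(set(values))
    best = (None, -1.0)
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = information_gain(labels, left, right)
        if gain > best[1]:
            best = (t, gain)
    return best
```

Maximizing gain and minimizing the children's weighted entropy are the same choice here, because the parent entropy is fixed for a given node.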
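The abstract says K-NN was improved with distance weighting. One common scheme, assumed here because the abstract does not specify the weight function, is inverse-distance voting: each of the k nearest neighbours votes with weight 1/d. The function name and the Euclidean metric are illustrative choices, not taken from the paper.

```python
from collections import defaultdict
from math import dist


def weighted_knn_predict(train_x, train_y, query, k=3, eps=1e-12):
    """Distance-weighted K-NN: each of the k nearest training points votes with
    weight 1 / distance, so closer neighbours count more."""
    nearest = sorted(
        ((dist(x, query), label) for x, label in zip(train_x, train_y)),
        key=lambda pair: pair[0],
    )[:k]
    votes = defaultdict(float)
    for d, label in nearest:
        votes[label] += 1.0 / (d + eps)  # eps avoids division by zero on exact matches
    return max(votes, key=votes.get)
```

With toy points `[(0, 0), (1, 0), (0, 1)]` labelled `SARS, COVID, COVID` and the query `(0.05, 0.05)`, an unweighted 3-NN majority vote would return `COVID`, while the weighted vote returns `SARS` because that neighbour is much closer.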
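For Discriminant Analysis, the abstract mentions Wilks' lambda and switching from a linear to a diagonal discriminant. The sketch below assumes "diagonal" means the pooled covariance matrix is restricted to its diagonal (one pooled variance per feature), and uses the univariate Wilks' lambda (within-group over total sum of squares) as a per-feature separability score. How the paper actually combined the two is not stated, so treat these helpers as hypothetical.

```python
from collections import defaultdict
from math import log


def fit_diagonal_da(X, y):
    """Per-class means and priors plus one pooled within-class variance per
    feature; all off-diagonal covariances are treated as zero."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    n, p, k = len(X), len(X[0]), len(groups)
    means = {c: [sum(r[j] for r in rows) / len(rows) for j in range(p)]
             for c, rows in groups.items()}
    priors = {c: len(rows) / n for c, rows in groups.items()}
    pooled = [
        max(sum((r[j] - means[c][j]) ** 2
                for c, rows in groups.items() for r in rows) / (n - k), 1e-12)
        for j in range(p)  # variance floor guards against constant features
    ]
    return means, priors, pooled


def predict_diagonal_da(model, x):
    """Assign x to the class with the largest diagonal discriminant score."""
    means, priors, pooled = model

    def score(c):
        return log(priors[c]) - 0.5 * sum(
            (x[j] - means[c][j]) ** 2 / pooled[j] for j in range(len(x)))

    return max(means, key=score)


def wilks_lambda(values, labels):
    """Univariate Wilks' lambda = within-group SS / total SS (0 = perfect separation)."""
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    groups = defaultdict(list)
    for v, c in zip(values, labels):
        groups[c].append(v)
    within = sum(sum((v - sum(g) / len(g)) ** 2 for v in g) for g in groups.values())
    return within / total
```

Smaller lambda values mark more discriminative features, which is how Wilks' lambda is typically used to rank or select inputs before fitting the discriminant.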
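For SVM, the abstract mentions trying several kernels and optimizing the hyperplane with Structural Risk Minimization. In practice SRM is usually realized by minimizing a regularized risk, where the norm penalty on the hyperplane controls capacity. The sketch below swaps in a specific solver the paper does not name: kernelized Pegasos, a sub-gradient method for the regularized hinge loss, with no bias term and samples visited in a fixed cyclic order for reproducibility. The kernels, `lam`, and `epochs` values are illustrative assumptions.

```python
from math import exp


def linear_kernel(a, b):
    return sum(x * y for x, y in zip(a, b))


def polynomial_kernel(a, b, degree=2, c=1.0):
    return (linear_kernel(a, b) + c) ** degree


def rbf_kernel(a, b, gamma=1.0):
    return exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))


def train_kernel_svm(X, y, kernel, lam=0.01, epochs=50):
    """Kernelised Pegasos: sub-gradient steps on lam/2 * ||w||^2 + mean hinge
    loss. Labels must be -1/+1. Returns per-sample support counts (alpha)."""
    n = len(X)
    gram = [[kernel(a, b) for b in X] for a in X]
    alpha = [0] * n
    for t in range(1, epochs * n + 1):
        i = (t - 1) % n  # cyclic order instead of random sampling
        f = sum(alpha[j] * y[j] * gram[j][i] for j in range(n)) / (lam * t)
        if y[i] * f < 1:  # margin violated: add this sample to the expansion
            alpha[i] += 1
    return alpha


def predict_kernel_svm(X, y, alpha, kernel, x):
    score = sum(a * yj * kernel(xj, x) for a, yj, xj in zip(alpha, y, X))
    return 1 if score > 0 else -1
```

Comparing kernels then amounts to training once per kernel function (for example `linear_kernel`, `polynomial_kernel`, `rbf_kernel`) and keeping the one with the best held-out accuracy.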
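The 'R' observation relies on IUPAC ambiguity codes, where a single primer letter stands for several bases (R = A or G). The sketch below shows how such positions can be tallied once a sample has been aligned to the primer. The primer and sample strings in the usage line are made-up toy sequences, not data from the paper, and the table is a subset of the IUPAC codes.

```python
# Subset of the standard IUPAC nucleotide codes (B, D, H, V omitted).
IUPAC = {
    "A": {"A"}, "C": {"C"}, "G": {"G"}, "T": {"T"},
    "R": {"A", "G"}, "Y": {"C", "T"}, "S": {"G", "C"}, "W": {"A", "T"},
    "K": {"G", "T"}, "M": {"A", "C"}, "N": {"A", "C", "G", "T"},
}


def bases_at_ambiguity(primer, aligned_sample, code="R"):
    """Sample bases found wherever the primer carries `code`. Assumes the sample
    is already aligned to the primer (same length, no gaps)."""
    if len(primer) != len(aligned_sample):
        raise ValueError("sequences must be aligned to the same length")
    return [s for p, s in zip(primer, aligned_sample) if p == code]


def primer_matches(primer, aligned_sample):
    """True if every sample base is allowed by the primer's code at that position."""
    return len(primer) == len(aligned_sample) and all(
        s in IUPAC.get(p, set()) for p, s in zip(primer, aligned_sample))
```

For the toy pair `"ACRTR"` / `"ACATA"`, `bases_at_ambiguity` returns `['A', 'A']`; repeating this over every aligned COVID-19 sample is one way to check whether the base under 'R' is always 'A'.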

Original language: English
Pages (from-to): 423-433
Number of pages: 11
Journal: International Journal of Intelligent Engineering and Systems
Volume: 13
Issue number: 4
Publication status: Published, 2020

Keywords

  • COVID-19
  • DNA
  • Decision tree
  • Discriminant analysis
  • K-NN
  • SVM
