One widely used treatment for cancer is radiation therapy, or radiotherapy, using compounds that kill cancer cells. The effectiveness of the radiotherapy is assessed from the percentage of cancer cells killed (the cell death rate). This research examines 84 compounds, each described by 217 features, which makes the data high-dimensional. Feature selection is carried out based on the mean decrease in Gini (MDG), which ranks the features by importance for use in a Naive Bayes classifier. Naive Bayes performs weakly on the raw dataset, i.e. when compounds are classified using a threshold of 10% cancer cell death rate. A grouping based on a mixture distribution identifies a 30% cell death rate as a new threshold, which improves the performance of Naive Bayes on both the training and testing datasets, as evaluated by AUC (Area Under the Curve). The optimal classification for the testing dataset is obtained using either the 20% or the 25% most important features, with an AUC close to 60%, about 15% higher than classification using the 10% threshold. Meanwhile, the AUC on the training dataset reaches more than 70%.
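The steps summarised in the abstract (labelling compounds by a cell death threshold, keeping the top-ranked features, classifying with Naive Bayes, and scoring with AUC) can be illustrated with a minimal sketch. It is not the authors' implementation: all function names are hypothetical, the Gaussian form of Naive Bayes is an assumption (the abstract does not say which variant was used), the "at or above the threshold" labelling rule is an assumption, and the importance scores are assumed to be precomputed (for example, MDG values from a random forest).

```python
import math


def label_by_threshold(death_rates, threshold):
    """Label a compound 1 if its cell death rate (percent) is at or above
    the threshold, else 0. The 'at or above' rule is an assumption."""
    return [1 if rate >= threshold else 0 for rate in death_rates]


def top_fraction(importances, fraction):
    """Indices of the highest-scoring features, keeping the given fraction
    of them (at least one). Scores are assumed precomputed, e.g. MDG."""
    k = max(1, round(len(importances) * fraction))
    order = sorted(range(len(importances)),
                   key=lambda i: importances[i], reverse=True)
    return sorted(order[:k])


def select_columns(X, indices):
    """Keep only the selected feature columns of each row."""
    return [[row[i] for i in indices] for row in X]


def fit_gaussian_nb(X, y):
    """Per-class prior, feature means, and feature variances
    (Gaussian Naive Bayes, assumed variant)."""
    model = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        cols = list(zip(*rows))
        means = [sum(col) / len(rows) for col in cols]
        # Small floor keeps the variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / len(rows) + 1e-9
                     for col, m in zip(cols, means)]
        model[c] = (len(rows) / len(y), means, variances)
    return model


def positive_score(model, x):
    """Log-odds of class 1 against class 0 for one sample."""
    def log_joint(c):
        prior, means, variances = model[c]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
            for xi, m, v in zip(x, means, variances))
    return log_joint(1) - log_joint(0)


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

Under these assumptions, comparing the two thresholds amounts to calling `label_by_threshold` once with 10 and once with 30, refitting the model on the columns returned by `top_fraction` (for instance with a fraction of 0.20 or 0.25), and comparing the resulting `auc` values on held-out compounds.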
|Number of pages||18|
|Journal||International Journal of Artificial Intelligence|
|Publication status||Published - 1 Mar 2019|
- Naive Bayes