Drugs Clustering Based on Their Compositions Using Word2Vec and K-means Clustering

Rahmat Hidayat*, Nur Aini Rakhmawati

*Corresponding author for this work

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

With the rapid growth in medicine, a method for clustering drug composition data is needed to make it easier for industry to define medicine compositions. K-means clustering is one way to cluster drug compositions. In this paper, we use the Word2Vec model to convert each drug's composition into a vector, cluster the vectors using K-means, and visualize the clustering results. For Word2Vec, we use two methods, namely continuous bag-of-words (CBOW) and skip-gram (SG). For K-means, we determine the number of centroids using the Elbow Criterion and the Silhouette Coefficient. The dataset consists of more than 250 drug product names from Farmaku and K24. The experimental results show that the Silhouette Coefficient values for the CBOW and SG methods are 0.901 and 0.877, respectively. For both CBOW and SG, the best number of clusters is three.
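
The cluster-count selection described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 2-D vectors are made-up stand-ins for Word2Vec composition embeddings, the farthest-point initialisation is an assumption chosen to keep the run deterministic, and all function names are hypothetical. It runs K-means for several values of k and reports the mean Silhouette Coefficient together with the within-cluster sum of squares used by the Elbow Criterion.

```python
import math


def init_centroids(points, k):
    # Deterministic farthest-point initialisation (an assumption, not from the paper).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        )
    return centroids


def kmeans(points, k, iters=50):
    # Plain Lloyd iterations: assign each point to its nearest centroid, then
    # move each centroid to the mean of its members, until labels stop changing.
    centroids = init_centroids(points, k)
    labels = []
    for _ in range(iters):
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def inertia(points, labels, centroids):
    # Within-cluster sum of squared distances, the quantity plotted for the elbow.
    return sum(math.dist(p, centroids[lab]) ** 2 for p, lab in zip(points, labels))


def silhouette(points, labels):
    # Mean of (b - a) / max(a, b) over all points, where a is the mean distance
    # to the point's own cluster and b the mean distance to the nearest other one.
    scores = []
    for i, p in enumerate(points):
        own = [
            math.dist(p, q)
            for j, q in enumerate(points)
            if labels[j] == labels[i] and j != i
        ]
        if not own:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        a = sum(own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in set(labels)
            if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Toy vectors standing in for composition embeddings (illustrative data only).
vectors = [
    (0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (0.3, 0.2),
    (5.0, 5.0), (5.2, 5.1), (5.1, 4.8), (4.9, 5.2),
    (0.0, 5.0), (0.2, 5.2), (0.1, 4.9), (0.3, 5.1),
]

scores, inertias = {}, {}
for k in range(2, 5):
    labels, centroids = kmeans(vectors, k)
    scores[k] = silhouette(vectors, labels)
    inertias[k] = inertia(vectors, labels, centroids)

best_k = max(scores, key=scores.get)
for k in sorted(scores):
    print(f"k={k}  silhouette={scores[k]:.3f}  inertia={inertias[k]:.3f}")
print("best k by silhouette:", best_k)
```

In a real pipeline the toy vectors would be replaced by embeddings from a trained model. In gensim, for example, the `Word2Vec` class selects CBOW with `sg=0` and skip-gram with `sg=1`. How the paper aggregates word vectors into a single composition vector is not stated in the abstract.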

Original language: English
Title of host publication: 2021 2nd International Conference on Information Technology, Advanced Mechanical and Electrical Engineering, ICITAMEE 2021
Editors: Yessi Jusman, Hanifah Rahmi Fajrin
Publisher: American Institute of Physics Inc.
ISBN (Electronic): 9780735442863
Publication status: Published - 30 Nov 2022
Event: 2nd International Conference on Information Technology, Advanced Mechanical and Electrical Engineering, ICITAMEE 2021 - Yogyakarta, Indonesia
Duration: 25 Aug 2021 to 26 Aug 2021

Publication series

Name: AIP Conference Proceedings
Volume: 2499
ISSN (Print): 0094-243X
ISSN (Electronic): 1551-7616

Conference

Conference: 2nd International Conference on Information Technology, Advanced Mechanical and Electrical Engineering, ICITAMEE 2021
Country/Territory: Indonesia
City: Yogyakarta
Period: 25/08/21 to 26/08/21

Keywords

  • cbow
  • drugs clustering
  • elbow criterion
  • k-means clustering
  • silhouette coefficient
  • skip-gram
  • word2vec
