Abstract

The Gaussian Mixture Model (GMM) is one of the most common methods for speaker identification. The quality of a GMM depends on the method used to train its Gaussian components; the method chosen in this work is k-Means. In this study, we evaluated the k-Means GMM with three centroid initialization methods: randomization, seeding and density analysis. Seeding was implemented with the k-Means method, whereas density analysis was implemented with the histogram method. We applied two evaluation criteria: the complexity of the training process and the accuracy of speaker identification. Experiments were conducted with three voice test durations: 2, 4 and 6 seconds. We also used nine Gaussian component counts, ranging from 4 to 20 in steps of 2. Our proposed method, which uses density analysis, had a 33.7% lower clustering process time while achieving the highest accuracy, 95.5%.
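As a rough illustration of the density-analysis idea, the sketch below seeds 1-D k-Means with the centers of the most populated histogram bins and reports how many iterations Lloyd's algorithm takes to converge. It is not the authors' implementation: the function names (`histogram_init`, `random_init`, `kmeans_1d`), the bin count, and the restriction to scalar features are all assumptions made for brevity. Real speaker features are multi-dimensional, and the converged centroids would then serve as initial GMM means.

```python
import random


def histogram_init(values, k, bins=12):
    """Hypothetical density-based init: centers of the k most populated bins.

    Assumes k <= bins. Adjacent dense bins may belong to the same cluster,
    which this simple sketch does not guard against.
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / width), bins - 1)  # clamp the maximum value
        counts[idx] += 1
    densest = sorted(range(bins), key=lambda b: counts[b], reverse=True)[:k]
    return sorted(lo + (b + 0.5) * width for b in densest)


def random_init(values, k, seed=0):
    """Baseline init: k distinct samples drawn at random from the data."""
    return random.Random(seed).sample(values, k)


def kmeans_1d(values, centroids, max_iter=100):
    """Lloyd's algorithm on scalars; returns (centroids, iterations used)."""
    centroids = list(centroids)
    for iteration in range(1, max_iter + 1):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)),
                          key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Keep the old centroid when a cluster receives no points.
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            return updated, iteration
        centroids = updated
    return centroids, max_iter


if __name__ == "__main__":
    data = [0.0, 0.2, 0.4, 6.1, 6.2, 6.3, 11.5, 11.7, 12.0]
    for name, init in (("histogram", histogram_init(data, 3)),
                       ("random", random_init(data, 3))):
        final, iters = kmeans_1d(data, init)
        print(name, [round(c, 3) for c in final], "iterations:", iters)
```

Comparing the iteration counts returned for each initialization gives a crude stand-in for the training-complexity criterion described in the abstract; a good starting point should converge in fewer passes.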

Original language: English
Title of host publication: Proceeding - ICERA 2021
Subtitle of host publication: 2021 3rd International Conference on Electronics Representation and Algorithm
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 95-98
Number of pages: 4
ISBN (Electronic): 9781665434003
Publication status: Published - 29 Jul 2021
Event: 3rd International Conference on Electronics Representation and Algorithm, ICERA 2021 - Virtual, Yogyakarta, Indonesia
Duration: 29 Jul 2021 → …

Publication series

Name: Proceeding - ICERA 2021: 2021 3rd International Conference on Electronics Representation and Algorithm

Conference

Conference: 3rd International Conference on Electronics Representation and Algorithm, ICERA 2021
Country/Territory: Indonesia
City: Virtual, Yogyakarta
Period: 29/07/21 → …

Keywords

  • clustering algorithms
  • gaussian mixture model
  • histograms
  • k-means
  • speaker recognition

Title: GMM Performance Evaluation through Centroid Initialization of k-Means in Text-Independent Speaker Identification