Gamelan instrument sound recognition using spectral and facial features of the first harmonic frequency

Aris Tjahyanto*, Diah Puspito Wulandari, Yoyon K. Suprapto, Mauridhi Hery Purnomo

*Corresponding author for this work

Research output: Contribution to journal › Article › peer-review

6 Citations (Scopus)

Abstract

Principal-component and spectral-based feature sets were applied to the recognition of gamelan instrument sounds using support vector machines (SVMs). The principal components were calculated from a segmented scalogram of the first harmonic frequency of the gamelan recordings. The segmented scalogram is treated as a "facial image" of the gamelan instrument sound in a frontal pose, with neutral expression and normal lighting. The scalogram was computed from the gamelan sound signal using a continuous wavelet transform (CWT). The performance and contribution of the principal-component and spectral-based features were compared using the F-measure. For the training phase, the feature sets were extracted from isolated tones recorded over the entire frequency range of four gamelan instruments (the demung, saron, peking, and bonang families). Using a 90%/10% split between the training and validation data sets, classifiers were constructed with a radial basis function (RBF) kernel SVM. The multiclass classifier is composed of 28 separate one-against-one classifiers. In the experiments, the spectral-based feature set achieved an average F-measure of 74.05%, and the appearance-based feature set achieved 71.87%. For saron-only note tracking, the spectral-based feature set achieved an F-measure of 83.79%, higher than the 63.89% obtained for demung-only note tracking.
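The abstract mentions 28 one-against-one classifiers and reports results as F-measures. The following Python sketch (not the authors' code) illustrates the arithmetic behind both under stated assumptions. A one-against-one scheme over k classes trains k(k-1)/2 binary classifiers, one per class pair, and predicts by majority vote, so 28 classifiers is consistent with eight classes. The F-measure is the harmonic mean of precision and recall. The toy distance-based pairwise classifiers, the labels "A"/"B"/"C", and all helper names are hypothetical stand-ins for the trained RBF-kernel SVMs, and macro-averaging is only an assumption about how the "average F-measure" was computed.

```python
from collections import Counter
from itertools import combinations


def ovo_pairs(classes):
    """Every unordered class pair; one-against-one trains one binary classifier per pair."""
    return list(combinations(classes, 2))


def classes_for_pairs(n_pairs):
    """Return k such that k*(k-1)/2 == n_pairs, or None if no integer k fits."""
    k = 2
    while k * (k - 1) // 2 < n_pairs:
        k += 1
    return k if k * (k - 1) // 2 == n_pairs else None


def ovo_predict(x, pairwise):
    """Majority vote over pairwise classifiers; each returns one label of its pair.

    Ties go to the tied label that received its first vote earliest (Counter ordering).
    """
    votes = Counter(clf(x) for clf in pairwise.values())
    return votes.most_common(1)[0][0]


def f_measure(tp, fp, fn):
    """F-measure (F1): harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def macro_f_measure(per_class_counts):
    """Unweighted mean of per-class F1 over (tp, fp, fn) tuples (assumed averaging)."""
    return sum(f_measure(*c) for c in per_class_counts) / len(per_class_counts)


if __name__ == "__main__":
    # 28 one-against-one classifiers correspond to k = 8 classes.
    print(classes_for_pairs(28))  # 8
    # Hypothetical 1-D class centres standing in for trained RBF-kernel SVMs.
    centres = {"A": 0.0, "B": 5.0, "C": 10.0}
    pairwise = {
        (a, b): (lambda x, a=a, b=b: a if abs(x - centres[a]) <= abs(x - centres[b]) else b)
        for a, b in ovo_pairs(sorted(centres))
    }
    print(ovo_predict(4.2, pairwise))  # B
    print(f_measure(tp=3, fp=1, fn=3))  # 0.6
```

Computing the CWT scalograms and training real RBF-kernel SVMs would require libraries such as PyWavelets and scikit-learn; those steps are omitted here. For multiclass problems, scikit-learn's `SVC` already trains one-against-one classifiers internally, so the voting shown above is only meant to make the scheme explicit.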

Original language: English
Pages (from-to): 12-23
Number of pages: 12
Journal: Acoustical Science and Technology
Volume: 36
Issue number: 1
Publication status: Published - 2015

Keywords

  • Automatic transcription
  • Support vector machines
  • Wavelet transform
