TY - JOUR
T1 - Gamelan instrument sound recognition using spectral and facial features of the first harmonic frequency
AU - Tjahyanto, Aris
AU - Wulandari, Diah Puspito
AU - Suprapto, Yoyon K.
AU - Purnomo, Mauridhi Hery
N1 - Publisher Copyright: © 2015 The Acoustical Society of Japan.
PY - 2015
Y1 - 2015
AB - Principal component and spectral-based feature sets were applied to the recognition of gamelan instrument sounds using support vector machines (SVMs). The principal components were calculated from a segmented scalogram of the first harmonic frequency of the gamelan recordings. The segmented scalogram is treated as a "facial image" of the gamelan instrument sound, as if captured in a frontal pose with a neutral expression and normal lighting. The scalogram was computed from the gamelan sound signal using a continuous wavelet transform (CWT). The performance and contribution of the principal component and spectral-based features were compared using the F-measure. For the training phase, the feature sets were extracted from isolated tones recorded over the entire frequency range of four gamelan instrument families (demung, saron, peking, and bonang). Using a 90%/10% split between the training and validation data sets, classifier models were built with a radial basis function (RBF) kernel SVM. The classifiers are composed of 28 separate pairwise classifiers in a One-Against-One multiclass scheme. The experiment showed that the spectral-based feature set achieved an average F-measure of 74.05%, whereas the appearance-based (principal component) feature set yielded 71.87%. For saron-only note tracking, the spectral-based feature set achieved an F-measure of 83.79%, higher than the 63.89% obtained for demung-only note tracking.
KW - Automatic transcription
KW - Support vector machines
KW - Wavelet transform
UR - http://www.scopus.com/inward/record.url?scp=84920381221&partnerID=8YFLogxK
U2 - 10.1250/ast.36.12
DO - 10.1250/ast.36.12
M3 - Article
AN - SCOPUS:84920381221
SN - 1346-3969
VL - 36
SP - 12
EP - 23
JO - Acoustical Science and Technology
JF - Acoustical Science and Technology
IS - 1
ER -