TY - GEN
T1 - HMM-based speech synthesis system with expressive Indonesian speech corpus
AU - Anggrayni, Elok
AU - Arifianto, Dhany
N1 - Publisher Copyright:
© 2019 Proceedings of the International Congress on Acoustics. All rights reserved.
PY - 2019
Y1 - 2019
N2 - In this paper, we present the results of an HMM-based speech synthesis system applied to an expressive Indonesian speech corpus. The purpose is to assess the quality of the synthesized speech. First, we selected expressive Indonesian conversations from movie, novel, and drama transcripts. We developed a speech database based on a phonetically balanced sentence set covering 33 Indonesian phonemes and their IPA symbols, comprising 655 sentences. Three expressive styles were applied, namely happiness, sadness, and anger. We hired four professional theater artists to utter the sentences. Segmentation and labeling were performed manually to create the transcriptions. We varied the expressive style and the amount of training data. Prosodic conversion is achieved by expressive-style-dependent decision trees. Both objective and subjective evaluations are also analyzed. In the objective test, using the MCD method, the best score was obtained for the happiness style, with a score of 4.2 at a training data amount of 82, followed by sadness with a score of 5.13 at a training data amount of 81 and anger with a score of 5.18 at a training data amount of 80. The subjective test, using the Mean Opinion Score, yielded naturalness scores of 3.51, 3.38, and 3.0 for happiness, anger, and sadness, respectively. The results show that the synthetic speech is of high quality in terms of naturalness.
KW - Expressive Indonesian speech corpus
KW - HMM
KW - Speech synthesis
UR - http://www.scopus.com/inward/record.url?scp=85099329366&partnerID=8YFLogxK
U2 - 10.18154/RWTH-CONV-239574
DO - 10.18154/RWTH-CONV-239574
M3 - Conference contribution
AN - SCOPUS:85099329366
T3 - Proceedings of the International Congress on Acoustics
SP - 6203
EP - 6210
BT - Proceedings of the 23rd International Congress on Acoustics
A2 - Ochmann, Martin
A2 - Vorländer, Michael
A2 - Fels, Janina
PB - International Commission for Acoustics (ICA)
T2 - 23rd International Congress on Acoustics: Integrating 4th EAA Euroregio, ICA 2019
Y2 - 9 September 2019 through 13 September 2019
ER -