On development deep neural network speech synthesis using vector quantized acoustical feature for isolated bahasa Indonesia words

Trikarsa Tirtadwipa Manunggal, Dhany Arifianto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

1 Citation (Scopus)

Abstract

Speech representation and transformation using adaptive interpolation of weighted spectrum (STRAIGHT) is well known as a high-quality vocoder and synthesizer method for both voice recognition and speech synthesis. For speech synthesis in particular, STRAIGHT is a reasonable choice for producing good speech sound. A problem appears when a neural network is used to relate linguistic features to the high-dimensional acoustical features of STRAIGHT: the computational cost rises too high, making these features inefficient as neural network outputs. This paper examines a scenario that uses vector quantization (VQ) to reduce the computational cost while keeping the quality good. VQ approximates the aperiodicity and smoothed spectrum parameters by a number of centroid vectors. Experimental results show that VQ-based speech synthesis produces nearly inaudible distortion, with a DMOS of about 3.82 on the synthesized speech.
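The quantization step described in the abstract can be illustrated with a minimal sketch. It assumes a plain k-means codebook trained on frame-level feature vectors; the abstract does not say how the centroids are obtained, so the training method, the function names (`train_codebook`, `quantize`, `reconstruct`), the feature dimension, and the codebook size are all hypothetical, and random data stands in for the STRAIGHT aperiodicity or smoothed-spectrum frames.

```python
import numpy as np


def quantize(frames, codebook):
    """Return, for each frame, the index of its nearest centroid."""
    # Squared Euclidean distance from every frame to every centroid.
    dists = ((frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def train_codebook(frames, num_centroids, iterations=20, seed=0):
    """Fit a codebook with plain k-means (an assumed training method)."""
    rng = np.random.default_rng(seed)
    # Initialise the centroids from distinct, randomly chosen frames.
    idx = rng.choice(len(frames), size=num_centroids, replace=False)
    codebook = frames[idx].astype(float)
    for _ in range(iterations):
        codes = quantize(frames, codebook)
        for k in range(num_centroids):
            members = frames[codes == k]
            # An empty cluster keeps its previous centroid.
            if len(members) > 0:
                codebook[k] = members.mean(axis=0)
    return codebook


def reconstruct(codes, codebook):
    """Map centroid indices back to approximate feature vectors."""
    return codebook[codes]


# Synthetic stand-in for high-dimensional acoustic frames (illustrative sizes).
rng = np.random.default_rng(1)
frames = rng.normal(size=(200, 16))
codebook = train_codebook(frames, num_centroids=8)
codes = quantize(frames, codebook)
approx = reconstruct(codes, codebook)
print(codes[:10], approx.shape)
```

In a setup like this, a network would only need to predict one centroid index per frame rather than the full feature vector, which is the kind of output-size reduction the abstract refers to.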

Original language: English
Title of host publication: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 105-109
Number of pages: 5
ISBN (Electronic): 9781509035168
Publication status: Published - 3 May 2017
Event: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016 - Bali, Indonesia
Duration: 26 Oct 2016 to 28 Oct 2016

Publication series

Name: 2016 Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016

Conference

Conference: 19th Annual Conference of the Oriental Chapter of International Committee for Coordination and Standardization of Speech Databases and Assessment Techniques, O-COCOSDA 2016
Country/Territory: Indonesia
City: Bali
Period: 26/10/16 to 28/10/16

Keywords

  • bahasa
  • deep neural network
  • speech synthesis
  • vector quantization
