A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis

Tue, 09/29/2015 - 03:03 — cassiavb

Title	A Deep Generative Architecture for Postfiltering in Statistical Parametric Speech Synthesis
Publication Type	Journal Article
Year of Publication	2015
Authors	Chen, L-H, Raitio, T, Valentini-Botinhao, C, Ling, Z, Yamagishi, J
Journal	Audio, Speech, and Language Processing, IEEE/ACM Transactions on
Volume	23
Pagination	2003-2014
ISSN	2329-9290
Keywords	deep generative architecture, HMM, modulation spectrum, postfilter, segmental quality, speech synthesis
Abstract	Speech generated by hidden Markov model (HMM)-based statistical parametric speech synthesis still sounds muffled. One cause of this degradation in speech quality may be the loss of fine spectral structures. In this paper, we propose to use a deep generative architecture, a generatively trained deep neural network (DNN), as a postfilter. The network models the conditional probability of the spectrum of natural speech given that of synthetic speech to compensate for the gap between synthetic and natural speech. The proposed probabilistic postfilter is generatively trained by cascading two restricted Boltzmann machines (RBMs) or deep belief networks (DBNs) with one bidirectional associative memory (BAM). We devised two types of DNN postfilters: one operating in the mel-cepstral domain and the other in the higher dimensional spectral domain. We compare these two new data-driven postfilters with other types of postfilters that are currently used in speech synthesis: a fixed mel-cepstral based postfilter, the global variance based parameter generation, and the modulation spectrum-based enhancement. Subjective evaluations using the synthetic voices of a male and a female speaker confirmed that the proposed DNN-based postfilter in the spectral domain significantly improved the segmental quality of synthetic speech compared with conventional methods.
DOI	10.1109/TASLP.2015.2461448
