Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech

Mon, 03/16/2015 - 12:48 — tmerritt

Title	Attributing modelling errors in HMM synthesis by stepping gradually from natural to modelled speech
Publication Type	Conference Paper
Year	2015
Authors	Merritt, T, Latorre, J, King, S
Conference Name	Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)
Conference Location	Brisbane
Abstract	Even the best statistical parametric speech synthesis systems do not achieve the naturalness of good unit selection. We investigated possible causes of this. By constructing speech signals that lie in between natural speech and the output from a complete HMM synthesis system, we investigated various effects of modelling. We manipulated the temporal smoothness and the variance of the spectral parameters to create stimuli, then presented these to listeners alongside natural and vocoded speech, as well as output from a full HMM-based text-to-speech system and from an idealised 'pseudo-HMM'. All speech signals, except the natural waveform, were created using vocoders employing one of two popular spectral parameterisations: Mel-Cepstra or Mel-Line Spectral Pairs. Listeners made 'same or different' pairwise judgements, from which we generated a perceptual map using Multidimensional Scaling. We draw conclusions about which aspects of HMM synthesis are limiting the naturalness of the synthetic speech.
