A hierarchical predictor of synthetic speech naturalness using neural networks

Mon, 06/27/2016 - 08:42 — ghenter

Title	A hierarchical predictor of synthetic speech naturalness using neural networks
Publication Type	Conference Paper
Year of Publication	2016
Authors	Yoshimura, T, Henter, GEje, Watts, O, Wester, M, Yamagishi, J, Tokuda, K
Conference Name	Proc. Interspeech
Date Published	September
Publisher	ISCA
Conference Location	San Francisco, CA
Keywords	Blizzard Challenge, naturalness, neural network, speech synthesis
Abstract	A problem when developing and tuning speech synthesis systems is that there is no well-established method of automatically rating the quality of the synthetic speech. This research attempts to obtain a new automated measure which is trained on the result of large-scale subjective evaluations employing many human listeners, i.e., the Blizzard Challenge. To exploit the data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus- and system-level predictions.
Refereed Designation	Refereed
