
A hierarchical predictor of synthetic speech naturalness using neural networks

Title: A hierarchical predictor of synthetic speech naturalness using neural networks
Publication Type: Conference Paper
Year: 2016
Authors: Yoshimura, T, Henter, GE, Watts, O, Wester, M, Yamagishi, J, Tokuda, K
Conference Name: Proc. Interspeech
Date Published: September
Publisher: ISCA
Conference Location: San Francisco, CA
Keywords: Blizzard Challenge, naturalness, neural network, speech synthesis

A problem when developing and tuning speech synthesis systems is that there is no well-established method for automatically rating the quality of synthetic speech. This research attempts to obtain a new automated measure that is trained on the results of large-scale subjective evaluations employing many human listeners, namely the Blizzard Challenge. To exploit these data, we experiment with linear regression, feed-forward and convolutional neural network models, and combinations of them, to regress from synthetic speech to the perceptual scores obtained from listeners. The biggest improvements were seen when combining stimulus-level and system-level predictions.
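To make the idea of combining stimulus-level and system-level predictions more concrete, here is a minimal sketch under stated assumptions. It is not the authors' method. Each stimulus is reduced to a single hypothetical scalar feature, a one-feature linear regression maps that feature to a listener score, and each stimulus prediction is then blended with the mean prediction for its system. The names `fit_linear` and `hierarchical_predict`, the scalar feature, and the linear blend weight `alpha` are all illustrative assumptions.

```python
from collections import defaultdict
from statistics import mean


def fit_linear(xs, ys):
    """Ordinary least squares for one feature: y ~= a * x + b (illustrative stand-in model)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = my - a * mx
    return a, b


def hierarchical_predict(features, systems, a, b, alpha=0.5):
    """Blend each stimulus-level prediction with the mean prediction of its system.

    The blend rule (alpha * stimulus + (1 - alpha) * system mean) is a
    hypothetical choice made for this sketch only.
    """
    # Stimulus-level predictions from the regression model.
    stim = [a * x + b for x in features]
    # Group predictions by system and average them to get system-level predictions.
    by_system = defaultdict(list)
    for s, p in zip(systems, stim):
        by_system[s].append(p)
    sys_mean = {s: mean(ps) for s, ps in by_system.items()}
    # Combine the two levels for every stimulus.
    return [alpha * p + (1 - alpha) * sys_mean[s] for p, s in zip(stim, systems)]


if __name__ == "__main__":
    a, b = fit_linear([1, 2, 3, 4], [2, 4, 6, 8])
    print(hierarchical_predict([1, 2, 3, 4], ["A", "A", "B", "B"], a, b))
```

Pooling at the system level smooths out stimulus-to-stimulus noise, while the stimulus-level term keeps per-utterance variation. In a real setting, the scalar feature would be replaced by acoustic features and the linear model by the regression or neural network models named in the abstract.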

Refereed Designation: Refereed