The Listening Talker Workshop

Edinburgh hosted the Listening Talker workshop on 2-3 May. Junichi Yamagishi gave an invited talk, "HMM-based speech synthesis adapted to listeners' and talkers' conditions".


Abstract: It is known that the intelligibility of state-of-the-art hidden Markov model (HMM) generated synthetic speech can be comparable to that of natural speech in clean environments. However, the situation is quite different when the listener's or the talker's conditions differ. If the listener's environment is noisy, natural speech is usually still more intelligible than synthetic speech. If the talker's speech is disordered due to vocal disabilities, such as those caused by neurological degenerative diseases, it may be unintelligible even in clean environments.

In this talk, we introduce our recent approaches to these problems. To improve the intelligibility of synthetic speech in noise, we have proposed two promising approaches, one based on statistical modelling and one on signal processing. In the statistical modelling approach, we use speech waveforms and articulatory movements recorded in parallel by electromagnetic articulography, and try to create hyper-articulated speech from normal speech by manipulating the articulatory movements predicted by the HMMs [1]. The signal processing approach is a new cepstral analysis and transformation method [2] based on an objective intelligibility measure for speech in noise, the Glimpse Proportion measure [3]. This method modifies the spectral envelope of the clean speech in order to increase its intelligibility in noise. Finally, we mention other work in which we create natural and intelligible synthetic voices even from the disordered, unintelligible speech of individuals suffering from motor neurone disease [4].
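As a rough illustration of the idea behind the Glimpse Proportion measure [3], the sketch below counts the fraction of time-frequency cells in which the speech level exceeds the noise level by more than a threshold. It is a simplified toy version, not the implementation used in the talk or in [2]: it assumes the per-cell levels in dB have already been computed (in practice from an auditory filterbank analysis), and the function name `glimpse_proportion`, the 3 dB default threshold, and the example values are all assumptions made for illustration.

```python
def glimpse_proportion(speech_db, noise_db, threshold_db=3.0):
    """Return the fraction of time-frequency cells counted as glimpses.

    speech_db, noise_db: lists of frames, each frame a list of per-band
    levels in dB (hypothetical, precomputed inputs of identical shape).
    threshold_db: assumed local SNR threshold for a cell to be a glimpse.
    """
    total = 0
    glimpsed = 0
    for speech_frame, noise_frame in zip(speech_db, noise_db):
        for speech_level, noise_level in zip(speech_frame, noise_frame):
            total += 1
            # A cell is a glimpse when the local SNR exceeds the threshold.
            if speech_level - noise_level > threshold_db:
                glimpsed += 1
    if total == 0:
        raise ValueError("inputs must contain at least one cell")
    return glimpsed / total


# Made-up levels: 2 frames x 3 frequency bands.
speech = [[60.0, 55.0, 40.0], [62.0, 50.0, 45.0]]
noise = [[50.0, 54.0, 48.0], [50.0, 50.0, 40.0]]
gp = glimpse_proportion(speech, noise)
print(gp)
```

A method that reshapes the spectral envelope of the clean speech would aim to raise this proportion for a given noise, under whatever constraints the method imposes on the modification.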

  • [1] Z.-H. Ling, K. Richmond, J. Yamagishi, and R.-H. Wang, "Integrating Articulatory Features into HMM-based Parametric Speech Synthesis," IEEE Transactions on Audio, Speech, and Language Processing, vol. 17, no. 6, pp. 1171-1185, August 2009.
  • [2] C. Valentini-Botinhao, R. Maia, J. Yamagishi, S. King, and H. Zen, "Cepstral analysis based on the Glimpse proportion measure for improving the intelligibility of HMM-based synthetic speech in noise," Proc. ICASSP 2012.
  • [3] M. Cooke, "A glimpsing model of speech perception in noise," J. Acoust. Soc. Am., vol. 119, no. 3, pp. 1562-1573, 2006.
  • [4] J. Yamagishi, C. Veaux, S. King, and S. Renals, "Speech synthesis technologies for individuals with vocal disabilities: voice banking and reconstruction," invited review, Acoustical Science & Technology, vol. 33, pp. 1-5, January 2012.