Heng Lu is currently working on two topics in NST:
1) Dynamic Bayesian Network (DBN) based Speech Synthesis
Context information is widely used in HMM-based speech synthesis via context-dependent phone models. However, the full-context space is extremely large, and decision-tree-based context clustering is not a trivial task. In many cases, MDL-based decision tree clustering turns out not to be optimal. Dynamic Bayesian Networks, by contrast, provide a factorized framework for context-dependent modelling. DBNs are graphical models, and the HMM is a special case of a DBN. Moreover, unlike HMM-based speech synthesis, where samples with different context information are pooled into one cluster, a DBN can express the relations among context information explicitly.
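To make the "HMM is a special case of a DBN" statement concrete, the first equation below is the standard HMM joint factorization over hidden states q_t and observations o_t. The second equation is only an illustrative sketch of how context could be factored into explicit nodes: the context variables c_t^{(k)}, their number K, and the chosen dependency structure are assumptions made for illustration, not the model used in this work. It uses the convention p(q_1 | q_0) = p(q_1).

```latex
% HMM written as a DBN: one hidden state q_t and one observation o_t per frame
p(o_{1:T}, q_{1:T}) = p(q_1)\, p(o_1 \mid q_1) \prod_{t=2}^{T} p(q_t \mid q_{t-1})\, p(o_t \mid q_t)

% Illustrative factored variant (assumption): K context variables c_t^{(k)} as explicit nodes
p(o_{1:T}, q_{1:T}, c_{1:T}) = \prod_{t=1}^{T} p(q_t \mid q_{t-1})
    \Bigl[ \prod_{k=1}^{K} p\bigl(c_t^{(k)} \mid q_t\bigr) \Bigr]
    p\bigl(o_t \mid q_t, c_t^{(1)}, \dots, c_t^{(K)}\bigr)
```

In the second form each context factor has its own node, so dependencies among context variables can be added or removed edge by edge rather than being fixed by a clustering tree.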
Currently, DBN-based speech synthesis is learned in two steps:
2) Deep Neural Network (DNN) based Speech Synthesis
DNNs are currently a very active topic in speech recognition. For speech synthesis, neural networks can likewise be used to predict model parameters or acoustic parameters at test time. In this work, we try to replace decision trees with a DNN and to establish a direct mapping from labels to test-time model features or acoustic parameters. A singular value decomposition (SVD) based front end, which maps each letter to a vector of continuous values, is employed in this case, so no label information is needed apart from the text.
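The following is a minimal sketch of the two ideas in this paragraph, not the system described here. It assumes NumPy is available, that letter vectors come from an SVD of a letter-to-next-letter co-occurrence matrix (the source does not say which statistic is decomposed), and that the network is a small untrained feed-forward model. The names `letter_vectors`, `init_mlp`, and `mlp_forward`, the toy corpus, and all layer sizes are hypothetical.

```python
import numpy as np


def letter_vectors(corpus, dim=2):
    """Map each letter to a continuous vector via truncated SVD.

    Assumption: co-occurrence is counted between a letter and its
    immediate right neighbour within each word.
    """
    letters = sorted(set("".join(corpus)))
    index = {ch: i for i, ch in enumerate(letters)}
    counts = np.zeros((len(letters), len(letters)))
    for word in corpus:
        for left, right in zip(word, word[1:]):
            counts[index[left], index[right]] += 1.0
    # Keep the top `dim` left singular directions, scaled by singular values.
    u, s, _ = np.linalg.svd(counts, full_matrices=False)
    vectors = u[:, :dim] * s[:dim]
    return {ch: vectors[index[ch]] for ch in letters}


def init_mlp(sizes, seed=0):
    """Random small weights and zero biases for consecutive layer sizes."""
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.1, (m, n)) for m, n in zip(sizes, sizes[1:])]
    biases = [np.zeros(n) for n in sizes[1:]]
    return weights, biases


def mlp_forward(x, weights, biases):
    """Feed-forward pass: tanh hidden layers, linear output layer."""
    h = x
    for w, b in zip(weights[:-1], biases[:-1]):
        h = np.tanh(h @ w + b)
    return h @ weights[-1] + biases[-1]


# Toy data (hypothetical): three words stand in for a text corpus.
corpus = ["speech", "synthesis", "network"]
vecs = letter_vectors(corpus, dim=2)

# Network input: concatenated vectors of a three-letter window.
x = np.concatenate([vecs[ch] for ch in "spe"])

# Output: 5 placeholder acoustic parameters from an untrained network.
weights, biases = init_mlp([x.size, 8, 5])
y = mlp_forward(x, weights, biases)
```

In a real system the network would be trained on aligned text and acoustic data; here the weights are random, so `y` only demonstrates the shapes involved in mapping text-derived vectors to an acoustic parameter vector.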