Liang Lu research description

My research focuses on speech recognition using subspace Gaussian mixture models (SGMMs), which were proposed by Dan Povey et al. a couple of years ago. The basic idea of this model is that the HMM state-dependent GMM parameters are derived from a relatively low-dimensional model subspace, as opposed to being estimated directly as in conventional GMM-based acoustic models. The model subspace is expected to capture the correlations among different states and to factorise the acoustic variability into different model subspaces. Another advantage is that the number of state-dependent model parameters is relatively small in SGMMs, which makes the model attractive when only a limited amount of training data is available.
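As a rough illustration of how state-dependent parameters can be derived from a shared subspace, the sketch below follows the basic SGMM formulation as commonly written: each state j has a low-dimensional vector v_j, the mean of Gaussian i is M_i v_j, and the mixture weights are a softmax over w_i^T v_j. The names `M`, `w`, `v_j` and the function `sgmm_state_params` are assumptions chosen for this example, and covariances (shared across states in the basic model) are left out.

```python
import numpy as np

def sgmm_state_params(M, w, v_j):
    """Derive the GMM means and weights of one HMM state from the shared subspace.

    M   : (I, D, S) array, one D x S projection matrix per shared Gaussian (assumed layout)
    w   : (I, S) array, one weight projection vector per shared Gaussian
    v_j : (S,) array, the low-dimensional vector of state j
    """
    means = M @ v_j                 # (I, D): mean of Gaussian i is M_i v_j
    logits = w @ v_j                # (I,): unnormalised log weights
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    weights /= weights.sum()        # softmax over the I Gaussians
    return means, weights

# Illustrative sizes only: 4 shared Gaussians, 3-dim features, 2-dim subspace.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 3, 2))
w = rng.standard_normal((4, 2))
v_j = rng.standard_normal(2)
means, weights = sgmm_state_params(M, w, v_j)
```

Only `v_j` is specific to the state here; `M` and `w` are shared globally, which is what keeps the state-dependent parameter count small.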

In addition, since the model subspace is globally shared and does not depend on the HMM topology, it has been shown that the model is particularly effective for multilingual and cross-lingual acoustic modelling. In this case, the model subspace can be tied across multiple language systems without considering the phoneme unit mismatch among the languages. The accuracy of each source language system may be improved since the amount of training data is increased. Furthermore, the model subspace can be ported directly into a target language system that has very limited training data. The model subspace can be kept fixed, so that only the state-dependent parameters need to be estimated to build the target language system. Our results indicate that the improvement is remarkable, since the total number of state-dependent parameters is very small.
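To give a feel for why so few parameters remain to be estimated once the subspace is fixed, here is a small back-of-the-envelope sketch. All sizes (39-dimensional features, 16 diagonal-covariance Gaussians per state in a conventional system, a 40-dimensional state vector) are illustrative assumptions, not figures from our experiments, and the function names are hypothetical.

```python
def conventional_params_per_state(num_gaussians, feat_dim):
    """Per-state parameters of a diagonal-covariance GMM:
    a mean vector, a variance vector and one weight per Gaussian."""
    return num_gaussians * (2 * feat_dim + 1)

def sgmm_params_per_state(subspace_dim):
    """Per-state parameters of a basic SGMM with a fixed shared subspace:
    only the state vector itself."""
    return subspace_dim

# Illustrative sizes only.
conventional = conventional_params_per_state(num_gaussians=16, feat_dim=39)
sgmm = sgmm_params_per_state(subspace_dim=40)
print(conventional, sgmm)  # 1264 40
```

Under these assumed sizes, each new state of the target language costs a few dozen parameters rather than over a thousand, with the rest borrowed from the shared subspace.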

Recently, we have started to work on noise compensation for SGMM-based acoustic models, using the joint uncertainty decoding (JUD) noise compensation technique proposed by Hank Liao and Mark Gales. We found that JUD can be successfully applied to SGMM acoustic models, giving high recognition accuracy at low computational cost. Future work will look at compensation in the log-spectral domain to further reduce the computational cost, and at noise adaptive training to improve the accuracy.
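For readers unfamiliar with JUD, the sketch below shows the general shape of the compensated likelihood as it is usually written: the noisy observation y is mapped by an affine transform (A, b) tied to a regression class, and a bias covariance Sigma_b is added to the clean model covariance. This is a minimal sketch under that assumed form, not our actual implementation; the function and argument names are hypothetical, and the estimation of A, b and Sigma_b from the noise model is not shown.

```python
import numpy as np

def jud_log_likelihood(y, A, b, Sigma_b, mu, Sigma):
    """Log of |A| * N(A y + b; mu, Sigma + Sigma_b) for one Gaussian.

    y        : (D,) noisy observation
    A, b     : (D, D) and (D,) transform of the regression class (assumed given)
    Sigma_b  : (D, D) bias covariance of the regression class (assumed given)
    mu, Sigma: (D,) mean and (D, D) covariance of the clean Gaussian
    """
    d = y.shape[0]
    diff = A @ y + b - mu                    # transformed observation minus clean mean
    cov = Sigma + Sigma_b                    # compensated covariance
    _, logdet_A = np.linalg.slogdet(A)       # log |det A|, the Jacobian term
    _, logdet_cov = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # Mahalanobis distance
    return logdet_A - 0.5 * (d * np.log(2.0 * np.pi) + logdet_cov + maha)
```

Because the transform is shared by all Gaussians in a regression class, the observation only needs to be transformed once per class and frame, which is where the low computational cost comes from.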