My current research focuses on speech transcription systems and covers three main topics:
1. Diverse data structuring, which mainly focusses on:
Examining the nature of the BBC multi-genre datasets, which come from different sources.
So far, we have received BBC data covering radio, TV drama, the Reith Lectures, Desert Island Discs and archive material (diverse domain coverage). Some of it has complete transcriptions or subtitles, while the rest does not. Preliminary speech transcription systems are trained separately for each genre, each covering a different domain, using either the BBC transcriptions or lightly supervised decoding outputs.
Metadata generation.
Observations from the preliminary transcription systems show that the data sources differ in character. Automatic approaches are used to generate metadata for further research on both speech transcription and synthesis, such as segmentation, speaker IDs, lightly supervised decoding transcriptions, confidence measures and automatically detected disfluencies/filled pauses.
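As an illustration of what such metadata could look like per segment, here is a minimal sketch in Python. The record layout, the field names and the helper `to_json_line` are assumptions made for this example only; they do not describe the format actually used in the project.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class SegmentMetadata:
    """Hypothetical per-segment record; all field names are illustrative."""

    programme_id: str      # identifier of the source programme (assumed)
    genre: str             # e.g. "radio" or "tv_drama"
    start: float           # segment start time in seconds
    end: float             # segment end time in seconds
    speaker_id: str        # automatically assigned speaker label
    transcript: str        # lightly supervised decoding output
    confidence: float      # segment-level confidence in [0, 1]
    filled_pauses: List[str] = field(default_factory=list)

    @property
    def duration(self) -> float:
        """Length of the segment in seconds."""
        return self.end - self.start


def to_json_line(segment: SegmentMetadata) -> str:
    """Serialise one record as a single JSON line (one segment per line)."""
    return json.dumps(asdict(segment), sort_keys=True)


example = SegmentMetadata(
    programme_id="prog001",
    genre="radio",
    start=12.5,
    end=17.0,
    speaker_id="spk03",
    transcript="well um i think so",
    confidence=0.82,
    filled_pauses=["um"],
)
print(to_json_line(example))
```

Keeping one record per segment means that downstream transcription or synthesis experiments can filter on any field, for example by genre or by confidence, without re-running the automatic processing.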
2. Semi-supervised acoustic model training.
This topic is motivated by the diverse data structuring work, where much of the training data has incomplete labelling. The aim is to build strong acoustic models by developing approaches that properly combine the incomplete manual transcriptions with the lightly supervised decoding outputs, so that the transcriptions used for acoustic model training are as accurate as possible.
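One simple way to combine the two sources, shown here only as a sketch, is to keep the manual transcript for a segment when it agrees well with the decoding output, to fall back on the decoding output when its confidence is high, and to discard the segment otherwise. The function names, the word-level matching measure and the thresholds `min_match` and `min_conf` are assumptions for this example, not the approach developed in this work.

```python
from difflib import SequenceMatcher
from typing import Optional


def word_match_rate(reference: str, hypothesis: str) -> float:
    """Fraction of reference words that also appear, in order, in the hypothesis."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        return 0.0
    matcher = SequenceMatcher(a=ref, b=hyp, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(ref)


def select_training_text(
    manual: Optional[str],
    decoded: str,
    confidence: float,
    min_match: float = 0.8,   # assumed threshold, would need tuning
    min_conf: float = 0.9,    # assumed threshold, would need tuning
) -> Optional[str]:
    """Pick the text used to train on one segment, or None to skip it."""
    # Trust the manual transcript when the decoder broadly agrees with it.
    if manual is not None and word_match_rate(manual, decoded) >= min_match:
        return manual
    # Otherwise use the decoding output, but only when it is confident.
    if confidence >= min_conf:
        return decoded
    # Neither source is reliable enough for this segment.
    return None


print(select_training_text("the cat sat on the mat", "the cat sat on a mat", 0.5))
print(select_training_text(None, "hello world", 0.95))
print(select_training_text(None, "hello world", 0.3))
```

The thresholds trade off the amount of training data against the accuracy of its labels, so in practice they would be chosen on held-out data.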
3. Wide-domain adaptation.
This topic will also complement and build on the diverse data structuring work, in order to develop acoustic models for transcription that can fully integrate wide-ranging speech data. MLP features and deep models will be investigated for wide-domain adaptation. The topic will also develop ideas from lightly supervised and unsupervised cross-domain acoustic model adaptation.