Data

Sheffield Wargames Corpus — natural speech from moving speakers, all native English speakers, with location tracking data. 96 microphone channels, including headsets, far-field arrays and a spherical array. Licence: free of charge, non-redistribution, to bona fide researchers only. Details are contained in this paper: SheffieldWargamesCorpus.pdf

homeService Corpus — Throughout the longitudinal homeService user trial, all interactions with the homeService system are recorded. Over time this will become a unique collection of dysarthric speech from up to 10 users. The data is stored in the NST internal data format and will be made available to all project partners.

English Heritage Data — 

NST BBC data - A large collection of broadcasts from the BBC, with more than 1,500 hours of data and covering a variety of TV and radio programmes. It includes audio/video, subtitles and description metadata. Licence: Only available within the NST project.

AMI — The AMI Meeting Corpus is a multi-modal data set consisting of 100 hours of meeting recordings. Around two-thirds of the data has been elicited using a scenario in which the participants play different roles in a design team, taking a design project from kick-off to completion over the course of a day. The rest consists of naturally occurring meetings in a range of domains. Detailed information can be found here: https://www.idiap.ch/dataset/ami/