WSJCAM0 Cambridge Read News
|Item Name:||WSJCAM0 Cambridge Read News|
|Author(s):||Tony Robinson, Jeroen Fransen, David Pye, Jonathan Foote, Steve Renals, Phil Woodland, Steve Young|
|LDC Catalog No.:||LDC95S24|
|Sample Type:||1-channel pcm compressed|
|Data Source(s):||microphone speech|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC95S24 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Robinson, Tony, et al. WSJCAM0 Cambridge Read News LDC95S24. Web Download. Philadelphia: Linguistic Data Consortium, 1995.|
This release of WSJCAM0 on CD-ROM represents version 1.1 of the corpus, which was initially released on tape by Cambridge University as of August 31, 1994. The collection is modelled directly on the initial ARPA CSR Corpus (WSJ0, a fifteen-disc corpus released by LDC in 1993): it uses the same dual-microphone recording paradigm and a subset of the prompting texts drawn from the Wall Street Journal.
There are two key differences between WSJ0 and WSJCAM0: (1) the subjects in WSJCAM0 are native speakers of British English and (2) in addition to standard orthographic transcripts, WSJCAM0 also has information on the time alignment between the sampled waveform and both the words and the phonetic segments.
The CD-ROM publication consists of six discs, with contents organized as follows:
- Discs 1 and 2 - training data from head-mounted microphone
- Disc 3 - development test data from head-mounted microphone, plus first set of evaluation test data
- Discs 4 and 5 - training data from desk-mounted microphone
- Disc 6 - development test data from desk-mounted microphone, plus second set of evaluation test data
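The disc layout listed above can be captured as a small lookup table, which may help when scripting over a local copy of the corpus. This is only an illustrative sketch: the names `DISC_CONTENTS` and `discs_with` are hypothetical and not part of the corpus or its documentation, and the microphone for the evaluation test sets is left as `None` because the listing does not state it.

```python
# Disc number -> list of (subset, microphone) pairs, as given in the listing.
# The microphone is None where the listing does not specify one.
DISC_CONTENTS = {
    1: [("training", "head-mounted")],
    2: [("training", "head-mounted")],
    3: [("development test", "head-mounted"), ("evaluation test, set 1", None)],
    4: [("training", "desk-mounted")],
    5: [("training", "desk-mounted")],
    6: [("development test", "desk-mounted"), ("evaluation test, set 2", None)],
}


def discs_with(subset, microphone=None):
    """Return the sorted disc numbers holding `subset`.

    If `microphone` is given, only entries recorded with that microphone match;
    if it is None, the microphone is not used as a filter.
    """
    return sorted(
        disc
        for disc, items in DISC_CONTENTS.items()
        if any(
            s == subset and (microphone is None or m == microphone)
            for s, m in items
        )
    )
```

For example, `discs_with("training", "desk-mounted")` selects the two desk-microphone training discs, and `discs_with("development test")` selects both development test discs.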
Within the training and test sets, speech data are organized by speaker; prompting texts, detailed transcriptions and speaker information are included in each speaker directory.
All waveform files have NIST SPHERE headers; the waveform data are compressed using the Shorten algorithm developed by Tony Robinson at Cambridge University, as adapted for use in the NIST SPHERE software package. (This package is available via anonymous ftp from NIST, on ftp server jaguar.ncsl.nist.gov in the pub directory.) Complete documentation is provided on each disc in the set.
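As an illustration of what a NIST SPHERE header looks like, the following minimal sketch reads the plain-text header fields from a file-like object. It assumes the usual SPHERE layout: a `NIST_1A` magic line, a line giving the header size in bytes, then `name -type value` lines terminated by `end_head`. It does not decode the Shorten-compressed samples; for that, the NIST SPHERE software mentioned above is the appropriate tool. The function names and the demo header values are hypothetical and are not taken from the corpus.

```python
import io


def parse_sphere_header(stream):
    """Parse SPHERE header fields from a binary stream.

    Returns (fields, header_size). The stream is left positioned at the
    first byte after the header, i.e. at the start of the waveform data.
    """
    first = stream.read(16)  # "NIST_1A\n" plus an 8-byte header-size line
    lines = first.split(b"\n")
    if lines[0] != b"NIST_1A":
        raise ValueError("missing NIST_1A magic line")
    header_size = int(lines[1].decode("ascii").strip())

    rest = stream.read(header_size - 16).decode("ascii", errors="replace")
    fields = {}
    for line in rest.split("\n"):
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # skip blanks and comments
            continue
        name, ftype, value = line.split(None, 2)
        if ftype == "-i":        # integer field
            fields[name] = int(value)
        elif ftype == "-r":      # real-valued field
            fields[name] = float(value)
        else:                    # "-sN": string field of N characters
            fields[name] = value[: int(ftype[2:])]
    return fields, header_size


def make_demo_header():
    """Build a made-up 1024-byte header for demonstration only."""
    text = (
        "NIST_1A\n   1024\n"
        "channel_count -i 1\n"
        "sample_rate -i 16000\n"
        "sample_coding -s26 pcm,embedded-shorten-v1.09\n"
        "end_head\n"
    )
    return text.ljust(1024).encode("ascii")


fields, size = parse_sphere_header(io.BytesIO(make_demo_header()))
```

With a real file, the same function could be called on `open(path, "rb")`; a `sample_coding` value that mentions `shorten` would indicate that the samples following the header still need to be decompressed before use.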