CSR 1995 Hub-4 Marketplace Training Epoch Data
NIST Speech Disc 26-1.1
March, 1996

This disc contains training epoch data for the ARPA CSR 1995 Hub-4 tests. The data consists of 10 complete Marketplace radio broadcasts, each approximately 29 minutes long. The broadcasts were chosen to represent the days of the week evenly across the training epoch (931112 - 940331). This epoch was chosen to match the training epoch of the 1994 CSRNAB1 data.

The recorded waveforms are stored in SPHERE-headered files.

The directory and file structure on this CD-ROM is as follows:

    csr95/h4/train//

Each broadcast date directory contains a waveform file (".wav") for the broadcast, as well as a transcription file (".txt") and a speaker information file (".spk").

The following broadcasts are included in this training epoch set:

    Date     Day-of-week
    ------   -----------
    931117   Wednesday
    931210   Friday
    931230   Thursday
    940112   Wednesday
    940124   Monday
    940204   Friday
    940215   Tuesday
    940303   Thursday
    940315   Tuesday
    940328   Monday

The file "csr95/h4/doc/transpec.doc" describes the transcription conventions used for this data.

Note that the transcription (.txt) and speaker information (.spk) files in the Hub-4 training data have not received the same scrutiny as the Hub-4 Development Test and Evaluation Test data, and may contain some transcription errors. Note also that, unlike the test data transcriptions, the training data transcriptions have been time-marked only at the story level.
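The waveform files on this disc are SPHERE-headered. As a rough illustration of what that means, the sketch below parses a NIST SPHERE header, assuming the common layout: a "NIST_1A" magic line, a line giving the header size in bytes, then "name -type value" lines terminated by "end_head". The function names and the example field names (sample_rate, channel_count, sample_coding) are illustrative assumptions; check the actual headers on this disc for the fields they carry and for how the sample data is encoded.

```python
def parse_sphere_header(data: bytes) -> tuple[int, dict]:
    """Sketch: parse a NIST SPHERE header from the leading bytes of a file.

    Assumes the usual layout: "NIST_1A", a header-size line, then
    "name -type value" lines up to "end_head". Returns (header_size, fields).
    """
    lines = data.split(b"\n")
    if lines[0].strip() != b"NIST_1A":
        raise ValueError("not a SPHERE file (missing NIST_1A magic line)")
    header_size = int(lines[1].decode("ascii").strip())

    # Only parse inside the declared header region; sample data follows it.
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(maxsplit=2)
        if len(parts) < 3:
            continue  # skip blank or malformed lines in this sketch
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)      # integer field
        elif type_code == "-r":
            fields[name] = float(value)    # real-valued field
        elif type_code.startswith("-s"):
            fields[name] = value           # string field ("-s<length>")
        else:
            fields[name] = value           # unknown type: keep raw text
    return header_size, fields


def read_sphere_header(path: str) -> tuple[int, dict]:
    """Hypothetical helper: read just the header bytes from a file on disk."""
    with open(path, "rb") as f:
        head = f.read(1024)
        size = int(head.split(b"\n")[1].decode("ascii").strip())
        if size > len(head):
            # Header is longer than the first 1024 bytes; re-read all of it.
            f.seek(0)
            head = f.read(size)
    return parse_sphere_header(head)
```

Keeping the raw string for unrecognized type codes means the sketch degrades gracefully on headers with fields it does not know about, rather than failing.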
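The claim that the broadcast dates evenly cover the days of the week within the training epoch can be checked mechanically. The short sketch below copies the ten YYMMDD dates from the broadcast table, assumes every two-digit year means 19xx, confirms each date lies within 931112 - 940331, and tallies weekdays. The helper names are hypothetical and exist only for this illustration.

```python
from collections import Counter
from datetime import date

# Broadcast dates copied from the table above (YYMMDD).
BROADCASTS = ["931117", "931210", "931230", "940112", "940124",
              "940204", "940215", "940303", "940315", "940328"]
EPOCH_START, EPOCH_END = "931112", "940331"

# Fixed English names, indexed by date.weekday() (Monday == 0),
# so the result does not depend on the system locale.
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]


def yymmdd_to_date(s: str) -> date:
    """Hypothetical helper: convert YYMMDD, assuming the year is 19YY."""
    return date(1900 + int(s[:2]), int(s[2:4]), int(s[4:6]))


def weekday_counts(dates: list[str]) -> Counter:
    """Count broadcasts per weekday, rejecting dates outside the epoch."""
    start, end = yymmdd_to_date(EPOCH_START), yymmdd_to_date(EPOCH_END)
    counts = Counter()
    for s in dates:
        d = yymmdd_to_date(s)
        if not start <= d <= end:
            raise ValueError(f"{s} falls outside the training epoch")
        counts[DAY_NAMES[d.weekday()]] += 1
    return counts


print(weekday_counts(BROADCASTS))
```

Run on the ten listed dates, the tally comes to two broadcasts for each weekday from Monday through Friday, matching the Day-of-week column.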