Item Name: CSR-IV HUB3
Author(s): Jonathan G. Fiscus, John S. Garofolo, David Pallett
LDC Catalog No.: LDC96S33
ISBN: 1-58563-086-1
ISLRN: 529-082-231-699-3
Member Year(s): 1996
DCMI Type(s): Sound
Sample Type: 1-channel pcm
Sample Rate: 16000
Data Source(s): microphone speech
Project(s): DARPA-CSR
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): CSR IV Hub 3 Agreement
Online Documentation: LDC96S33 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Fiscus, Jonathan G., John Garofolo, and David Pallett. CSR-IV HUB3 LDC96S33. DVD. Philadelphia: Linguistic Data Consortium, 1996.
This set of CD-ROMs contains all of the speech data provided to sites participating in the DARPA CSR November 1995 HUB3 Multi-Microphone tests. The data consists of digitized waveforms collected simultaneously with eight different microphones from 40 subjects reading 15-sentence articles drawn from various North American business (NAB) news publications. The data is partitioned into development-test and evaluation-test sets. The test sets were collected with different subjects, prompts, and microphones. No training data was collected for this corpus, since a substantial amount of NAB acoustic training data was already available. Index files are included that specify the exact subset of the evaluation-test recordings used in the November 1995 tests. The software NIST used to process and score the output of the test systems is also included.

The data is organized as follows:

CD26-3 Development-Test Data-Location 1, Adaptation and NAB recordings, Subjects: 703-705, 707-70a, 70c, 70f, 70g

CD26-4 Development-Test Data-Location 2, NAB recordings, Subjects: 70k, 70m, 70o, 70q-70s, 70u-70w

CD26-5 Development-Test Data-Location 2, Adaptation recordings, Subjects: 70k, 70m-70o, 70q-70s, 70u-70w

CD26-3 Development-Test Data-NAB recordings, Subjects: 710-71j

As of September 2007, this publication has been condensed to fit on a single DVD. The data from each CD resides in its own directory, labeled with the corresponding NIST label listed above.


The Reduced Licensing Fee for this corpus is US$200.

