DIRHA English WSJ Audio
Item Name: | DIRHA English WSJ Audio |
Author(s): | Mirco Ravanelli, Luca Cristoforetti, Maurizio Omologo |
LDC Catalog No.: | LDC2018S01 |
ISBN: | 1-58563-831-5 |
ISLRN: | 112-363-425-685-7 |
DOI: | https://doi.org/10.35111/2j6c-6z19 |
Release Date: | January 16, 2018 |
Member Year(s): | 2018 |
DCMI Type(s): | Sound, Text |
Sample Type: | pcm |
Sample Rate: | 16000 |
Data Source(s): | microphone speech |
Project(s): | DIRHA |
Application(s): | speech recognition |
Language(s): | English |
Language ID(s): | eng |
License(s): | DIRHA English WSJ Audio Agreement |
Online Documentation: | LDC2018S01 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Ravanelli, Mirco, Luca Cristoforetti, and Maurizio Omologo. DIRHA English WSJ Audio LDC2018S01. Web Download. Philadelphia: Linguistic Data Consortium, 2018. |
Introduction
DIRHA English WSJ Audio was developed as part of the Distant-Speech Interaction for Robust Home Applications (DIRHA) Project, which addressed natural spontaneous speech interaction with distant microphones in a domestic environment. It comprises approximately 85 hours of real and simulated read speech by six native American English speakers. The target utterances were taken from CSR-I (WSJ0) Complete (LDC93S6A), specifically the 5,000-word subset of read speech from Wall Street Journal news text.
This release contains signals with varying noise and reverberation characteristics, making it suitable for a range of multi-microphone signal processing and distant speech recognition tasks. The corpus can be used together with the related Kaldi baselines and tools, which are available online.
Data
Speech was collected in a real apartment setting with typical domestic background noise and inter- and intra-room reverberation effects. A total of 32 microphones were placed in the living room (26 microphones) and in the kitchen (6 microphones). The original recordings were made at a sampling frequency of 48 kHz. For the sake of compactness, however, the signals in this release are provided in WAV format at a 16 kHz sampling frequency and 16-bit resolution.
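As a quick sanity check after download, the stated signal format (16 kHz, 16-bit) can be confirmed from a file's WAV header. The following is a minimal sketch using Python's standard `wave` module. The function names `describe_wav` and `write_silent_wav` are illustrative and not part of the release, the demo runs on a synthetic silent file rather than on corpus data, and it assumes the files are plain PCM WAV that `wave` can open.

```python
import os
import tempfile
import wave

# Format stated in the release description: 16 kHz, 16-bit samples.
EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2


def describe_wav(path):
    """Read the WAV header and compare it with the stated release format."""
    with wave.open(path, "rb") as wav_file:
        rate = wav_file.getframerate()
        width = wav_file.getsampwidth()
        channels = wav_file.getnchannels()
        frames = wav_file.getnframes()
    return {
        "sample_rate_hz": rate,
        "bits_per_sample": width * 8,
        "channels": channels,
        "duration_s": frames / rate,
        "matches_release_format": (
            rate == EXPECTED_RATE_HZ and width == EXPECTED_SAMPLE_WIDTH_BYTES
        ),
    }


def write_silent_wav(path, seconds=1, rate=EXPECTED_RATE_HZ):
    """Write a mono, 16-bit silent file; used only to demo describe_wav."""
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)
        wav_file.setsampwidth(EXPECTED_SAMPLE_WIDTH_BYTES)
        wav_file.setframerate(rate)
        wav_file.writeframes(b"\x00\x00" * (rate * seconds))


if __name__ == "__main__":
    # Synthetic stand-in file; replace demo_path with a corpus file path.
    demo_path = os.path.join(tempfile.mkdtemp(), "demo.wav")
    write_silent_wav(demo_path)
    print(describe_wav(demo_path))
```

Pointing `describe_wav` at a file from the release in place of the synthetic one would report whether that file's header agrees with the format described above.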
Annotations for each acoustic sequence, such as microphone positions, speaker ID, speaker gender, and speaker position, are included in XML format. Additional metadata about the speakers and images of the apartment setting are also provided. Consult the documentation accompanying this release for more information about the collection.
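Because the XML schema is defined in the release documentation and not on this page, a schema-agnostic reader is a reasonable way to take a first look at an annotation file. The sketch below flattens any XML document into (path, value) pairs using Python's standard `xml.etree.ElementTree`. The function name `flatten_annotation` is illustrative, and the sample XML is entirely hypothetical: its element and attribute names are invented for the demo and do not reflect the actual annotation format.

```python
import xml.etree.ElementTree as ET


def flatten_annotation(xml_text):
    """Return (path, value) pairs for every attribute and text node."""
    root = ET.fromstring(xml_text)
    fields = []

    def walk(element, prefix):
        path = f"{prefix}/{element.tag}" if prefix else element.tag
        # Attributes are reported as path@name.
        for name, value in element.attrib.items():
            fields.append((f"{path}@{name}", value))
        # Skip whitespace-only text used for indentation.
        text = (element.text or "").strip()
        if text:
            fields.append((path, text))
        for child in element:
            walk(child, path)

    walk(root, "")
    return fields


# HYPOTHETICAL sample: tag and attribute names are invented for this demo
# and are NOT the schema used in the release.
HYPOTHETICAL_SAMPLE = """<annotation>
  <speaker id="spk1"><gender>female</gender></speaker>
  <microphone name="mic1"><position>1.0 2.0 1.5</position></microphone>
</annotation>"""

if __name__ == "__main__":
    for path, value in flatten_annotation(HYPOTHETICAL_SAMPLE):
        print(f"{path}: {value}")
```

Running the same function on the text of a real annotation file would list the element paths that file actually uses, which can then be checked against the release documentation.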
Samples
Please view this audio sample and annotation sample.
Updates
None at this time.