DCIEM/HCRC
Item Name: | DCIEM/HCRC |
Author(s): | Martin Taylor, Ellen Gurman Bard, Cathy Sotillo, David McKelvie, Anne Anderson |
LDC Catalog No.: | LDC96S38 |
ISBN: | 1-58563-089-6 |
ISLRN: | 139-466-600-760-1 |
DOI: | https://doi.org/10.35111/4540-j072 |
Member Year(s): | 1996 |
DCMI Type(s): | Sound, Text |
Sample Type: | 2-channel pcm |
Sample Rate: | 20000 |
Data Source(s): | microphone speech |
Application(s): | speech recognition |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC96S38 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Taylor, Martin, et al. DCIEM/HCRC LDC96S38. Web Download. Philadelphia: Linguistic Data Consortium, 1996. |
Related Works: | View |
Introduction
DCIEM/HCRC was developed by the Defence and Civil Institute of Environmental Medicine in Canada and the Human Communication Research Centre at the University of Edinburgh and the University of Glasgow. It contains approximately 23 hours of English speech data, along with corresponding transcripts, from 36 participants (34 male and 2 female). This release contains the materials used to collect all 216 spoken dialogues, as well as digital audio, orthographic transcriptions, documentation, and source code for tools. The dialogues were selected to provide balanced representation at different points in a sleep deprivation experiment.
Data
The top-level directory contains the following files:
- 0dir.txt: A complete listing of all files, giving the CD on which each can be found.
- 0direye.txt: A complete listing of all dialogues, giving the CD on which each can be found, in a form more convenient for visual scanning.
- read.me: A readme file, with the part and CD number changing from one CD to the next.
The top-level directory contains the following directories:
- doc/ ASCII and/or PostScript(TM) versions of various documents on the corpus: START HERE
- lib/ Resources for included tools
- trn_all/ All the transcripts
- etc/ Information about participants and maps
- src/ UNIX(TM) scripts and C sources for useful tools, an Emacs interface, a World Wide Web interface, and a Microsoft Windows(TM) sound-playing program.
In addition to the common directories, each CD also contains run directories such as:
- run1/
- run2/
Any run/ directory contains sampled audio, transcripts, and maps for one of the six runs of the sleep deprivation experiment.
Each conversation directory has the following files:
- NIST header (.nst)
- sampled speech (.ses)
- annotated orthographic transcription (.trn)
- giver's map (.gmp)
- follower's map (.fmp)
- TEI entry-point (.sgm)
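As a rough illustration of the file list above, the following Python sketch groups the files of one conversation directory by extension and reports which expected kinds are absent. The extension-to-description mapping comes from the list; the function names and the example file names are hypothetical and are not taken from the corpus documentation.

```python
from pathlib import PurePosixPath

# Extension-to-description mapping, copied from the file list above.
FILE_KINDS = {
    ".nst": "NIST header",
    ".ses": "sampled speech",
    ".trn": "annotated orthographic transcription",
    ".gmp": "giver's map",
    ".fmp": "follower's map",
    ".sgm": "TEI entry-point",
}


def classify_files(names):
    """Group file names by the kind of corpus file their extension indicates."""
    found = {kind: [] for kind in FILE_KINDS.values()}
    for name in names:
        # Compare extensions case-insensitively; unknown extensions are ignored.
        kind = FILE_KINDS.get(PurePosixPath(name).suffix.lower())
        if kind is not None:
            found[kind].append(name)
    return found


def missing_kinds(names):
    """Return the expected file kinds that have no matching file in `names`."""
    found = classify_files(names)
    return [kind for kind, files in found.items() if not files]


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    example = ["dlg.nst", "dlg.ses", "dlg.trn", "dlg.gmp", "dlg.fmp"]
    print(missing_kinds(example))
```

A check like this may be useful after copying a conversation directory, since the speech file and its header are stored separately.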
Audio data is presented as 2-channel, 16-bit, 20 kHz ses files. Metadata including participant age, gender, and birthplace is included. The materials have been designed to be easily accessible to users with different equipment and a variety of needs, from those who merely wish to generate hardcopies of the orthographic transcriptions to those who require computational analyses of the speech material. All the text files (transcriptions and documentation) should be readable and printable on most systems. The maps are intended for printing on PostScript printers, and the speech files are provided with human-readable standard headers, enabling them to be played in a wide range of environments for processing sampled speech.
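The audio parameters stated above (2 channels, 16 bits, 20 kHz) are enough to sketch how raw samples could be separated into channels and how a duration could be estimated from a byte count. The Python sketch below assumes that the sample data is headerless, interleaved, signed 16-bit PCM; the byte order is also an assumption and is left as a parameter. None of this is confirmed by the catalog entry, so the corpus documentation and the accompanying .nst header should be checked first.

```python
import struct

SAMPLE_RATE_HZ = 20000  # from the catalog entry
CHANNELS = 2            # from the catalog entry
BYTES_PER_SAMPLE = 2    # 16-bit samples


def split_channels(raw, byteorder="little"):
    """Split interleaved 16-bit PCM bytes into two per-channel sample lists.

    Assumes signed samples interleaved as ch0, ch1, ch0, ch1, ...
    The byte order is an assumption; pass "big" if the data is big-endian.
    """
    frame_size = CHANNELS * BYTES_PER_SAMPLE
    usable = len(raw) - (len(raw) % frame_size)  # drop any trailing partial frame
    prefix = "<" if byteorder == "little" else ">"
    samples = struct.unpack(f"{prefix}{usable // BYTES_PER_SAMPLE}h", raw[:usable])
    return list(samples[0::2]), list(samples[1::2])


def duration_seconds(num_bytes):
    """Estimate the duration of `num_bytes` of 2-channel, 16-bit, 20 kHz audio."""
    return num_bytes / (CHANNELS * BYTES_PER_SAMPLE * SAMPLE_RATE_HZ)


if __name__ == "__main__":
    # Synthetic data: two frames with distinct values on each channel.
    raw = struct.pack("<4h", 1, -1, 2, -2)
    print(split_channels(raw))
    print(duration_seconds(80000))
```

At these parameters one second of audio occupies 2 × 2 × 20000 = 80000 bytes, which is the figure `duration_seconds` divides by.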
Samples
Please view this speech sample and transcript sample.
Updates
There are no updates at this time.