DCIEM/HCRC

Item Name: DCIEM/HCRC
Author(s): Martin Taylor, Ellen Gurman Bard, Cathy Sotillo, David McKelvie, Anne Anderson
LDC Catalog No.: LDC96S38
ISBN: 1-58563-089-6
ISLRN: 139-466-600-760-1
DOI: https://doi.org/10.35111/4540-j072
Member Year(s): 1996
DCMI Type(s): Sound, Text
Sample Type: 2-channel pcm
Sample Rate: 20000
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC96S38 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Taylor, Martin, et al. DCIEM/HCRC LDC96S38. Web Download. Philadelphia: Linguistic Data Consortium, 1996.

Introduction

DCIEM/HCRC was developed by the Defence and Civil Institute of Environmental Medicine in Canada and the Human Communication Research Centre at the University of Edinburgh and the University of Glasgow. It contains approximately 23 hours of English speech data, along with corresponding transcripts, from 36 participants (34 male and 2 female). This release contains the materials used to collect all 216 spoken dialogues, together with digital audio, orthographic transcriptions, documentation, and source code for tools. The dialogues were selected to provide balanced representation at different points in a sleep deprivation experiment.

Data

The top-level directory contains the following files:

  • 0dir.txt: A complete listing of all files, giving the CD on which each can be found.
  • 0direye.txt: A complete listing of all dialogues, giving the CD on which each can be found, in a form more convenient for visual scanning.
  • read.me: A readme file, with the part and CD number changing from one CD to the next.

The top-level directory contains the following directories:

  • doc/ ASCII and/or PostScript(TM) versions of various documents on the corpus: START HERE
  • lib/ Resources for included tools
  • trn_all/ All the transcripts
  • etc/ Information about participants and maps
  • src/ UNIX(TM) scripts and C sources for useful tools, an Emacs interface, a World Wide Web interface, and a Microsoft Windows(TM) sound playing program.

In addition to the common directories, each CD also contains:

  • run1/
  • run2/

Each run directory contains the sampled audio, transcripts, and maps for one of the six runs of the sleep deprivation experiment.

Each conversation directory has the following files:

  • NIST header (.nst)
  • sampled speech (.ses)
  • annotated orthographic transcription (.trn)
  • giver's map (.gmp)
  • follower's map (.fmp)
  • TEI entry-point (.sgm)
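
As a rough illustration of the file types listed above, the following Python sketch maps each extension to its description and reports which of the six expected file types are absent from a set of file names. The function names and the example file names are hypothetical and are not part of the corpus tools; only the extensions and their descriptions come from the list above.

```python
from pathlib import Path

# Extensions and descriptions taken from the file list above.
FILE_KINDS = {
    ".nst": "NIST header",
    ".ses": "sampled speech",
    ".trn": "annotated orthographic transcription",
    ".gmp": "giver's map",
    ".fmp": "follower's map",
    ".sgm": "TEI entry-point",
}


def describe_files(names):
    """Map each file name to its kind; unknown extensions map to None."""
    return {name: FILE_KINDS.get(Path(name).suffix.lower()) for name in names}


def missing_kinds(names):
    """Return the expected extensions that no file in `names` carries."""
    present = {Path(name).suffix.lower() for name in names}
    return sorted(ext for ext in FILE_KINDS if ext not in present)


if __name__ == "__main__":
    # Hypothetical file names, used only to exercise the functions.
    example = ["dialogue.nst", "dialogue.ses", "dialogue.trn"]
    print(describe_files(example))
    print(missing_kinds(example))
```

A check of this kind may be useful when copying conversation directories from several CDs, since it flags a directory in which one of the expected files was not transferred.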

Audio data is presented as 2-channel, 16-bit, 20 kHz .ses files. Metadata, including participant age, gender, and birthplace, is included. The materials have been designed to be easily accessible to users with different equipment and a variety of needs, from those who merely wish to generate hardcopies of the orthographic transcriptions to those who require computational analyses of the speech material. All the text files (transcriptions and documentation) should be readable and printable on most systems. The maps are intended for printing on PostScript printers, and the speech files are provided with human-readable standard headers, enabling them to be played in a wide range of environments for processing sampled speech.
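
The audio parameters above (2 channels, 16 bits, 20 kHz) imply 80,000 bytes of sample data per second of recording. The Python sketch below shows one way to separate the two channels and estimate duration under those parameters. It makes several assumptions that this page does not state: that the samples are signed integers interleaved channel by channel, that any header has already been removed from the byte string, and that the byte order is little-endian unless told otherwise. The corpus documentation in doc/ should be consulted for the actual layout, and for which channel carries which speaker.

```python
import struct

SAMPLE_RATE_HZ = 20000   # from the catalog entry
CHANNELS = 2             # from the catalog entry
BYTES_PER_SAMPLE = 2     # 16-bit samples


def split_channels(raw, big_endian=False):
    """Deinterleave headerless 2-channel 16-bit PCM into two tuples.

    Assumes signed samples interleaved as ch0, ch1, ch0, ch1, ...
    The byte order is an assumption; pass big_endian=True if needed.
    """
    frame_bytes = CHANNELS * BYTES_PER_SAMPLE
    usable = len(raw) - (len(raw) % frame_bytes)  # drop any partial frame
    count = usable // BYTES_PER_SAMPLE
    fmt = (">" if big_endian else "<") + str(count) + "h"
    samples = struct.unpack(fmt, raw[:usable])
    return samples[0::2], samples[1::2]


def duration_seconds(num_bytes):
    """Duration implied by a given number of bytes of sample data."""
    return num_bytes / (SAMPLE_RATE_HZ * CHANNELS * BYTES_PER_SAMPLE)


if __name__ == "__main__":
    # Synthetic data: two frames, not taken from the corpus.
    raw = struct.pack("<4h", 1, -1, 2, -2)
    first, second = split_channels(raw)
    print(first, second)
    print(duration_seconds(80000))
```

Under the same arithmetic, the approximately 23 hours of speech in the corpus correspond to roughly 6.6 GB of sample data (23 × 3600 × 80,000 bytes).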

Samples

Please view this speech sample and transcript sample.

Updates

There are no updates at this time.
