This file contains documentation on CSLU: Stories V1.2, Linguistic Data Consortium (LDC) catalog number LDC2006S14 and ISBN 1-58563-366-6.
CSLU: Stories contains extemporaneous speech collected from English speakers in the CSLU Multilanguage Telephone Speech data collection. Each speaker was asked to speak on a topic of his or her choice for one minute. Those utterances are collected in the Stories corpus.
The Stories corpus comprises:
- Speech files for the 702 calls
- Time-aligned word-level transcriptions (and corresponding comment files) for approximately 322 stories
- Word transcriptions (not time-aligned) for 702 stories
- Time-aligned phonetic labels for 702 stories
Portions © 2002 Center for Spoken Language Understanding, Oregon Health and Science University, © 2006 Trustees of the University of Pennsylvania