Stories Corpus
                            Release 1.2

              Center for Spoken Language Understanding


UPDATED: 22 September 2002

Use of this corpus is permitted only under the conditions of the signed
license agreement. Use or redistribution of this corpus outside the 
agreement is prohibited by law.

Overview
--------
The Stories Corpus consists of extemporaneous speech collected from English
speakers in the CSLU Multi-language Telephone Speech data collection. Each
speaker was asked to speak for one minute on a topic of their choice, and
these utterances make up the Stories Corpus.

The Stories Corpus comprises:

1. Speech files for the 702 calls (found in the /speech directory)

2. Time-aligned word-level transcriptions (and corresponding comment
   files) for approximately 322 stories (the /labels/*/*.wrd files)

3. Word transcriptions (not time aligned) for 702 stories (found in 
   the /trans/ directory)

4. Time-aligned phonetic labels for 210 stories (the /labels/*/*.ptlola
   files), plus phonetic labels for all 702 stories in files with the .phn
   extension, produced by automatic forced alignment
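To check a local copy against the counts listed above, the following minimal
Python sketch tallies files by extension. It assumes the corpus was unpacked
under a single root directory, that extensions can be matched
case-insensitively, and that each call or story corresponds to one file of
each type; the names count_corpus_files, compare_with_readme and
README_COUNTS are hypothetical. Because this readme does not say where the
.phn files live or what extension the /trans files use, the sketch searches
the whole tree for .phn files and counts every file under trans/.

```python
from collections import Counter
from pathlib import Path

# Extensions named in this readme. Case-insensitive matching is an assumption.
COUNTED_EXTENSIONS = {".wav", ".wrd", ".ptlola", ".phn"}

# Figures stated in this readme (the ".wrd" figure is approximate). Treating
# each call or story as exactly one file is an assumption.
README_COUNTS = {".wav": 702, "trans": 702, ".wrd": 322, ".ptlola": 210, ".phn": 702}


def count_corpus_files(root: str) -> Counter:
    """Tally files under a corpus root by extension, plus all files in trans/."""
    root_path = Path(root)
    counts: Counter = Counter()
    # Search the whole tree, since the .phn location is not given in the readme.
    for path in root_path.rglob("*"):
        if path.is_file() and path.suffix.lower() in COUNTED_EXTENSIONS:
            counts[path.suffix.lower()] += 1
    # The transcription file extension is not stated, so count every file in trans/.
    trans_dir = root_path / "trans"
    if trans_dir.is_dir():
        counts["trans"] = sum(1 for p in trans_dir.rglob("*") if p.is_file())
    return counts


def compare_with_readme(counts: Counter) -> dict:
    """Map each category to a (found, stated in readme) pair."""
    return {key: (counts[key], stated) for key, stated in README_COUNTS.items()}
```

For example, compare_with_readme(count_corpus_files("/path/to/stories"))
returns a mapping such as {".wav": (found, 702), ...}. The .wrd figure in
this readme is approximate, so a small difference there is not by itself a
sign of a damaged copy.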

Distribution Directory Structure
--------------------------------
This is the distribution for Release 1.2 of the Stories Corpus.  This
corpus is distributed by the Center for Spoken Language Understanding
of Oregon Health & Science University.  The directory structure of this
release is described below:


  readme.txt	General information regarding the corpus.

  docs/		The documentation directory. This directory
		contains further documentation for the Stories
		corpus.

  labels/	Label directory. This directory contains the
		time-aligned word and phonetic labels for this
		corpus.

  misc/		Miscellaneous directory, possibly containing
		software tools, comments and scripts.

  speech/	The speech directory. This directory contains
		the .wav speech files, organized into numbered
		subdirectories.

  trans/	The transcriptions directory. This directory
		contains the word transcription for each of 
		the speech files.

This corpus requires approximately 531MB of disk space. Please see
the /docs directory for further documentation.
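After copying the corpus, a quick sanity check is to confirm that the
top-level entries listed above are present and that the total size is close
to the stated 531MB. The minimal sketch below does both. The helper names
missing_entries and total_size_mb are hypothetical, the entry names are
assumed to match the listing exactly (including the lowercase readme.txt),
and "MB" is assumed to mean 10^6 bytes, which this readme does not specify.

```python
from pathlib import Path

# Top-level entries listed in this readme's directory structure.
# Exact, case-sensitive names are an assumption.
EXPECTED_ENTRIES = ("readme.txt", "docs", "labels", "misc", "speech", "trans")


def missing_entries(root: str) -> list:
    """Return the listed top-level entries that are absent under root."""
    root_path = Path(root)
    return [name for name in EXPECTED_ENTRIES if not (root_path / name).exists()]


def total_size_mb(root: str) -> float:
    """Sum the sizes of all regular files under root, in MB (assumed 10**6 bytes)."""
    total = sum(p.stat().st_size for p in Path(root).rglob("*") if p.is_file())
    return total / 1_000_000
```

An empty list from missing_entries and a total_size_mb value near 531
suggest a complete copy; if "MB" was meant as 2^20 bytes, divide by 1_048_576
instead.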

Contact Information
-------------------
Further information about this corpus can be found on our web site:
<http://www.cslu.ogi.edu>.

Refer specific questions to:

- Alena Tkacova
- Linguistic Data Services Manager
- Center for Spoken Language Understanding
- Oregon Health & Science University
- email   : alca@asp.ogi.edu
- Phone   : 503 748-1600    
- FAX     : 503 748-7038
- Address : 20000 NW Walker Road
            Beaverton, OR 97006 USA

Constructive feedback about this corpus is appreciated.