TRAINS Spoken Dialog Corpus

Item Name: TRAINS Spoken Dialog Corpus
Author(s): James Allen, Peter Heeman
LDC Catalog No.: LDC95S25
ISBN: 1-58563-057-8
ISLRN: 070-132-331-927-5
Member Year(s): 1995
DCMI Type(s): Sound
Sample Type: 1-channel pcm compressed
Sample Rate: 16000 Hz
Data Source(s): microphone conversation
Application(s): spoken dialogue systems, speech recognition, discourse analysis
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC95S25 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Allen, James, and Peter Heeman. TRAINS Spoken Dialog Corpus LDC95S25. CD. Philadelphia: Linguistic Data Consortium, 1995.

This release contains a corpus of task-oriented spoken dialogs. The dialogs were collected in 1993 at the University of Rochester Department of Computer Science as part of the TRAINS project, a project to develop a conversationally proficient planning assistant that helps a user construct a plan to achieve a task involving the manufacturing and shipment of goods in a railroad freight system. The collection procedure was designed to make the setting as close to human-computer interaction as possible, but it was not a "wizard" scenario, in which one person pretends to be a computer. These dialogs therefore provide a snapshot of an ideal human-computer interface, one able to engage in fluent conversation.

Altogether, this corpus includes 98 dialogs, collected using 20 different tasks and 34 different speakers. This amounts to six and a half hours of speech, about 5,900 speaker turns and 55,000 transcribed words.
