TRAINS Spoken Dialog Corpus
|Item Name:|TRAINS Spoken Dialog Corpus|
|Author(s):|James Allen, Peter Heeman|
|LDC Catalog No.:|LDC95S25|
|Sample Type:|1-channel PCM, compressed|
|Data Source(s):|microphone conversation|
|Application(s):|spoken dialogue systems, speech recognition, discourse analysis|
|Online Documentation:|LDC95S25 Documents|
|Licensing Instructions:|Subscription & Standard Members, and Non-Members|
|Citation:|Allen, James, and Peter Heeman. TRAINS Spoken Dialog Corpus LDC95S25. CD. Philadelphia: Linguistic Data Consortium, 1995.|
This release contains a corpus of task-oriented spoken dialogs. The dialogs were collected in 1993 at the University of Rochester Department of Computer Science as part of the TRAINS project, whose goal was to develop a conversationally proficient planning assistant that helps a user construct a plan for a task involving the manufacturing and shipment of goods in a railroad freight system. The collection procedure was designed to make the setting as close to human-computer interaction as possible, but it was not a "Wizard of Oz" scenario, in which one person pretends to be a computer. Because neither participant pretended to be a computer, these dialogs offer a snapshot of the fluent conversation that an ideal human-computer interface would be able to engage in.
Altogether, the corpus includes 98 dialogs, collected from 34 different speakers working on 20 different tasks. The dialogs amount to six and a half hours of speech, with about 5,900 speaker turns and 55,000 transcribed words.
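The totals above can be turned into rough per-dialog and per-turn averages. The sketch below is only an illustration, assuming the stated figures are used as exact values even though the turn and word counts are approximate; the function name `corpus_averages` is hypothetical and not part of any LDC tooling. The words-per-minute figure divides by the full six and a half hours, so it is an average over all speech time rather than a measured speaking rate.

```python
# Approximate corpus totals as stated in the description (turns and words are "about" figures).
DIALOGS = 98
HOURS_OF_SPEECH = 6.5
SPEAKER_TURNS = 5_900
WORDS = 55_000


def corpus_averages(dialogs, hours, turns, words):
    """Hypothetical helper: derive rough averages from the corpus-level totals."""
    minutes = hours * 60
    return {
        "minutes_per_dialog": minutes / dialogs,
        "turns_per_dialog": turns / dialogs,
        "words_per_turn": words / turns,
        # Averaged over total speech time, not a true speaking rate.
        "words_per_minute": words / minutes,
    }


if __name__ == "__main__":
    stats = corpus_averages(DIALOGS, HOURS_OF_SPEECH, SPEAKER_TURNS, WORDS)
    for name, value in stats.items():
        print(f"{name}: {value:.1f}")
```

With the stated totals, this works out to roughly four minutes and sixty turns per dialog, and a little over nine words per turn.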