The Walking Around Corpus
Item Name: | The Walking Around Corpus |
Author(s): | Susan E. Brennan, Katharina S. Schuhmann, Karla M. Batres |
LDC Catalog No.: | LDC2015S08 |
ISBN: | 1-58563-722-X |
DOI: | https://doi.org/10.35111/qpdc-0x63 |
Release Date: | July 15, 2015 |
Member Year(s): | 2015 |
DCMI Type(s): | Sound, Text |
Sample Type: | flac |
Sample Rate: | 8000 |
Data Source(s): | field recordings, telephone conversations |
Application(s): | speech recognition, speech activity detection |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2015S08 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Brennan, Susan E., Katharina Schuhmann, and Karla Batres. The Walking Around Corpus LDC2015S08. Web Download. Philadelphia: Linguistic Data Consortium, 2015. |
Related Works: | View |
Introduction
The Walking Around Corpus was developed by Stony Brook University and comprises approximately 33 hours of navigational telephone dialogues from 72 speakers (36 speaker pairs). Participants were Stony Brook University students who identified themselves as native English speakers.
This corpus was elicited using a navigation task in which one person directed another to walk to 18 unique destinations on Stony Brook University's West campus. The direction-giver remained inside the lab and gave directions on a landline telephone to the pedestrian, who used a mobile phone. The pedestrians photographed each of the 18 destinations with the mobile phone as they reached it. Pairs conversed spontaneously as they completed the task. The pedestrians' locations were tracked using their cell phones' GPS. The pedestrians did not have any maps or pictures of the target destinations and therefore relied on the direction-giver's verbal directions and descriptions to locate and photograph the target destinations.
Data
The conversations were recorded by means of a Public Switched Telephone Network (PSTN) conferencing service. Due to the nature of the task, the recordings contain occasional background noise.
Each digital audio file was transcribed with time stamps. Most of the recordings were first transcribed by a transcription company and then edited and checked by a trained graduate student. The remaining recordings were transcribed by trained students at Stony Brook University. The corpus material also includes the visual materials (pictures and maps) used to elicit the dialogues, data about the speakers' relationship, spatial abilities and memory performance, and other information.
All audio is presented as 8000 Hz, 16-bit FLAC-compressed WAV. Note that the data was converted from WAV, so some documentation may still refer to WAV files. Transcripts are presented as XLS spreadsheets.
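As a rough illustration of the audio format described above, the following Python sketch reads the STREAMINFO block at the start of a FLAC stream and checks it against the stated 8000 Hz, 16-bit format. It uses only the standard library. The names `read_flac_streaminfo`, `StreamInfo`, and `matches_stated_format` are hypothetical helpers introduced here and are not part of the corpus distribution. The sketch assumes STREAMINFO is the first metadata block, as the FLAC format requires.

```python
from typing import BinaryIO, NamedTuple


class StreamInfo(NamedTuple):
    sample_rate: int      # in Hz
    channels: int
    bits_per_sample: int
    total_samples: int    # samples per channel; 0 means unknown


def read_flac_streaminfo(stream: BinaryIO) -> StreamInfo:
    """Parse the mandatory STREAMINFO block at the start of a FLAC stream."""
    if stream.read(4) != b"fLaC":
        raise ValueError("missing 'fLaC' marker; not a FLAC stream")

    # Metadata block header: 1 bit last-block flag, 7 bits type, 24 bits length.
    header = stream.read(4)
    if len(header) != 4:
        raise ValueError("truncated metadata block header")
    block_type = header[0] & 0x7F
    length = int.from_bytes(header[1:4], "big")
    if block_type != 0 or length != 34:
        raise ValueError("first metadata block is not a 34-byte STREAMINFO")

    body = stream.read(34)
    if len(body) != 34:
        raise ValueError("truncated STREAMINFO block")

    # Bytes 10..17 pack four fields into 64 bits:
    # 20-bit sample rate, 3-bit (channels - 1),
    # 5-bit (bits per sample - 1), 36-bit total samples.
    packed = int.from_bytes(body[10:18], "big")
    return StreamInfo(
        sample_rate=packed >> 44,
        channels=((packed >> 41) & 0x7) + 1,
        bits_per_sample=((packed >> 36) & 0x1F) + 1,
        total_samples=packed & ((1 << 36) - 1),
    )


def matches_stated_format(info: StreamInfo) -> bool:
    """True if the stream has the 8000 Hz, 16-bit format stated above."""
    return info.sample_rate == 8000 and info.bits_per_sample == 16
```

To check a file, open it in binary mode, for example `with open(path, "rb") as f: info = read_flac_streaminfo(f)`, where `path` is a placeholder for any audio file in the release. When `total_samples` is nonzero, dividing it by `sample_rate` gives the recording length in seconds.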
Samples
Please listen to this audio sample.
Updates
None at this time.