The Walking Around Corpus

Item Name:	The Walking Around Corpus
Author(s):	Susan E. Brennan, Katharina S. Schuhmann, Karla M. Batres
LDC Catalog No.:	LDC2015S08
ISBN:	1-58563-722-X
DOI:	https://doi.org/10.35111/qpdc-0x63
Release Date:	July 15, 2015
Member Year(s):	2015
DCMI Type(s):	Sound, Text
Sample Type:	flac
Sample Rate:	8000 Hz
Data Source(s):	field recordings, telephone conversations
Application(s):	speech recognition, speech activity detection
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2015S08 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Brennan, Susan E., Katharina Schuhmann, and Karla Batres. The Walking Around Corpus LDC2015S08. Web Download. Philadelphia: Linguistic Data Consortium, 2015.
Related Works:	isSimilarWith LDC93S11 Road Rally; LDC93S12 HCRC Map Task Corpus. relatesTo LDC2017S16 LDC Spoken Language Sampler - Fourth Release.

Introduction

The Walking Around Corpus was developed by Stony Brook University and comprises approximately 33 hours of navigational telephone dialogues from 72 speakers (36 speaker pairs). Participants were Stony Brook University students who identified themselves as native English speakers.

This corpus was elicited using a navigation task in which one person directed another to walk to 18 unique destinations on Stony Brook University’s West campus. The direction-giver remained inside the lab and gave directions on a landline telephone to the pedestrian, who used a mobile phone. On reaching each of the 18 destinations, the pedestrian photographed it with the mobile phone. Pairs conversed spontaneously as they completed the task. The pedestrians' locations were tracked using their mobile phones' GPS. The pedestrians did not have any maps or pictures of the target destinations and therefore relied on the direction-giver's verbal directions and descriptions to locate and photograph them.

Data

The conversations were recorded by means of a Public Switched Telephone Network (PSTN) conferencing service. Due to the nature of the task, the recordings contain occasional background noise.

Each digital audio file was transcribed with time stamps. Most of the recordings were first transcribed by a transcription company and then edited and checked by a trained graduate student. The remaining recordings were transcribed by trained students at Stony Brook University. The corpus also includes the visual materials (pictures and maps) used to elicit the dialogues, data about the speakers' relationship, spatial abilities and memory performance, and other information.

All audio is presented as 8000 Hz, 16-bit FLAC-compressed WAV. Note that the data was converted from WAV, so some documentation may still refer to WAV files. Transcripts are presented as XLS spreadsheets.
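
As a rough illustration of how the stated audio format could be checked, the sketch below reads the sample rate, bit depth, channel count and duration from the STREAMINFO block at the start of a FLAC stream, using only the Python standard library. The function name read_flac_streaminfo is hypothetical, and the sketch assumes the stream begins directly with the fLaC marker (no leading ID3 tag). It makes no assumption about the corpus's file names or directory layout.

```python
def read_flac_streaminfo(data: bytes) -> dict:
    """Parse the mandatory STREAMINFO block at the start of a FLAC stream.

    Hypothetical helper for illustration; assumes `data` starts with the
    4-byte 'fLaC' marker followed immediately by the STREAMINFO block.
    """
    if data[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")

    # Metadata block header: 1 bit last-block flag, 7 bits type, 24 bits length.
    block_type = data[4] & 0x7F
    length = int.from_bytes(data[5:8], "big")
    if block_type != 0 or length < 34:
        raise ValueError("first metadata block is not a valid STREAMINFO")

    body = data[8:8 + 34]
    if len(body) < 34:
        raise ValueError("truncated STREAMINFO block")

    # Bytes 10..17 pack: sample rate (20 bits), channels - 1 (3 bits),
    # bits per sample - 1 (5 bits), total samples (36 bits).
    packed = int.from_bytes(body[10:18], "big")
    sample_rate = packed >> 44
    channels = ((packed >> 41) & 0x7) + 1
    bits_per_sample = ((packed >> 36) & 0x1F) + 1
    total_samples = packed & ((1 << 36) - 1)

    return {
        "sample_rate": sample_rate,
        "channels": channels,
        "bits_per_sample": bits_per_sample,
        "total_samples": total_samples,
        # A total_samples value of 0 means "unknown" in the FLAC format.
        "duration_seconds": total_samples / sample_rate if sample_rate else None,
    }
```

Passing the first 42 bytes of a file (for example, from open(path, "rb").read(42)) is enough for this check. For a file in the format described above, sample_rate should be 8000 and bits_per_sample should be 16.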
