
Santa Barbara Corpus of Spoken American English Part I

Item Name:	Santa Barbara Corpus of Spoken American English Part I
Author(s):	John W. Du Bois, Wallace L. Chafe, Charles Meyer, Sandra A. Thompson
LDC Catalog No.:	LDC2000S85
ISBN:	1-58563-164-7
ISLRN:	407-731-819-668-4
DOI:	https://doi.org/10.35111/s2q7-gq73
Release Date:	January 01, 2000
Member Year(s):	2000
DCMI Type(s):	Sound
Data Source(s):	microphone speech
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2000S85 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Du Bois, John W., et al. Santa Barbara Corpus of Spoken American English Part I LDC2000S85. Web Download. Philadelphia: Linguistic Data Consortium, 2000.
Related Works:	hasContinuation: LDC2003S06 Santa Barbara Corpus of Spoken American English Part II; LDC2004S10 Santa Barbara Corpus of Spoken American English Part III; LDC2005S25 Santa Barbara Corpus of Spoken American English Part IV

Introduction

The Santa Barbara Corpus of Spoken American English is based on hundreds of recordings of natural speech from all over the United States, representing a wide variety of people of different regional origins, ages, occupations, and ethnic and social backgrounds. It reflects many ways that people use language in their lives: conversation, gossip, arguments, on-the-job talk, card games, city council meetings, sales pitches, classroom lectures, political speeches, bedtime stories, sermons, weddings, and more.

Data

Part I contains 14 speech files of between 15 and 30 minutes each, from the Santa Barbara Corpus of Spoken American English. Collected by: University of California, Santa Barbara Center for the Study of Discourse; Director: John W. Du Bois (UCSB); Associate Editors: Wallace L. Chafe (UCSB), Charles Meyer (UMass, Boston), and Sandra A. Thompson (UCSB). The Santa Barbara Corpus of Spoken American English is part of the International Corpus of English (Charles W. Meyer, Director), in which it represents the American component.

Each speech file is accompanied by a transcript in which phrases are time stamped with respect to the audio recording. Personal names, place names, phone numbers, etc., in the transcripts have been altered to preserve the anonymity of the speakers and their acquaintances, and the audio files have been filtered to make these portions of the recordings unrecognizable.

Samples

For an example of the data in this corpus, please examine these samples of the recordings and transcripts:

Updates

There are no updates at this time.
