USC Marketplace Broadcast News Transcripts


Item Name: USC Marketplace Broadcast News Transcripts
Authors: Alexandra Canavan and David Miller
LDC Catalog No.: LDC99T36
ISBN: 1-58563-151-5
Data Type: text
Data Source(s): broadcast news
Application(s): speech recognition
Language(s): English
Language ID(s): ENG
Distribution: Web Download
Member Fee: $0 for 1999 members
Non-member Fee: US $1200.00
Reduced-License Fee: US $600.00
Extra-Copy Fee: N/A
Non-member License: yes
Member License: yes
Online Documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Alexandra Canavan and David Miller. 1999. USC Marketplace Broadcast News Transcripts. Linguistic Data Consortium, Philadelphia.

Introduction

The USC Marketplace Broadcast News Corpus contains approximately 40 hours of audio data recorded daily between May 1, 1996 and September 18, 1996. The corresponding transcript files were created by Federal Document Clearing House and enhanced by the LDC to include story boundaries, disfluency markers, and speaker and gender identification. In keeping with HUB4-style transcription conventions, LDC spelled out all digit strings in standard orthography. Commercial and music segments, while part of the audio publication, were excluded from the transcripts. Timestamps mark the beginning of each speaker turn relative to the beginning of the recording and are precise to one hundredth of a second. Although the transcripts were created using HUB4 conventions, the second- and third-pass quality checks typically required by government-sponsored evaluation projects were skipped.
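
Because turn timestamps are offsets from the start of the recording with hundredth-of-a-second precision, it can help to convert them to a clock-style form when aligning transcripts with audio. The sketch below is illustrative only: it assumes an offset is available as a decimal string of seconds (for example "754.32"), which this description does not confirm, and the function names are hypothetical rather than part of any LDC tool.

```python
from decimal import Decimal, ROUND_HALF_UP


def to_centiseconds(offset_seconds: str) -> int:
    """Convert an offset in seconds (decimal string) to whole hundredths of a second.

    Assumption: the offset is given as text such as "754.32". Decimal is used
    so that values are not distorted by binary floating-point rounding.
    """
    hundredths = Decimal(offset_seconds) * 100
    return int(hundredths.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def format_offset(offset_seconds: str) -> str:
    """Render an offset from the start of the recording as HH:MM:SS.cc."""
    total_seconds, centis = divmod(to_centiseconds(offset_seconds), 100)
    total_minutes, seconds = divmod(total_seconds, 60)
    hours, minutes = divmod(total_minutes, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}.{centis:02d}"


if __name__ == "__main__":
    # Hypothetical turn-start offsets, not taken from the corpus.
    for offset in ["0.00", "754.32", "3600"]:
        print(offset, "->", format_offset(offset))
```

Working in integer hundredths keeps arithmetic on turn boundaries, such as subtracting consecutive turn starts, exact at the stated precision.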

Data

The USC Marketplace recordings from the summer of 1996 were received on digital audio tapes (DATs) from the University of Southern California. LDC excluded from this set the roughly seven hours of broadcast that are currently included in the 1996 English Broadcast News publication.

Marketplace is produced by USC Radio in Los Angeles, a division of the University of Southern California.

Updates

There are no updates at this time.

Copyright