1997 English Broadcast News Speech (HUB4)

Item Name: 1997 English Broadcast News Speech (HUB4)
Author(s): Jonathan Fiscus, John Garofolo, Mark Przybocki, William Fisher, David Pallett
LDC Catalog No.: LDC98S71
ISBN: 1-58563-123-X
ISLRN: 331-835-398-589-3
Member Year(s): 1998
DCMI Type(s): Sound
Sample Type: 1-channel pcm
Sample Rate: 16000
Data Source(s): broadcast news
Project(s): Hub4, GALE, EARS
Application(s): speech recognition
Language(s): English
Language ID(s): eng
Online Documentation: LDC98S71 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Fiscus, Jonathan, et al. 1997 English Broadcast News Speech (HUB4) LDC98S71. Web Download. Philadelphia: Linguistic Data Consortium, 1998.
LDC98S71 - Speech data
LDC98T28 - Transcripts

Introduction

This set of 3 DVD-ROMs contains a total of 97 hours of recordings from radio and television news broadcasts, gathered between June 1997 and February 1998. It supplements the 1996 Broadcast News Speech collection, which consists of over 100 hours of similar recordings. The primary motivation for this collection is to provide additional training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

Data

All recordings in this publication have been transcribed. The transcripts are manually time-aligned at the phrasal level and annotated with news story boundaries, speaker turn boundaries, and speaker gender. The transcription conventions are described in the file "transcrp.doc"; note that this file describes the transcription methods by reference to text formatting conventions used internally by the LDC during the transcription process. The released transcripts are in SGML format, comparable to the format used in the 1996 Broadcast News Speech transcriptions, and the transcription release includes accompanying documentation and an SGML DTD file.
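
As a rough illustration of how time-aligned, speaker-annotated SGML transcripts of this kind might be processed, the following Python sketch extracts speaker turns from a small made-up sample and totals speech time by speaker gender. The element and attribute names (section, turn, speaker, gender, startTime, endTime) and the sample text are assumptions for illustration only; the actual names are defined by the DTD and documentation included with the transcription release and should be checked there.

```python
import re

# Hypothetical sample. The element and attribute names are assumptions made
# for illustration; consult the DTD shipped with the transcripts for the
# real ones.
SAMPLE = """\
<section type="story" startTime="0.00" endTime="12.50">
<turn speaker="anchor_1" gender="female" startTime="0.00" endTime="7.25">
good evening here are tonight's headlines
</turn>
<turn speaker="reporter_1" gender="male" startTime="7.25" endTime="12.50">
thanks we begin in washington
</turn>
</section>
"""

# Match one <turn ...> ... </turn> element and capture its attributes and body.
TURN_RE = re.compile(r"<turn\s+([^>]*)>(.*?)</turn>", re.DOTALL | re.IGNORECASE)
# Match name="value" attribute pairs inside a start tag.
ATTR_RE = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


def parse_turns(text):
    """Return a list of dicts, one per speaker turn found in the text."""
    turns = []
    for attr_text, body in TURN_RE.findall(text):
        attrs = dict(ATTR_RE.findall(attr_text))
        turns.append({
            "speaker": attrs.get("speaker"),
            "gender": attrs.get("gender"),
            "start": float(attrs["startTime"]),
            "end": float(attrs["endTime"]),
            # Collapse line breaks and repeated spaces in the transcript text.
            "text": " ".join(body.split()),
        })
    return turns


def seconds_by_gender(turns):
    """Sum turn durations (in seconds) for each gender label."""
    totals = {}
    for turn in turns:
        duration = turn["end"] - turn["start"]
        totals[turn["gender"]] = totals.get(turn["gender"], 0.0) + duration
    return totals


if __name__ == "__main__":
    parsed = parse_turns(SAMPLE)
    for turn in parsed:
        print(turn["speaker"], turn["start"], turn["end"], turn["text"])
    print(seconds_by_gender(parsed))
```

A regular-expression approach is used here because SGML is not guaranteed to be well-formed XML; a real pipeline would more likely validate against the supplied DTD with an SGML-aware parser.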

Updates

There are no updates at this time.

Pricing

The Reduced Licensing Fee for this corpus is US$600.
