1997 English Broadcast News Transcripts (HUB4)

Item Name:	1997 English Broadcast News Transcripts (HUB4)
Author(s):	Jennifer Alabiso, Robert MacIntyre, David Graff
LDC Catalog No.:	LDC98T28
ISBN:	1-58563-124-8
ISLRN:	789-160-485-831-6
DOI:	https://doi.org/10.35111/m20q-1s68
Member Year(s):	1998
DCMI Type(s):	Text
Data Source(s):	broadcast news
Project(s):	Hub4, GALE, EARS
Application(s):	speech recognition
Language(s):	English
Language ID(s):	eng
Online Documentation:	LDC98T28 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Alabiso, Jennifer, Robert MacIntyre, and David Graff. 1997 English Broadcast News Transcripts (HUB4) LDC98T28. Web Download. Philadelphia: Linguistic Data Consortium, 1998.
Related Works:
	isAnnotationOf:	LDC98S71 1997 English Broadcast News Speech (HUB4)
	hasAnnotation:	LDC2003T11 ACE-2 Version 1.0
	hasOutcome:	LDC2011T06 Broadcast News Lattices
	isContinuationOf:	LDC98T24 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
	hasContinuation:	LDC98T29 1997 Spanish Broadcast News Transcripts (HUB4-NE)
		LDC2001S91 1997 HUB4 Broadcast News Evaluation Non-English Test Material
		LDC2002S11 1997 HUB4 English Evaluation Speech and Transcripts
	isSimilarWith:	LDC97T22 1996 English Broadcast News Transcripts (HUB4)
		LDC99T36 USC Marketplace Broadcast News Transcripts
		LDC2000S86 1998 HUB4 Broadcast News Evaluation English Test Material
		LDC2000S88 1999 HUB4 Broadcast News Evaluation English Test Material

LDC98S71 - Speech data LDC98T28 - Transcripts

Introduction

This publication supplements the 1996 Broadcast News Speech collection, which consists of over 100 hours of similar recordings. Its primary purpose is to provide additional training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain.

Data

This set of 18 CD-ROMs contains a total of 97 hours of recordings from radio and television news broadcasts, gathered between June 1997 and February 1998.

Every recording in this publication has been transcribed. The transcripts are manually time-aligned at the phrasal level and annotated with news story boundaries, speaker turn boundaries, and speaker gender. The transcription conventions are described in the file "transcrp.doc"; please note that this file describes the transcription methods by reference to text formatting conventions used internally by the LDC during the transcription process. The released transcripts are in SGML format, comparable to the format used for the 1996 Broadcast News Speech transcriptions, and the transcription release includes accompanying documentation and an SGML DTD file.
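
As a rough illustration of how turn-level annotations of this kind might be read, the Python sketch below extracts speaker turns from a small SGML-like sample and totals speaking time by gender. The tag and attribute names (section, turn, time, speaker, spkrtype, startTime, endTime) and the sample text are assumptions made for this example, not taken from this page; the DTD and documentation included with the release define the actual format.

```python
import re

# Hypothetical sample. Tag and attribute names are assumptions for
# illustration only; check the DTD shipped with the release.
SAMPLE = """\
<section type="report" startTime="0.000" endTime="12.500">
<turn speaker="Anchor_1" spkrtype="female" startTime="0.000" endTime="7.250">
<time sec="0.000">
good evening and welcome
</turn>
<turn speaker="Reporter_1" spkrtype="male" startTime="7.250" endTime="12.500">
thank you
</turn>
</section>
"""

# Matches an opening <turn ...> tag and captures its attribute text.
TURN_TAG = re.compile(r"<turn\b([^>]*)>")
# Matches name=value pairs, with or without double quotes around the value.
ATTR = re.compile(r'(\w+)="?([^"\s>]+)"?')


def list_turns(sgml):
    """Return one dict per speaker turn with speaker, gender, start, end."""
    turns = []
    for match in TURN_TAG.finditer(sgml):
        attrs = dict(ATTR.findall(match.group(1)))
        turns.append({
            "speaker": attrs.get("speaker"),
            "gender": attrs.get("spkrtype"),
            "start": float(attrs["startTime"]),
            "end": float(attrs["endTime"]),
        })
    return turns


def speaking_time_by_gender(turns):
    """Sum turn durations in seconds, keyed by the gender annotation."""
    totals = {}
    for turn in turns:
        duration = turn["end"] - turn["start"]
        totals[turn["gender"]] = totals.get(turn["gender"], 0.0) + duration
    return totals


if __name__ == "__main__":
    for turn in list_turns(SAMPLE):
        print(turn)
    print(speaking_time_by_gender(list_turns(SAMPLE)))
```

A regular expression is used here only because the sample is small and regular; for the full corpus, a parser driven by the released DTD would be a safer choice.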

Updates

There are no updates at this time.

Additional Licensing Instructions

This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
