1996 English Broadcast News Transcripts (HUB4)
Item Name: | 1996 English Broadcast News Transcripts (HUB4) |
Author(s): | David Graff, Jennifer Alabiso |
LDC Catalog No.: | LDC97T22 |
ISBN: | 1-58563-149-3 |
ISLRN: | 444-268-955-648-0 |
DOI: | https://doi.org/10.35111/339y-6n93 |
Member Year(s): | 1997, 1998 |
DCMI Type(s): | Text |
Data Source(s): | broadcast news |
Project(s): | Hub4, GALE, EARS |
Application(s): | speech recognition |
Language(s): | English |
Language ID(s): | eng |
License(s): | NPR and USC Archive User Agreement |
Online Documentation: | LDC97T22 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Graff, David, and Jennifer Alabiso. 1996 English Broadcast News Transcripts (HUB4) LDC97T22. Web Download. Philadelphia: Linguistic Data Consortium, 1997. |
Introduction
The 1996 Broadcast News Speech Corpus contains a total of 104 hours of broadcasts, with corresponding transcripts, from the ABC, CNN, and CSPAN television networks and the NPR and PRI radio networks. The primary motivation for this collection is to provide training data for the DARPA "HUB4" Project on continuous speech recognition in the broadcast domain. The speech files are available as a 19-disc training data set, plus one additional disc of development data and one additional disc of evaluation data. The following programs are represented in this corpus:
Data
Transcripts have been made of all recordings in this publication. They are manually time-aligned to the phrasal level and annotated to mark news story boundaries, speaker turn boundaries, and speaker gender. The transcripts are released in SGML format, accompanied by documentation and an SGML DTD file. The transcripts are available via FTP.
Updates
There are no updates at this time.
Pricing
The Reduced Licensing Fee for this corpus is US$100.