1997 HUB4 Broadcast News Evaluation Non-English Test Material


Item Name: 1997 HUB4 Broadcast News Evaluation Non-English Test Material
Authors: Jonathan Fiscus, John Garofolo, Mark Przybocki, William Fisher, and David Pallett
LDC Catalog No.: LDC2001S91
ISBN: 1-58563-182-5
Data Type: speech
Data Source(s): broadcast news
Project(s): EARS, GALE, Hub4
Application(s): speech recognition
Language(s): Mandarin Chinese, Spanish
Language ID(s): cmn, spa
Distribution: 1 CD
Member fee: $0 for 2001 members
Non-member Fee: N/A (Members Only)
Reduced-License Fee: N/A
Extra-Copy Fee: US $150.00
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Jonathan Fiscus, et al. 2001. 1997 HUB4 Broadcast News Evaluation Non-English Test Material. Linguistic Data Consortium, Philadelphia.

Introduction

This publication contains the evaluation test material used in the 1997 DARPA/NIST Continuous Speech Recognition Broadcast News HUB4 Non-English Benchmark Test, administered by the NIST Spoken Natural Language Processing Group and produced by the Linguistic Data Consortium (LDC), catalog number LDC2001S91, ISBN 1-58563-182-5.

Data

The test material is contained in two SPHERE-formatted waveform files. The file h4ne97sp.sph (set1) contains one hour of Spanish Broadcast News excerpts from 1997. The file h4ne97ma.sph (set2) contains one hour of Mandarin Broadcast News excerpts from 1997. Each file should be recognized separately, per the HUB4 Non-English Evaluation Specification.
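
As an illustration of working with the SPHERE-formatted files described above, the following minimal Python sketch reads the plain-text header that precedes the waveform samples in a NIST SPHERE file. It assumes the common NIST_1A header layout (an identifier line, a header-size line, then "name -type value" fields terminated by end_head). The function name parse_sphere_header is hypothetical, and the field names in the usage note (sample_rate, channel_count) are conventional SPHERE fields whose presence and values in these two files are not stated in this documentation.

```python
def parse_sphere_header(data: bytes):
    """Parse the ASCII header at the start of a NIST SPHERE file.

    Returns (fields, header_size). Assumes the NIST_1A header layout;
    this is an illustrative sketch, not an official LDC or NIST tool.
    """
    lines = data.split(b"\n")
    if len(lines) < 2 or lines[0].strip() != b"NIST_1A":
        raise ValueError("not a NIST SPHERE (NIST_1A) file")

    # The second line gives the total header size in bytes (often 1024).
    header_size = int(lines[1].strip())
    text = data[:header_size].decode("ascii", errors="replace")

    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue  # skip lines that are not "name -type value"
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:
            fields[name] = value  # "-sN" string field
    return fields, header_size
```

Under these assumptions, the header of the Spanish file could be inspected with something like `fields, size = parse_sphere_header(open("h4ne97sp.sph", "rb").read(4096))`, after which `fields.get("sample_rate")` and `fields.get("channel_count")` would return the stored values if those fields are present. The audio samples begin at byte offset `size`.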

Note: This publication does not contain the material for the HUB4 English evaluation, which will be released separately.

Updates

There are no updates at this time.

Copyright

Portions Copyright 1997, CCTV People's Republic of China TV
Portions Copyright 1997, KAZN-AM
Portions Copyright 1997, Televisa
Portions Copyright 1997, Univision

Pricing

The Reduced Licensing Fee for this corpus is US$150.