This file contains documentation for Voice of America (VOA) Czech Broadcast News Audio, Linguistic Data Consortium (LDC) catalog number LDC2000S89 and ISBN 1-58563-179-5. Included below as reference material is the documentation for Voice of America (VOA) Czech Broadcast News Transcripts, LDC2000T53 and ISBN 1-58563-180-9.
Between February 9 and May 28, 1999, LDC collected approximately 30 hours of Czech broadcast audio from the Voice of America news service. The 62 data files presented in this corpus represent the audio of the daily broadcasts of 30-minute news programs.
Because of technical limitations in the LDC hardware used to receive the VOA broadcasts via a satellite downlink, a number of files contain brief portions where the audio signal was interrupted. These interruptions typically yielded regions of complete silence that lasted less than two seconds and were scattered sparsely throughout an affected audio file. Additional markup was provided in the transcription texts to isolate the regions where these interruptions occurred.
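For users who want to cross-check the interrupted regions against the audio itself, the following is a minimal sketch of one way to locate stretches of silence in decoded samples. It is not part of the corpus tooling. It assumes that an interruption appears as a run of exactly zero-valued samples, and the function name `find_silent_runs` and the `min_seconds` threshold are hypothetical choices made for illustration only.

```python
def find_silent_runs(samples, sample_rate=16000, min_seconds=0.1):
    """Return (start_sec, end_sec) pairs for runs of zero-valued samples.

    Assumption: a signal interruption shows up as exact digital zero.
    min_seconds is an illustrative threshold, not a documented value.
    """
    min_len = min_seconds * sample_rate
    runs = []
    start = None  # index where the current zero run began, if any
    for i, value in enumerate(samples):
        if value == 0:
            if start is None:
                start = i
        elif start is not None:
            if i - start >= min_len:
                runs.append((start / sample_rate, i / sample_rate))
            start = None
    # Close a run that extends to the end of the sample sequence.
    if start is not None and len(samples) - start >= min_len:
        runs.append((start / sample_rate, len(samples) / sample_rate))
    return runs


# Synthetic example: 1 s of signal, 0.5 s of silence, 1 s of signal.
demo = [5] * 16000 + [0] * 8000 + [7] * 16000
demo_runs = find_silent_runs(demo)
```

The markup in the transcription texts remains the authoritative record of where interruptions occurred; a scan like this can only suggest candidates.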
The 62 audio files in this corpus are single-channel, 16 kHz, 16-bit linear SPHERE files.
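The stated format can be checked by reading a file's header. The sketch below assumes the files carry the standard NIST SPHERE ASCII header (a `NIST_1A` line, a header-size line, then `name type value` fields terminated by `end_head`). The function name `parse_sphere_header` is hypothetical, and the header bytes used here are synthetic, not taken from the corpus.

```python
def parse_sphere_header(data: bytes) -> dict:
    """Parse the ASCII header of a NIST SPHERE file into a dict."""
    first, size_line, _ = data.split(b"\n", 2)
    if first != b"NIST_1A":
        raise ValueError("not a NIST SPHERE header")
    header_size = int(size_line.decode("ascii").strip())
    text = data[:header_size].decode("ascii", errors="replace")
    fields = {}
    for line in text.split("\n")[2:]:
        line = line.strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)
        if len(parts) != 3 or line.startswith(";"):
            continue  # skip blank, comment, or malformed lines
        name, type_code, value = parts
        if type_code == "-i":
            fields[name] = int(value)
        elif type_code == "-r":
            fields[name] = float(value)
        else:  # "-sN": string field of N characters
            fields[name] = value
    return fields


# Synthetic header matching the format described above (not real corpus data).
sample_header = (
    b"NIST_1A\n   1024\n"
    b"channel_count -i 1\n"
    b"sample_rate -i 16000\n"
    b"sample_n_bytes -i 2\n"
    b"end_head\n"
).ljust(1024, b" ")

info = parse_sphere_header(sample_header)
# Bytes of audio per second implied by these header fields.
bytes_per_second = (
    info["sample_rate"] * info["sample_n_bytes"] * info["channel_count"]
)
```

With these parameters, a full 30-minute program would occupy 30 × 60 × 32,000 = 57,600,000 bytes of uncompressed sample data. Actual file sizes may differ if a recording is shorter or if the file is stored in compressed form.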
For an example of the data in this corpus, please review this audio sample.
Updates
There are no updates at this time.
Copyright
Portions © 2000 Trustees of the University of Pennsylvania