2017 NIST OpenSAT Pilot - SSSF
Item Name: | 2017 NIST OpenSAT Pilot - SSSF |
Author(s): | Frederick Byers |
LDC Catalog No.: | LDC2022S01 |
ISBN: | 1-58563-983-4 |
ISLRN: | 847-094-281-048-4 |
DOI: | https://doi.org/10.35111/4fw7-wy71 |
Release Date: | January 18, 2022 |
Member Year(s): | 2022 |
DCMI Type(s): | Sound, Text |
Sample Type: | pcm |
Sample Rate: | 8000 |
Data Source(s): | field recordings, microphone conversation, telephone conversations, transcribed speech |
Project(s): | NIST OpenSAT |
Application(s): | speech activity detection, speech recognition, spoken term detection |
Language(s): | English |
Language ID(s): | eng |
License(s): | LDC User Agreement for Non-Members |
Online Documentation: | LDC2022S01 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Byers, Frederick. 2017 NIST OpenSAT Pilot - SSSF LDC2022S01. Web Download. Philadelphia: Linguistic Data Consortium, 2022. |
Introduction
2017 NIST OpenSAT Pilot - SSSF was developed by NIST (National Institute of Standards and Technology). It contains approximately one hour of operational speech data, along with the transcripts and annotation files used in the speech activity detection, automatic speech recognition (ASR), and keyword search (KWS) tasks of the 2017 OpenSAT Pilot evaluation. The source audio consists of radio and telephone dispatches recorded during the June 2007 Sofa Super Store fire (SSSF) in Charleston, South Carolina, which claimed the lives of nine firefighters. These recordings contain content that some may find disturbing.
The NIST Open Speech Analytic Technologies (OpenSAT) Evaluation Series was designed to bring together researchers developing different types of speech analytic technologies to address challenges posed by some of the most difficult acoustic conditions. Its goal is to advance the state of the art through objective, large-scale common evaluations. The 2017 pilot focused on the public safety communications domain. The SSSF audio is real-world operational fire-response data that poses multiple challenges for speech analytic systems, such as land mobile radio transmission effects, significant background noise, speech under stress, and variable decibel levels. See the OpenSAT website for more information.
Data
This dataset was created from the audio and logs of the SSSF radio and telephone dispatches and from transcripts of those dispatches. NIST re-annotated the transcripts and converted them into the formats needed to serve as a reference key for scoring system output in the OpenSAT Pilot evaluation.
The data is divided into a 30-minute development set and a 30-minute evaluation set. Audio is provided as 16-bit, 8 kHz NIST SPHERE files. The accompanying reference files are organized by the analytic tasks used in the OpenSAT Pilot and are UTF-8 encoded text or XML files. ASR and KWS scoring tools are also included.
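Because the audio is distributed as NIST SPHERE files, readers may want to inspect the headers or load the samples without extra tooling. The following is a minimal, hypothetical sketch of such a reader. It assumes the common SPHERE layout: a `NIST_1A` magic line, a line giving the header length in bytes, `name -type value` fields, and an `end_head` terminator. It also assumes uncompressed, single-channel 16-bit PCM. The function names `parse_sphere_header`, `read_sphere_pcm`, and `make_sphere` are illustrative and are not part of this release or its scoring tools. Files whose `sample_coding` is not plain `pcm` (for example, shorten-compressed audio) are rejected rather than decoded.

```python
import struct


def parse_sphere_header(data: bytes) -> dict:
    """Parse a NIST SPHERE header into a dict (sketch; assumes the common layout)."""
    if not data.startswith(b"NIST_1A"):
        raise ValueError("missing NIST_1A magic line; not a SPHERE file")
    # Line 2 holds the total header length in bytes (usually 1024).
    header_len = int(data.split(b"\n", 2)[1].decode("ascii").strip())
    fields = {"header_length": header_len}
    for raw in data[:header_len].split(b"\n")[2:]:
        line = raw.decode("ascii", errors="replace").strip()
        if line == "end_head":
            break
        parts = line.split(None, 2)  # name, type, value (value may contain spaces)
        if len(parts) < 3:
            continue  # skip blank or malformed lines
        name, ftype, value = parts
        if ftype == "-i":
            fields[name] = int(value)
        elif ftype == "-r":
            fields[name] = float(value)
        else:  # "-sN": string field of length N
            fields[name] = value
    return fields


def read_sphere_pcm(data: bytes) -> tuple[dict, list[int]]:
    """Return (header, samples) for uncompressed mono 16-bit PCM SPHERE data."""
    hdr = parse_sphere_header(data)
    coding = hdr.get("sample_coding", "pcm")
    if coding != "pcm":
        raise ValueError(f"unsupported sample_coding: {coding}")
    if hdr.get("sample_n_bytes", 2) != 2 or hdr.get("channel_count", 1) != 1:
        raise ValueError("this sketch only handles mono 16-bit samples")
    byte_format = hdr.get("sample_byte_format", "01")
    if byte_format not in ("01", "10"):
        raise ValueError(f"unsupported sample_byte_format: {byte_format}")
    order = "<" if byte_format == "01" else ">"  # "01" little-endian, "10" big-endian
    body = data[hdr["header_length"]:]
    count = hdr.get("sample_count", len(body) // 2)
    samples = list(struct.unpack(f"{order}{count}h", body[: 2 * count]))
    return hdr, samples


def make_sphere(samples: list[int], rate: int = 8000) -> bytes:
    """Build a small synthetic SPHERE file for testing (hypothetical helper)."""
    lines = [
        "NIST_1A",
        "   1024",
        f"sample_count -i {len(samples)}",
        f"sample_rate -i {rate}",
        "channel_count -i 1",
        "sample_n_bytes -i 2",
        "sample_byte_format -s2 01",
        "sample_coding -s3 pcm",
        "end_head",
    ]
    # Pad the ASCII header with spaces to the declared 1024 bytes.
    header = ("\n".join(lines) + "\n").encode("ascii").ljust(1024, b" ")
    return header + struct.pack(f"<{len(samples)}h", *samples)


if __name__ == "__main__":
    hdr, samples = read_sphere_pcm(make_sphere([0, 1000, -1000, 32767]))
    duration_s = hdr["sample_count"] / hdr["sample_rate"]
    print(hdr["sample_rate"], len(samples), duration_s)
```

Duration follows from the header as `sample_count / sample_rate`. Assuming mono audio, 30 minutes at 8000 samples/s corresponds to 14,400,000 samples, or 28,800,000 bytes of 16-bit sample data.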
Updates
None at this time.