2017 NIST OpenSAT Pilot 2017 - SSSF
2017 OpenSAT Pilot: Sofa Super Store Fire
OpenSAT-SSSF
================================================================
Development and Evaluation data for SAD, ASR, and KWS
-------------------------------------------

1.0 DESCRIPTION

The Charleston Sofa Super Store fire occurred on the evening of June 18,
2007, in Charleston, South Carolina, killing nine firefighters. This dataset
consists of operational data released to the general public after the
investigation into the firefighters' deaths. Because the data comes from a
real-world operational event, it offers researchers material that cannot be
duplicated through scientific collection.

DATA CONTAINS DISTURBING CONTENT (PLEAS FROM TRAPPED FIREFIGHTERS).
DATA CONTAINS PHONE NUMBERS (FEW).

Originally divided into a 30-minute development dataset and a 30-minute
evaluation dataset, this data was used for test and evaluation in the 2017
OpenSAT Pilot evaluation, sponsored by NIST Public Safety Communications
Research and the Department of Homeland Security. Methods used in the Pilot
OpenSAT Evaluation included speech activity detection (SAD), keyword search
(KWS), and automatic speech recognition (ASR).

More information on OpenSAT evaluations may be found here:
https://www.nist.gov/itl/iad/mig/opensat

2.0 DATA

The dataset was created from the audio and logs of radio and telephone
dispatches from the Sofa Super Store Fire. These dispatches had previously
been made available in the Charleston Sofa Super Store Phase II report
(published May 15, 2008) by the City of Charleston and through the FOIA
process. The report contains textual transcription of those dispatches. The
transcriptions were re-annotated and transformed by NIST into the formats
required to provide a reference key for scoring system output in the Pilot
OpenSAT evaluation.
The resulting data set consists of approximately one hour of audio and
transcription from the dispatches, for speech activity detection, keyword
search, and automatic speech recognition.

The recorded audio represents real-world, fire-response, operational data
that cannot be duplicated through a controlled scientific experiment or
simulation. The data presents multiple challenges for system analytics, such
as land-mobile-radio transmission effects, speech produced in significant
background noise (Lombard effect), speech under cognitive and physical
stress, varying background noise types, and varying background decibel
levels, all within a real-world scenario.

3.0 DATA STRUCTURE

The approximately one hour of audio is divided into twelve audio files of
roughly five minutes each (approximately 4.7 MB each), six for system
development and six for system evaluation. Audio files are 16-bit, 8 kHz
sample rate, NIST SPHERE format, converted from mp3.

Accompanying reference files are divided by the analytic tasks used in the
OpenSAT Pilot (ASR, KWS, SAD) and include Segment Time Marked (.stm),
annotation (.txt), Rich Transcription Time Marked (.rttm), Extensible Markup
Language (.xml), and Speech Activity Detection (.sad) files.

Documentation includes README.txt, the original transcription (.pdf), NIST IR
8242 OpenSAT Pilot (.pdf), and list files. A DTD is NOT available for the xml
files. Information on tool downloads available from NIST, as well as mock
speech activity detection files, may be found in the tools folder.
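
The audio format stated above can be checked without extra tools, because a
NIST SPHERE file begins with a plain-text header ("NIST_1A", the header size
in bytes, then "name type value" lines up to "end_head"). The following is a
minimal Python sketch and is not part of the NIST tools. It assumes the files
carry the standard SPHERE header fields sample_count and sample_rate; the
function names are illustrative, and the demo header is synthetic rather than
taken from this dataset. As a sanity check on the sizes quoted above, 300
seconds at 8000 samples per second and 2 bytes per sample is 4,800,000 bytes
(about 4,688 KB), in line with the approximately 4.7 MB per file.

```python
import io


def read_sphere_header(stream):
    """Parse the plain-text header of a NIST SPHERE file.

    Returns (header_size_in_bytes, dict_of_fields). Only the header is read;
    the audio samples that follow it are not decoded.
    """
    magic = stream.readline().strip()
    if magic != b"NIST_1A":
        raise ValueError("not a NIST SPHERE file")
    header_size = int(stream.readline().strip())

    # Re-read the whole header block (it includes the two lines above).
    stream.seek(0)
    text = stream.read(header_size).decode("ascii", errors="replace")

    fields = {}
    for line in text.splitlines()[2:]:
        line = line.strip()
        if line == "end_head":
            break
        if not line or line.startswith(";"):  # blank line or comment
            continue
        parts = line.split(None, 2)
        if len(parts) != 3:
            continue
        name, kind, value = parts
        if kind == "-i":        # integer field
            fields[name] = int(value)
        elif kind == "-r":      # real-valued field
            fields[name] = float(value)
        else:                   # "-sN": string of N characters
            fields[name] = value
    return header_size, fields


def duration_seconds(fields):
    """Duration implied by the header: samples per channel / sample rate."""
    return fields["sample_count"] / fields["sample_rate"]


if __name__ == "__main__":
    # Synthetic header for illustration only (values are NOT from the dataset).
    demo = (
        b"NIST_1A\n   1024\n"
        b"channel_count -i 1\n"
        b"sample_count -i 2400000\n"
        b"sample_rate -i 8000\n"
        b"sample_n_bytes -i 2\n"
        b"sample_coding -s3 pcm\n"
        b"end_head\n"
    ).ljust(1024)
    size, fields = read_sphere_header(io.BytesIO(demo))
    print(size, fields["sample_rate"], duration_seconds(fields))
```

To inspect a real file, open it in binary mode and pass the file object to
read_sphere_header in place of the in-memory demo stream.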
General directory structure is as follows:

/data
     /dev
          /SSSF_dev_data_files                 6 files    28,165 KB
          /SSSF_dev_reference_files
               /sssf_dev_asr_ref              12 files        51 KB
               /sssf_dev_kws_ref               8 files       146 KB
               /sssf_dev_sad_ref               6 files        29 KB
     /eval
          /SSSF_eval_data_files                6 files    29,931 KB
          /SSSF_eval_reference_files
               /sssf_eval_asr_ref             12 files        50 KB
               /sssf_eval_kws_ref              3 files       141 KB
               /sssf_eval_sad_ref              6 files        27 KB
     /transcription                            1 file        101 KB
/docs                                          4 files     1,343 KB
/tools
     /ASR                                      2 files         2 KB
     /KWS                                      2 files         2 KB

--------------------------------------------
For more information and access to the 2017 OpenSAT Pilot Evaluation Plan go
to:
https://www.nist.gov/itl/iad/mig/nist-2017-pilot-speech-analytic-technologies-evaluation
--------------------------------------------
WARNING: Some communications in this audio may be disturbing to hear.

The audio recording was released by the City of Charleston, South Carolina,
USA, August 2007.

More information about the Charleston Superstore Fire may be found here:
https://www.iaff.org/news/13th-anniversary-of-the-charleston-sofa-super-store-fire-remembering-the-charleston-9/
https://www.nist.gov/news-events/news/2010/10/nist-study-charleston-furniture-store-fire-calls-national-safety
-------------------------------------------
Updated by Diane Ridgeway, NIST, May 13, 2021