NIST 2006 Spoken Term Detection Development Set
LDC2011R30

File: README.txt
Date: November 3, 2010

This directory contains the system input files for the Arabic, English and Mandarin 2006 Spoken Term Detection Development Set.

The Broadcast News and Conversational Telephone Speech data is licensed through the Linguistic Data Consortium, and the AMI meeting data is licensed through the AMI project. The LDC license text can be found in the file 'licenses/STD_2006_Eval_Agreement-v2.pdf', and the AMI license can be found at corpus.amiproject.com.

The evaluation task specification, directory structure explanation, and file format definitions can be found in Appendix A of the STD Evaluation Plan 'doc/std06-evalplan-v10.pdf' and in the errata 'doc/std06-evalplan-v10-errata-v2.pdf'. Appendix A describes the overall structure of the data resources.

The following specific resources are supplied in this release.

1. System Input Experimental Control Files (ECF):

   The system input ECF files are located in the 'indices' directory. These files define the full extent of the excerpts to be processed by the system. They are the same ECF files used by participants in the 2006 evaluation. Users wanting to replicate the 2006 evaluation should use these files as system inputs.

   There is a separate file for each language:

     Arabic:   expt_06_std_dev06_arab_all_spch_expt_1.ecf.xml
     English:  expt_06_std_dev06_eng_all_spch_expt_1.ecf.xml
     Mandarin: expt_06_std_dev06_mand_all_spch_expt_1.ecf.xml

2. Scoring Input Experimental Control Files (ECF):

   The scoring ECF files are located in the 'indices' directory. These ECF files define the extent of scorable material in the test data. Unlike the system input ECF file, the scoring ECF contains excerpts defining the evaluable regions of the recordings. These files should not be used in any way by the system, because the scoring ECF files were built by extracting information from the human annotations.
   There is a separate file for each language:

     Arabic:   expt_06_std_dev06_arab_all_spch_expt_1.scoring.ecf.xml
     English:  expt_06_std_dev06_eng_all_spch_expt_1.scoring.ecf.xml
     Mandarin: expt_06_std_dev06_mand_all_spch_expt_1.scoring.ecf.xml

3. System Input Term Lists:

   The system input term lists are located in the 'indices' directory. These files define only the terms a system must search for; no other information about the terms is provided to the system. These files were used by evaluation participants in 2006. Users wanting to replicate the 2006 evaluation should use these files as system inputs.

   The following term lists are provided:

     Arabic:   expt_06_std_dev06_arab_all_spch_expt_2.tlist.xml
     English:  expt_06_std_dev06_eng_all_spch_expt_1.tlist.xml
     Mandarin: expt_06_std_dev06_mand_all_spch_expt_1.tlist.xml
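
As an illustration of how the ECF and term list files might be consumed, the Python sketch below reads the excerpts from an ECF file and the terms from a term list. It is not part of this release. The element and attribute names it uses (excerpt, audio_filename, channel, tbeg, dur, term, termid, termtext) are assumptions about the format defined in Appendix A of 'doc/std06-evalplan-v10.pdf' and should be checked against the evaluation plan and its errata. The same ECF reader would also parse the scoring ECF files, but those are for scoring only and should not be given to a detection system.

```python
"""Minimal sketch for reading STD 2006 ECF and term list files.

ASSUMPTION: element/attribute names (excerpt, audio_filename, channel,
tbeg, dur, term, termid, termtext) are assumed to match Appendix A of
doc/std06-evalplan-v10.pdf; verify against the eval plan and errata.
"""
import io
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class Excerpt:
    audio_filename: str
    channel: str
    tbeg: float  # excerpt start time in seconds (assumed unit)
    dur: float   # excerpt duration in seconds (assumed unit)


def read_ecf(source) -> list[Excerpt]:
    """Return every <excerpt> element of an ECF file (path or binary file object)."""
    root = ET.parse(source).getroot()
    return [
        Excerpt(
            audio_filename=e.get("audio_filename"),
            channel=e.get("channel"),
            tbeg=float(e.get("tbeg")),
            dur=float(e.get("dur")),
        )
        for e in root.iter("excerpt")
    ]


def read_termlist(source) -> dict[str, str]:
    """Map each term ID to its search text (the only term information given)."""
    root = ET.parse(source).getroot()
    return {t.get("termid"): t.findtext("termtext") for t in root.iter("term")}


if __name__ == "__main__":
    # Hypothetical sample data; real inputs live in the 'indices' directory.
    ecf = io.BytesIO(
        b'<ecf><excerpt audio_filename="a.sph" channel="1" tbeg="0.0" dur="30.5"/>'
        b'<excerpt audio_filename="b.sph" channel="2" tbeg="10.0" dur="20.0"/></ecf>'
    )
    excerpts = read_ecf(ecf)
    print(sum(x.dur for x in excerpts))  # total seconds to process: 50.5

    tlist = io.BytesIO(
        b'<termlist><term termid="dev06-0001"><termtext>weather</termtext></term></termlist>'
    )
    print(read_termlist(tlist))  # {'dev06-0001': 'weather'}
```

Parsing into plain Python structures keeps the file format details in one place, so a search system only needs the list of excerpts to process and the term ID to text mapping.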