File: sls_test.doc, created 3/12/93

SLS Principles of Testing

A certain fraction (currently 30%) of the annotated data that flows through NIST will be sequestered as a pool of potential test data. For a particular test, data will be selected from this pool at random, within the bounds of the constraints detailed below. After the selection, NIST will run software that verifies that the test data meets these constraints and prints a report to that effect for the record. This check is in addition to the usual software that checks data quality and format.
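The sequestering and random selection described above can be sketched as follows. This is a hypothetical illustration, not NIST's software. It assumes the unit of sequestration is a session, and it realizes "at random within the bounds of the constraints" by rejection sampling: draws are repeated until a caller-supplied predicate (the hypothetical `satisfies`, standing in for the checks listed below) accepts one. The names `sequester` and `select_test` are illustrative.

```python
import random


def sequester(sessions, fraction=0.30, seed=None):
    """Split incoming sessions into (training, test_pool).

    Assumption: each session independently goes to the pool with
    probability `fraction`, so the pool holds about 30% of the data.
    """
    rng = random.Random(seed)
    training, pool = [], []
    for s in sessions:
        (pool if rng.random() < fraction else training).append(s)
    return training, pool


def select_test(pool, n, satisfies, seed=None, max_tries=1000):
    """Draw n distinct sessions at random, redrawing until `satisfies` accepts.

    `satisfies` is a hypothetical predicate standing in for the test constraints.
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        draw = rng.sample(pool, n)
        if satisfies(draw):
            return draw
    raise RuntimeError("no random draw met the constraints")
```

Rejection sampling keeps every accepted draw uniformly random among the draws that meet the constraints, at the cost of possibly many retries when the constraints are tight.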

  1. The test data will not include speech from any speaker who is in the defined training set, which always includes the data previously used in testing.
  2. The consistency of .ref (minimal) and .rf2 (maximal) answers will be ensured by checking that each alternate .ref answer is included in its corresponding .rf2 answer.
  3. All utterances in a session will be used as input for the tests, but for the NL and SLS tests, utterances classed "X" will not be scored.
  4. Truncation of an utterance may be grounds for excluding an entire session from the test material. If it appears that the computer system actually heard and processed some speech that was truncated from the transcription, so that the continuity of the dialog is inconsistent with the transcriptions, the session will not be used. Minor truncations that do not meet this criterion will be processed like other Class X utterances.
  5. Time-order consistency will be ensured by checking that the order of utterances selected for the test by the index file matches the order in which they were spoken.
  6. Homogeneity of training and test data will be ensured by checking that the fraction of class A, D, and X queries in the test data is within plus-or-minus 20% of the corresponding fractions in the training data.
  7. Cross-site balance of the test data will be ensured by checking that no site has more than 15% more queries than any other site.
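A minimal Python sketch of the kind of verification report described above, covering the mechanically checkable constraints 1, 2, 5, 6, and 7. The `Utterance` record, its field names, and all function names are hypothetical; the actual NIST file formats and software are not described here. Several readings are assumptions: the plus-or-minus 20% tolerance in constraint 6 is taken as relative to the training fraction (it could instead mean 20 percentage points); answers in constraint 2 are modeled as collections of row tuples, so "included in" becomes set containment; time order in constraint 5 is checked within each session; and the test and training sets are assumed to be non-empty.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Utterance:
    # Hypothetical record layout, for illustration only.
    utt_id: str
    session: str
    speaker: str
    site: str
    query_class: str  # "A", "D", or "X"
    time: float       # when the utterance was spoken


def speakers_disjoint(test, training_speakers):
    """Constraint 1: no test speaker appears in the training set."""
    return {u.speaker for u in test}.isdisjoint(training_speakers)


def ref_within_rf2(answer_pairs):
    """Constraint 2: each .ref answer is contained in its .rf2 answer.

    answer_pairs is a list of (ref_rows, rf2_rows); rows are hashable tuples.
    """
    return all(set(ref) <= set(rf2) for ref, rf2 in answer_pairs)


def time_ordered(index_order):
    """Constraint 5: within each session, index-file order matches spoken order."""
    last_time = {}
    for u in index_order:
        if u.session in last_time and u.time < last_time[u.session]:
            return False
        last_time[u.session] = u.time
    return True


def class_fractions(utts):
    counts = Counter(u.query_class for u in utts)
    total = sum(counts.values())
    return {c: counts[c] / total for c in ("A", "D", "X")}


def classes_homogeneous(test, train, tolerance=0.20):
    """Constraint 6: each class fraction in test is within +/-20% (relative) of training."""
    f_test, f_train = class_fractions(test), class_fractions(train)
    return all(abs(f_test[c] - f_train[c]) <= tolerance * f_train[c] for c in f_train)


def sites_balanced(test, margin=0.15):
    """Constraint 7: no site has more than 15% more queries than any other site."""
    per_site = Counter(u.site for u in test)
    return max(per_site.values()) <= (1 + margin) * min(per_site.values())


def report(test, train, training_speakers, answer_pairs, index_order):
    """Print one PASS/FAIL line per constraint for the record; True if all pass."""
    results = {
        "1 speakers disjoint": speakers_disjoint(test, training_speakers),
        "2 .ref within .rf2": ref_within_rf2(answer_pairs),
        "5 time order": time_ordered(index_order),
        "6 class homogeneity": classes_homogeneous(test, train),
        "7 site balance": sites_balanced(test),
    }
    for name, ok in results.items():
        print(f"{name}: {'PASS' if ok else 'FAIL'}")
    return all(results.values())
```

Because the site-balance check compares only the largest and smallest per-site counts, it is equivalent to checking every pair of sites.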