File: sls_test.doc, created 3/12/93
SLS Principles of Testing
A certain fraction (currently 30%) of the annotated data that
flows through NIST will be sequestered as a pool of potential
test data. For a particular test, data will be selected from this
pool at random within the bounds of the constraints detailed below.
After the selection, NIST will run software that verifies that the
test data meets these constraints and prints a report of the results
for the record. This check is in addition to the usual software that
checks data quality and format.
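The sequestration and random selection described above could be sketched as follows. This is an illustration under stated assumptions, not NIST's software: the unit being sequestered (here a generic list of items, for example session IDs), the function names sequester and select_test_set, the seed parameter, and the use of simple rejection sampling (draw a random subset, keep it only if the constraint check passes) are all hypothetical choices made for the example.

```python
import random

POOL_FRACTION = 0.30  # "currently 30%" according to the text above


def sequester(items, fraction=POOL_FRACTION, seed=None):
    """Randomly split items into (remaining, test_pool).

    Hypothetical helper: the test pool receives round(len(items) * fraction)
    items; everything else stays available for training.
    """
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    k = round(len(shuffled) * fraction)
    return shuffled[k:], shuffled[:k]


def select_test_set(pool, size, satisfies, seed=None, max_tries=10_000):
    """Draw random subsets of the pool until one passes the constraint check.

    `satisfies` is an assumed callback that returns True when a candidate
    test set meets every constraint (speakers, balance, and so on).
    """
    rng = random.Random(seed)
    for _ in range(max_tries):
        candidate = rng.sample(list(pool), size)
        if satisfies(candidate):
            return candidate
    raise RuntimeError("no random subset satisfied the constraints")
```

Rejection sampling keeps the selection uniformly random among the subsets that satisfy the constraints, at the cost of possibly many retries when the constraints are tight.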
- The test data will not include speech from any speaker who
is in the defined training set, which always includes the data
previously used in testing.
- The consistency of .ref (minimal) and .rf2 (maximal) answers will
be ensured by checking that each alternate .ref answer is included
in its corresponding .rf2 answer.
- All utterances in a session will be used as input for the tests,
but for the NL and SLS tests, utterances classed "X" will not be scored.
- Truncation of an utterance may be cause for excluding an entire
session from the test material. If it appears that the computer
system actually heard and processed some speech material that was
truncated from the transcription, so that the continuity of the dialog
is not consistent with the transcriptions, then the session will not be
used. Minor truncations that do not meet this criterion will be processed
just like other class X utterances.
- Time-order consistency will be ensured by checking that the
order in which the index file lists the utterances selected for the
test is the same order in which they were spoken.
- Homogeneity of training and test data will be ensured by
checking that the fraction of class A, D, and X queries in the
test data is within plus-or-minus 20% of the corresponding
fractions in the training data.
- Cross-site balance of the test data will be ensured by checking
that no site has more than 15% more queries than any other site.
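The speaker, answer-consistency, time-order, and scoring rules in the list above could be checked roughly as follows. This is a sketch under assumptions, not the actual NIST verification software: the Utterance record layout, the availability of a start_time per utterance, the representation of each answer as a set of strings, the pairing of .ref and .rf2 alternates by position, and the choice to enforce time order within each session are all hypothetical. Each check returns the offending items, so an empty result means the constraint holds and the result can go straight into the report.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class Utterance:
    # Hypothetical record layout; the real index-file format is not given here.
    utt_id: str
    speaker: str
    session: str
    start_time: float  # when the utterance was spoken (assumed available)
    query_class: str   # "A", "D", or "X"
    site: str


def speakers_in_training(test: Iterable[Utterance],
                         training_speakers: Set[str]) -> List[str]:
    """Return test speakers who also appear in the training set."""
    return sorted({u.speaker for u in test} & training_speakers)


def ref_not_in_rf2(ref: Dict[str, List[Set[str]]],
                   rf2: Dict[str, List[Set[str]]]) -> List[str]:
    """Return utterance IDs where an alternate .ref answer is not contained
    in its corresponding .rf2 answer (alternates paired by position)."""
    bad = []
    for utt, ref_alts in ref.items():
        rf2_alts = rf2.get(utt, [])
        if len(ref_alts) != len(rf2_alts) or any(
                not r <= m for r, m in zip(ref_alts, rf2_alts)):
            bad.append(utt)
    return sorted(bad)


def time_order_violations(index_order: List[Utterance]) -> List[Tuple[str, str]]:
    """Return (listed_earlier, listed_later) pairs from the same session
    whose index-file order disagrees with the order they were spoken."""
    last_seen: Dict[str, Utterance] = {}
    violations = []
    for u in index_order:
        prev = last_seen.get(u.session)
        if prev is not None and u.start_time < prev.start_time:
            violations.append((prev.utt_id, u.utt_id))
        last_seen[u.session] = u
    return violations


def scored_utterances(session_utts: List[Utterance],
                      test_type: str) -> List[Utterance]:
    """All utterances are input; for NL and SLS tests, class X is not scored."""
    if test_type in ("NL", "SLS"):
        return [u for u in session_utts if u.query_class != "X"]
    return list(session_utts)
```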
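The two numeric constraints at the end of the list above, class homogeneity and cross-site balance, might be checked as in the following sketch. Two readings are assumptions: "within plus-or-minus 20%" is taken as a relative tolerance (a training fraction of 0.20 admits test fractions from 0.16 to 0.24), not as 20 percentage points, and "no site has more than 15% more queries than any other site" is taken to mean that the largest site count is at most 1.15 times the smallest. The function names are hypothetical. Exact rational and integer arithmetic is used so that boundary cases such as 115 versus 100 queries are not misjudged by floating-point rounding.

```python
from collections import Counter
from fractions import Fraction
from typing import Dict, Iterable, Tuple

CLASSES = ("A", "D", "X")


def class_fractions(classes: Iterable[str]) -> Dict[str, Fraction]:
    """Exact fraction of class A, D, and X queries (raises if none)."""
    counts = Counter(classes)
    total = sum(counts[c] for c in CLASSES)
    return {c: Fraction(counts[c], total) for c in CLASSES}


def homogeneity_failures(test_classes: Iterable[str],
                         train_classes: Iterable[str],
                         tolerance: Fraction = Fraction(20, 100)
                         ) -> Dict[str, Tuple[Fraction, Fraction]]:
    """Return {class: (test_fraction, train_fraction)} for each class whose
    test fraction differs from the training fraction by more than
    tolerance * training fraction (relative reading, assumed)."""
    test_f = class_fractions(test_classes)
    train_f = class_fractions(train_classes)
    return {c: (test_f[c], train_f[c]) for c in CLASSES
            if abs(test_f[c] - train_f[c]) > tolerance * train_f[c]}


def sites_balanced(test_sites: Iterable[str], max_excess_percent: int = 15) -> bool:
    """True if the busiest site has at most max_excess_percent more queries
    than the least busy site, compared in integer arithmetic."""
    counts = Counter(test_sites)
    fewest, most = min(counts.values()), max(counts.values())
    return 100 * most <= (100 + max_excess_percent) * fewest
```

For example, with training fractions of 50% A, 30% D, and 20% X, a test set of 11 A, 6 D, and 3 X queries fails only on class X, because 3/20 = 0.15 lies outside the band from 0.16 to 0.24.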