NIST Open MT 2008 Current Test Sub-Set (reference translations + system translations)
=====================================================================================

Release date: March 19, 2009


1. Introduction

This package contains the reference translations and system translations
of a subset of the NIST Open MT 2008 evaluation current test set.

All XML files included are well-formed and validate against the
following DTD:

  ftp://jaguar.ncsl.nist.gov/mt/resources/mteval-xml-v1.3.dtd


2. Directory structure

Four language pairs are included, each in a separate directory:

  - arabic_to_english
  - chinese_to_english
  - english_to_chinese
  - urdu_to_english

Each directory contains one reference file:

  - references.xml

which contains four reference translations.

Each directory contains several system translations:

  - system(ID)_(trainingCondition).xml

where:

  - ID is an integer value between 01 and 31
  - trainingCondition is one of: 'constrained', 'unconstrained'

Note that the system ID is used to uniquely identify a participating
site, across all language pairs. For example, 'system02' did not submit
Arabic-to-English or Urdu-to-English translations, but it did submit
Chinese-to-English and Urdu-to-English translations.

Systems were 'anonymized' using this systemID, in both the file name and
the 'sysid' attribute value.

For more information regarding the training conditions, please refer to
section 2 of the NIST Open MT08 evaluation plan:

  http://www.nist.gov/itl/iad/mig/tests/mt/2008/doc/MT08_EvalPlan.v2.4.pdf


3. Data genres

The Arabic-to-English, Chinese-to-English and Urdu-to-English data
consist of Newswire and Web Data documents.

The English-to-Chinese data consist of Newswire documents only.

Note that the only difference between the original NIST Open MT 2008
current evaluation set and this subset is the following: Two randomly
selected documents were removed from both the reference and the system
translations, for each language pair, and for each data genre.


4. Contact information

If you have questions or comments about this data, please contact:

  mt_poc@nist.gov
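

Example: listing system submissions by file name

The file naming convention described in section 2 can be sketched in a
few lines of Python. This is only an illustration and is not part of
the package: the names parse_system_filename and list_submissions are
hypothetical, and the sketch assumes the ID always appears as two
zero-padded digits, as in 'system02'.

```python
import re
from pathlib import Path

# The four language-pair directories named in section 2.
LANGUAGE_PAIRS = (
    "arabic_to_english",
    "chinese_to_english",
    "english_to_chinese",
    "urdu_to_english",
)

# system(ID)_(trainingCondition).xml
# Assumption: ID is written as exactly two digits (01..31).
SYSTEM_FILE = re.compile(r"^system(\d{2})_(constrained|unconstrained)\.xml$")


def parse_system_filename(name):
    """Return (system_id, training_condition), or None if the name does not match."""
    match = SYSTEM_FILE.match(name)
    if match is None:
        return None
    system_id = int(match.group(1))
    # Section 2 states that IDs range from 01 to 31.
    if not 1 <= system_id <= 31:
        return None
    return system_id, match.group(2)


def list_submissions(package_root):
    """Map each language-pair directory that exists to its sorted (id, condition) pairs."""
    submissions = {}
    for pair in LANGUAGE_PAIRS:
        directory = Path(package_root) / pair
        if not directory.is_dir():
            continue
        parsed = (parse_system_filename(p.name) for p in directory.iterdir())
        # references.xml and any other non-matching files are skipped.
        submissions[pair] = sorted(item for item in parsed if item is not None)
    return submissions
```

Calling list_submissions on the unpacked package root would show which
language pairs each anonymized system ID submitted to, and under which
training condition.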