NIST Open Machine Translation 2012 Current Test
===============================================

This set contains the evaluation sets (source data and human reference
translations), DTD, scoring software, and evaluation plan for the
Current test of the NIST Open Machine Translation 2012 Evaluation.

The Current test consisted of two Chinese-to-English tests: one
containing general newswire and web data, and one limited to web data
from a restricted domain, "food". The data dates from January to March
2011. Please refer to the evaluation plan included in this package for
more details.

Each test set consists of two files: a source file and a reference
file. Each reference file contains four independent human reference
translations of the source data. The test sets in this package are in
XML format compliant with the included DTD.

Please contact mt_poc@nist.gov with questions.

Please visit the NIST OpenMT website,
http://www.nist.gov/itl/iad/mig/openmt.cfm, for general information on
the NIST OpenMT evaluations.

Package Contents
----------------

README.txt - this file

Evaluation plan: OpenMT12_EvalPlan.pdf

Scoring utility: mteval-v13a-20091001.tar.gz

DTD: mteval-xml-v1.6.dtd

Test sets (src = source, ref = human reference translations):

  Chinese-to-English general:
    OpenMT12_Current_chi2eng-[src|ref].xml

  Chinese-to-English restricted domain:
    OpenMT12_Current_chi2eng-RestrictedDomain-[src|ref].xml

Data Set Statistics
-------------------

Data genres:

  nw = newswire
  wb = web data

  Source                      Genre   Documents   Segments   Source tokens
  Chinese general             nw             45        400           18184
  Chinese general             wb             28        420           15181
  Chinese restricted domain   wb            149       2184           48422

The token counts for Chinese data are "character" counts, which were
obtained by counting tokens matching the Unicode-based regular
expression "\w". The Python "re" module was used to obtain these counts.
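
The character-count procedure described above can be sketched roughly
as follows. This is a minimal illustration, not the script NIST used:
the function name count_char_tokens and the sample string are
hypothetical, and the sketch assumes the source text has already been
extracted from the XML as a Unicode string.

```python
import re

# Unicode-aware "\w": matches letters (including CJK characters),
# digits, and the underscore. re.UNICODE is the default for str
# patterns in Python 3 and is passed here only to make the intent
# explicit.
WORD_CHAR = re.compile(r"\w", re.UNICODE)


def count_char_tokens(text: str) -> int:
    """Count the characters in text that match the Unicode-aware \\w."""
    # Hypothetical helper: each single-character match is one token.
    return len(WORD_CHAR.findall(text))


if __name__ == "__main__":
    sample = "我喜欢吃饺子。"  # illustrative sentence, not from the test set
    print(count_char_tokens(sample))  # prints 6; the full stop is not counted
```

Under this definition punctuation and whitespace are ignored, while
each Latin letter or digit embedded in the Chinese text counts as one
token, so the figures are character counts rather than word counts.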