Hong Kong News Parallel Text

This FTP publication contains the Hong Kong News Parallel Text, produced by the Linguistic Data Consortium (LDC), catalog number LDC2000T46, ISBN 1-58563-169-8.

The Hong Kong News Parallel Text was created when the LDC collected parallel Cantonese-English news articles from the Information Services Department of the Hong Kong Special Administrative Region (HKSAR) of the People's Republic of China. We wish to thank the Hong Kong Special Administrative Region of the People's Republic of China for granting the LDC permission to distribute this data to the research community.

This corpus contains 36,294 articles in total (18,147 aligned article pairs) released by HKSAR from July 1, 1997 to April 30, 2000. Each article is in a separate file. Automatic article alignment was done at the LDC.

Additional information is available at the LDC web site, http://www.ldc.upenn.edu/Catalog | by year | 2000 | LDC2000T46.

STRUCTURE OF THE DATA:

The articles are in the following directories:

    /1997/chinese     1,555 files
         /english     1,555 files
    /1998/chinese     5,564 files
         /english     5,564 files
    /1999/chinese     8,402 files
         /english     8,402 files
    /2000/chinese     2,626 files
         /english     2,626 files
    total            36,294 files

Each article is a separate file, so there are 18,147 article pairs.

The files are named using the convention:

    yyyymmdd_nnn.[ce]

where yyyy = year, mm = month, dd = day, and nnn = sequence number of the article within that date. The suffix indicates the language: c = Cantonese, e = English.

The example.c and example.e files contain a sample pair of corresponding news articles from the corpus.

The articles were collected from the internet by an automated system. Incoming data was spooled directly to a "raw collection" file, and the raw files were then processed to produce the format released by the LDC.

Table.txt maps the Cantonese files (*.c) to the corresponding English files (*.e).

The Cantonese files are encoded in BIG5, extended with user-defined characters defined by HKSAR.
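
The naming convention and encoding described above can be handled with a few lines of code. The following Python sketch is illustrative only and is not part of the LDC release. It assumes that nnn is exactly three digits, that Python's stock "big5" codec is an acceptable starting point for the *.c files, and that the *.e files can be read as Latin-1 (their encoding is not stated here). The names parse_article_name and decode_article, and the file name in the usage lines, are hypothetical. Article pairs should still be taken from Table.txt rather than inferred from file names.

```python
import re
from typing import NamedTuple, Optional

# Documented convention: yyyymmdd_nnn.[ce]
# Assumption: nnn is exactly three digits, as the pattern letters suggest.
ARTICLE_NAME = re.compile(r"(\d{4})(\d{2})(\d{2})_(\d{3})\.([ce])")


class ArticleName(NamedTuple):
    year: int
    month: int
    day: int
    seq: int   # sequence number of the article within that date
    lang: str  # "c" = Cantonese, "e" = English


def parse_article_name(filename: str) -> Optional[ArticleName]:
    """Split a file name into its fields, or return None if the name
    does not follow the documented convention (for example, example.c)."""
    m = ARTICLE_NAME.fullmatch(filename)
    if m is None:
        return None
    year, month, day, seq, lang = m.groups()
    return ArticleName(int(year), int(month), int(day), int(seq), lang)


def decode_article(data: bytes, lang: str) -> str:
    """Decode the raw bytes of one article file.

    Assumption: "big5" for *.c files and "latin-1" for *.e files.
    Bytes the codec cannot map are replaced with U+FFFD instead of
    raising an exception.
    """
    encoding = "big5" if lang == "c" else "latin-1"
    return data.decode(encoding, errors="replace")


# Usage with a hypothetical file name that follows the convention.
name = parse_article_name("19970701_001.c")
if name is not None:
    print(name.year, name.month, name.day, name.seq, name.lang)
```

Decoding with errors="replace" is a precaution: the *.c files use HKSAR user-defined characters, which a stock Big5 codec may not map. Python's "big5hkscs" codec is a possible alternative, but whether it covers every character in this corpus has not been checked here.
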
See http://www.info.gov.hk/gccs/ for details on these user-defined characters.

COPYING AND DISTRIBUTION

Permission is granted to the Linguistic Data Consortium to make and distribute copies of the laws, press releases and news of the Hong Kong Special Administrative Region, provided this copyright notice and permission notice are distributed with all copies.

USAGE

Permission has been given to reproduce the laws, press releases, and/or news articles from the Hong Kong Special Administrative Region Government website for research and educational purposes. This permission is granted for the mentioned purposes only, and prior permission must be granted by "The Government of the Hong Kong Special Administrative Region" if the materials are to be used for any other purposes.

The files, extracts from the files, and translations of the files must not be sold as part of any commercial software package, nor can they be incorporated in any printed document without the specific permission of the copyright holders.

COPYRIGHT

Portions Copyright (C) 1997-2000, The Government of the Hong Kong Special Administrative Region (HKSAR)