Hong Kong Hansards Parallel Text

Item Name: Hong Kong Hansards Parallel Text
Author(s): Xiaoyi Ma
LDC Catalog No.: LDC2000T50
ISBN: 1-58563-175-2
ISLRN: 272-276-125-586-5
DOI: https://doi.org/10.35111/0dcb-s792
Member Year(s): 2000
DCMI Type(s): Text
Data Source(s): government documents
Project(s): TIDES, GALE
Application(s): machine translation
Language(s): English, Chinese
Language ID(s): eng, zho
Online Documentation: LDC2000T50 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Ma, Xiaoyi. Hong Kong Hansards Parallel Text LDC2000T50. Web Download. Philadelphia: Linguistic Data Consortium, 2000.


Hong Kong Hansards Parallel Text was developed by the Linguistic Data Consortium (LDC) and contains excerpts from the Official Record of Proceedings of the Legislative Council of the Hong Kong Special Administrative Region (HKSAR) from October 1995 to April 2000.

LDC thanks the Hong Kong Special Administrative Region of the People's Republic of China for granting permission to distribute this data to the research community.

The Legislative Council normally meets every Wednesday afternoon in the Chamber of the Legislative Council Building. Business includes: discussion of subsidiary legislation, papers, reports, addresses, statements, questions, the three readings of bills, motions and debates.

From time to time, the Chief Executive attends a special Council meeting to brief Members on policy issues and to answer questions from Members. All Council meetings are open to the public. The proceedings of the meetings are recorded verbatim in the Official Record of Proceedings of the Legislative Council (Hansard).

The record of proceedings is first produced in the original language used by each speaker (Floor Version). It is then translated into separate English and Chinese versions.


This corpus contains excerpts from the official record of meetings from October 1995 to April 2000. There are 11.9 million English words and 18.15 million Chinese characters in this release. Chinese text is presented in the traditional script and encoded as BIG5.
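Because the Chinese files are Big5-encoded, they need to be decoded explicitly before use in a Unicode pipeline. The following is a minimal sketch in Python, not part of the corpus documentation. The function name `decode_big5` is hypothetical, and the fallback to the `big5hkscs` codec is an assumption: it covers the case where a file contains Hong Kong supplementary characters, which this release may or may not include.

```python
def decode_big5(raw: bytes, errors: str = "replace") -> str:
    """Decode Big5-encoded bytes into a Python (Unicode) string.

    Tries plain Big5 first. If that fails, falls back to the HKSCS
    extension of Big5 (an assumption, not something the corpus
    documentation states), replacing undecodable bytes by default.
    """
    try:
        return raw.decode("big5")
    except UnicodeDecodeError:
        return raw.decode("big5hkscs", errors=errors)


def read_chinese_file(path: str) -> str:
    """Read one Chinese data file from disk and return its text."""
    with open(path, "rb") as handle:
        return decode_big5(handle.read())
```

Reading the file in binary mode and decoding afterwards keeps the choice of codec in one place, which makes it easy to switch if a particular file turns out to need different handling.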

There are 388 files in the data/ subdirectory of this corpus: 194 English files in data/english/ and 194 Chinese files in data/chinese/. Data file names take the form YYYYMMDD_[ce].doc, where YYYYMMDD is the date of the meeting, c indicates Chinese and e indicates English. As an example of the text in this corpus, the Chinese sample is part of the Chinese-language record of the meeting held on May 24, 1997; the English sample is the corresponding part of the parallel English file.
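The naming scheme makes it straightforward to match each Chinese file with its English counterpart by meeting date. The sketch below shows one way to do this in Python. The names `FILENAME_RE` and `pair_by_date` are hypothetical, and the file names in the usage note are illustrative rather than taken from the corpus listing.

```python
import re
from collections import defaultdict

# Matches names of the form YYYYMMDD_c.doc or YYYYMMDD_e.doc.
FILENAME_RE = re.compile(r"^(\d{8})_([ce])\.doc$")


def pair_by_date(names):
    """Group file names by meeting date and return complete pairs.

    Returns a dict mapping 'YYYYMMDD' to a (chinese_name, english_name)
    tuple. Names that do not match the pattern, and dates for which
    only one language is present, are skipped.
    """
    by_date = defaultdict(dict)
    for name in names:
        match = FILENAME_RE.match(name)
        if match is None:
            continue
        date, lang = match.groups()
        by_date[date][lang] = name
    return {
        date: (langs["c"], langs["e"])
        for date, langs in sorted(by_date.items())
        if "c" in langs and "e" in langs
    }
```

For instance, `pair_by_date(["19970524_c.doc", "19970524_e.doc"])` yields a single entry keyed by `"19970524"`. In practice the two lists of names would come from listing data/chinese/ and data/english/ and passing the combined base names to the function.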

Copying and Distribution

Permission has been granted to the Linguistic Data Consortium to make and distribute copies of the laws, press releases and news of Hong Kong Special Administrative Region provided this copyright notice and permission notice are distributed with all copies.

Permission has been given to reproduce the laws, press releases, and/or news articles from the Hong Kong Special Administrative Region Government website for research, education, and technology development.


There are no updates at this time.

Additional Licensing Instructions

This members-only corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.
