Hong Kong Hansards Parallel Text
Item Name: | Hong Kong Hansards Parallel Text |
Author(s): | Xiaoyi Ma |
LDC Catalog No.: | LDC2000T50 |
ISBN: | 1-58563-175-2 |
ISLRN: | 272-276-125-586-5 |
DOI: | https://doi.org/10.35111/0dcb-s792 |
Member Year(s): | 2000 |
DCMI Type(s): | Text |
Data Source(s): | government documents |
Project(s): | TIDES, GALE |
Application(s): | machine translation |
Language(s): | English, Chinese |
Language ID(s): | eng, zho |
Online Documentation: | LDC2000T50 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Ma, Xiaoyi. Hong Kong Hansards Parallel Text LDC2000T50. Web Download. Philadelphia: Linguistic Data Consortium, 2000. |
Introduction
Hong Kong Hansards Parallel Text was developed by the Linguistic Data Consortium (LDC) and contains excerpts from the Official Record of Proceedings of the Legislative Council of the Hong Kong Special Administrative Region (HKSAR) from October 1995 to April 2000.
LDC thanks the Hong Kong Special Administrative Region of the People's Republic of China for granting permission to distribute this data to the research community.
The Legislative Council normally meets every Wednesday afternoon in the Chamber of the Legislative Council Building. Business includes: discussion of subsidiary legislation, papers, reports, addresses, statements, questions, the three readings of bills, motions and debates.
From time to time, the Chief Executive attends a special Council meeting to brief Members on policy issues and to answer questions from Members. All Council meetings are open to the public. The proceedings of the meetings are recorded verbatim in the Official Record of Proceedings of the Legislative Council (Hansard).
The proceedings are first recorded in the original language used by each speaker (the Floor Version) and are then translated into separate English and Chinese versions.
Data
This corpus contains excerpts from the official record of meetings from October 1995 to April 2000. There are 11.9 million English words and 18.15 million Chinese characters in this release. Chinese text is presented in the traditional script and encoded as BIG5.
There are 388 files in the data/ subdirectory of this corpus: half (194 files) are in English in the data/english/ subdirectory and half (194 files) are in Chinese in the data/chinese/ subdirectory. Data file names are in the form YYYYMMDD_[ce].doc, where YYYYMMDD indicates the date of the meeting, c=Chinese and e=English. As an example of the text in this corpus, the Chinese sample is part of the Chinese-language record of the meeting held on May 24, 1997; the parallel English text is in the English sample.
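The naming scheme above makes it straightforward to match each English file with its Chinese counterpart by meeting date. The following Python sketch illustrates one way to do this. The function names (parse_name, pair_by_date, decode_chinese) are hypothetical and not part of the corpus, the file names in the usage note are illustrative, and the decoding helper assumes the .doc files hold plain BIG5 text, which this description does not confirm.

```python
import re
from collections import defaultdict
from datetime import date, datetime
from typing import Dict, Iterable, Optional, Tuple

# Pattern for the documented naming scheme YYYYMMDD_[ce].doc
NAME_RE = re.compile(r"^(\d{8})_([ce])\.doc$")


def parse_name(filename: str) -> Optional[Tuple[date, str]]:
    """Return (meeting date, language) for a data file name, or None if it does not fit."""
    match = NAME_RE.match(filename)
    if match is None:
        return None
    try:
        meeting_date = datetime.strptime(match.group(1), "%Y%m%d").date()
    except ValueError:
        # Eight digits that do not form a valid calendar date
        return None
    language = "chinese" if match.group(2) == "c" else "english"
    return meeting_date, language


def pair_by_date(filenames: Iterable[str]) -> Dict[date, Tuple[str, str]]:
    """Map each meeting date to its (English file, Chinese file) pair.

    Dates that have only one of the two languages are left out.
    """
    by_date = defaultdict(dict)
    for name in filenames:
        parsed = parse_name(name)
        if parsed is None:
            continue
        meeting_date, language = parsed
        by_date[meeting_date][language] = name
    return {
        d: (files["english"], files["chinese"])
        for d, files in sorted(by_date.items())
        if len(files) == 2
    }


def decode_chinese(raw: bytes) -> str:
    """Decode bytes from a Chinese file, assuming plain BIG5-encoded text."""
    return raw.decode("big5", errors="replace")
```

For instance, pair_by_date(["19970524_e.doc", "19970524_c.doc"]) would yield a single entry keyed by date(1997, 5, 24). In practice the two lists of names would come from listing data/english/ and data/chinese/ separately, since the pairing relies only on the file name and not on the directory.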
Copying and Distribution
Permission has been granted to the Linguistic Data Consortium to make and distribute copies of the laws, press releases and news of Hong Kong Special Administrative Region provided this copyright notice and permission notice are distributed with all copies.
Permission has been given to reproduce the laws, press releases, and/or news articles from the Hong Kong Special Administrative Region Government website for research, education, and technology development.
Updates
There are no updates at this time.
Additional Licensing Instructions
This 'members-only' corpus is available to current members, who can request the data at the listed reduced-license fee. Contact ldc@ldc.upenn.edu for information about becoming a member.