People's Daily
Availability: CD-ROM
Data type: Text
Text type: Journalistic (newspaper)
Domain(s): National, International News
Language: Mandarin Chinese
General Description:
The p_daily/ directory contains newspaper articles from the
Beijing-based Renmin Ribao (People's Daily), the largest newspaper
published by the government of the People's Republic of China. The
agreement for research use of the text was reached with the Foreign
Affairs Bureau of Renmin Ribao.
The text archive was delivered to the LDC in two phases: the first,
in 1994, on more than 100 floppy disks, and the second, in 1996, on
CD-ROM.
Publisher and place of publication: Renmin Ribao (People's Daily)
Beijing, People's Republic of China
Collector of Data: Linguistic Data Consortium
Collection time span: 1991-1996
Description of file organization: one file per month.
Number of files: 72
Total size: 290 megabytes
(about 125 million text characters: 1% ASCII, 99% 16-bit GB-encoded)
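The file count follows from the collection span: six years (1991-1996)
at one file per month gives 72 files. The sketch below enumerates the
expected monthly file names. It assumes every file follows the pdYYMM
pattern seen in the name pd9101; that pattern, and the function name
monthly_file_names, are illustrative assumptions rather than something
this documentation states.

```python
def monthly_file_names(first_year=1991, last_year=1996, prefix="pd"):
    """List one name per month, assuming a hypothetical pdYYMM pattern."""
    names = []
    for year in range(first_year, last_year + 1):
        for month in range(1, 13):
            # Two-digit year followed by two-digit month, e.g. pd9101.
            names.append(f"{prefix}{year % 100:02d}{month:02d}")
    return names


names = monthly_file_names()
print(len(names))           # 72
print(names[0], names[-1])  # pd9101 pd9612
```

Comparing such a list against the actual directory listing is a quick
way to spot a missing month.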
Tagging description:
The format uses a labeled bracketing, expressed in the style of SGML
(Standard Generalized Markup Language). Each article (originally a
separate file) is enclosed in
or
.
Characters are encoded in the "GB" system used in the People's Republic of
China. To view files conveniently in MULE (Multi-lingual Emacs), you may
want to use a simple shell script like the one provided in the tools/
directory.
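For readers without MULE, the text can also be converted to Unicode
programmatically. The following is a minimal sketch, not the script
from the tools/ directory. It assumes the files store GB characters in
the common two-byte EUC-CN form, which Python's standard "gb2312" codec
decodes; if the files use a different GB byte representation, another
codec would be needed. The function name decode_gb is hypothetical.

```python
def decode_gb(raw: bytes) -> str:
    """Decode GB-encoded bytes to a Unicode string.

    Assumes the EUC-CN byte form of GB 2312. Bytes that do not form a
    valid character are replaced rather than raising an error, so one
    damaged sequence does not abort the whole conversion.
    """
    return raw.decode("gb2312", errors="replace")


# Round-trip a short sample: the newspaper's name mixed with ASCII.
sample = "Renmin Ribao 人民日报".encode("gb2312")
print(decode_gb(sample))
```

To convert a whole file, read it in binary mode, pass the bytes to
decode_gb, and write the result back out with encoding="utf-8".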
The header fields vary somewhat from time to time. The first file (pd9101)
has only