Mandarin Chinese News Text

Item Name: Mandarin Chinese News Text
Authors: Zhibiao Wu
LDC Catalog No.: LDC95T13
ISBN: 1-58563-052-7
Data Type: text
Data Source(s): newswire
Project(s): EARS, GALE, TIDES, Tipster, TREC
Application(s): information retrieval, language modeling
Language(s): Mandarin Chinese
Distribution: 1 CD
Member Fee: $0 for 1995, 1996, 1997 members
Non-member Fee: US $500.00
Reduced-License Fee: US $250.00
Extra-Copy Fee: US $150.00
Non-member License: yes
Member License: yes
Online documentation: yes
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Zhibiao Wu. Mandarin Chinese News Text. Linguistic Data Consortium, Philadelphia.

The Linguistic Data Consortium (LDC) announces the availability of a Mandarin Chinese text corpus. The corpus contains approximately 250 million characters of GB-encoded text.
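
Because the text is GB-encoded rather than Unicode, it must be decoded before most modern tools can process it. The following is a minimal Python sketch, not an LDC-supplied tool, assuming the files use GB 2312 in its usual EUC-CN byte form. It decodes with Python's standard gb18030 codec, which also accepts GB 2312 byte sequences, and counts only ideographs in the CJK Unified Ideographs block. The helper name count_hanzi is hypothetical, and this count need not match how the 250-million-character figure above was computed.

```python
def count_hanzi(raw: bytes, encoding: str = "gb18030") -> int:
    """Decode GB-encoded bytes and count characters in U+4E00..U+9FFF."""
    # gb18030 is a superset of GB 2312 (EUC-CN) byte sequences, so it also
    # decodes plain GB 2312 text; errors="strict" raises on corrupt bytes.
    text = raw.decode(encoding, errors="strict")
    # Only CJK Unified Ideographs are counted; ASCII markup, digits, and
    # full-width punctuation are ignored.
    return sum(1 for ch in text if "\u4e00" <= ch <= "\u9fff")


if __name__ == "__main__":
    # Invented sample string, not taken from the corpus.
    sample = "人民日报 1995".encode("gb2312")
    print(count_hanzi(sample))  # prints 4
```

To process a whole file, read it in binary mode (for example, `open(path, "rb").read()`) and pass the bytes to the function. Choosing gb18030 over gb2312 as the default avoids decode errors if a source occasionally uses characters outside the GB 2312 set.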

The corpus includes text from three journalistic sources:

  • newspaper text from Renmin Ribao (People's Daily)
  • radio scripts from China Radio International
  • newswire text from the Xinhua News Agency

The corpus is formatted with labeled bracketing in the style of SGML (Standard Generalized Markup Language). The header fields supplied by the sources, which give information such as topic, date, and article ID, have been retained. The articles cover a variety of topics, including international and domestic news, sports, and culture.
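
Since each article and its header fields are delimited by paired SGML-style tags, they can be extracted with a small pattern-based reader. The following is a minimal Python sketch, not an LDC-supplied tool. The tag names DOC, DOCID, DATE, and TEXT are assumptions chosen for illustration; the corpus's actual tags should be checked against its online documentation. The sketch also assumes the input has already been decoded from GB to a Unicode string and that fields are not nested within one another.

```python
import re
from typing import Dict, List

# Hypothetical tag layout: one <DOC>...</DOC> block per article, containing
# flat <FIELD>...</FIELD> elements. Real tag names may differ.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
FIELD_RE = re.compile(r"<([A-Z][A-Z0-9_]*)>(.*?)</\1>", re.DOTALL)


def parse_docs(sgml: str) -> List[Dict[str, str]]:
    """Return one dict per <DOC> block, mapping tag name to stripped content."""
    docs = []
    for body in DOC_RE.findall(sgml):
        # With two capture groups, findall yields (tag, value) tuples.
        # Any markup nested inside a field is kept as raw text in its value.
        fields = {tag: value.strip() for tag, value in FIELD_RE.findall(body)}
        docs.append(fields)
    return docs


if __name__ == "__main__":
    # Invented fragment for illustration only; not real corpus content.
    sample = """<DOC>
<DOCID>XIN-0001</DOCID>
<DATE>1994-01-01</DATE>
<TEXT>
新华社北京电
</TEXT>
</DOC>"""
    for doc in parse_docs(sample):
        print(doc["DOCID"], doc["TEXT"])
```

A regular-expression reader is enough for flat, well-formed bracketing like this; if the real markup nests elements or omits closing tags, an SGML-aware parser would be the safer choice.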

Content Copyright