Mandarin Chinese News Text

Item Name: Mandarin Chinese News Text
Author(s): Zhibiao Wu
LDC Catalog No.: LDC95T13
ISBN: 1-58563-052-7
ISLRN: 133-578-348-091-2
Member Year(s): 1995, 1996, 1997
DCMI Type(s): Text
Data Source(s): newswire
Project(s): TREC, Tipster, TIDES, GALE, EARS
Application(s): language modeling, information retrieval
Language(s): Mandarin Chinese
Language ID(s): cmn
License(s): Mandarin Chinese News Text Agreement
Online Documentation: LDC95T13 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Wu, Zhibiao. Mandarin Chinese News Text LDC95T13. Web Download. Philadelphia: Linguistic Data Consortium, 1995.

The Linguistic Data Consortium (LDC) announces the availability of a Mandarin Chinese text corpus. The corpus contains about 250 million characters of GB-encoded text.
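
Because the text is GB-encoded, it must be decoded before it can be processed as Unicode. The following is a minimal sketch in Python, assuming the files use the GB2312 variant of the encoding (the listing says only "GB-encoded"); the function name `decode_gb` and the sample bytes are illustrative and not part of the corpus.

```python
def decode_gb(raw: bytes, encoding: str = "gb2312") -> str:
    """Decode GB-encoded bytes into a Unicode string.

    The default codec name is an assumption; undecodable bytes are
    replaced rather than raising, so a stray byte does not stop a run.
    """
    return raw.decode(encoding, errors="replace")


# Illustrative input: these bytes are built here, not taken from the corpus.
sample_bytes = "人民日报".encode("gb2312")
print(decode_gb(sample_bytes))
```

If some files turn out to contain characters outside GB2312, passing `encoding="gb18030"` may be worth trying, since that codec covers a larger character set.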

The corpus includes text from three journalistic sources:

  • newspaper text from Renmin Ribao (People's Daily)
  • radio scripts from China Radio International
  • newswire text from Xinhua newswire service

The corpus is formatted with labeled bracketing in the style of SGML (Standard Generalized Markup Language). The header fields provided by the sources, which give information such as topic, date and article ID, have been retained. The articles cover a variety of topics, including international and domestic news, sports and culture.
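
As a rough illustration of reading SGML-style labeled bracketing, the sketch below splits decoded text into articles and collects their header fields. The tag names (`DOC`, `DOCID`, `DATE`, `TEXT`) and the sample record are hypothetical, chosen only for illustration; the actual tag set is described in the corpus documentation and may differ.

```python
import re

# Hypothetical tag names: the real corpus may use different ones.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
# Matches any simple <NAME>...</NAME> pair with no attributes or nesting.
FIELD_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)


def parse_docs(text: str) -> list[dict[str, str]]:
    """Return one dict per article, mapping lowercased tag name to its content."""
    docs = []
    for match in DOC_RE.finditer(text):
        fields = {
            name.lower(): value.strip()
            for name, value in FIELD_RE.findall(match.group(1))
        }
        docs.append(fields)
    return docs


# Made-up record, not taken from the corpus.
sample = """<DOC>
<DOCID> sample-001 </DOCID>
<DATE> 1994-01-01 </DATE>
<TEXT>
示例正文。
</TEXT>
</DOC>"""

for doc in parse_docs(sample):
    print(doc["docid"], doc["date"], doc["text"])
```

A regular-expression approach is used here because SGML-style files of this kind are often not well-formed XML; it handles only flat, attribute-free fields and would need adjusting for nested markup.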
