North American News Text Corpus
Item Name: | North American News Text Corpus |
Author(s): | David Graff |
LDC Catalog No.: | LDC95T21 |
ISBN: | 1-58563-053-5 |
ISLRN: | 667-148-284-023-7 |
DOI: | https://doi.org/10.35111/56ty-0638 |
Member Year(s): | 1995, 1996, 1997 |
DCMI Type(s): | Text |
Data Source(s): | newswire |
Project(s): | TIDES, MUC, Hub4, GALE, EARS |
Application(s): | language modeling, information retrieval |
Language(s): | English |
Language ID(s): | eng |
License(s): | North American News Text Agreement |
Online Documentation: | LDC95T21 Documents |
Licensing Instructions: | Subscription & Standard Members, and Non-Members |
Citation: | Graff, David. North American News Text Corpus LDC95T21. Web Download. Philadelphia: Linguistic Data Consortium, 1995. |
North American News Text Corpus is composed of English newswire text formatted using TIPSTER-style SGML markup from the following sources:
- Los Angeles Times/Washington Post Service, 05/94-08/97: 52 million words
- New York Times News, 07/94-12/96: 173 million words
- Reuters News Service, 04/94-12/96: 85 million words
- Wall Street Journal, 07/94-12/96: 40 million words
The New York Times and Los Angeles Times/Washington Post services also carry articles from a range of other newspapers in their syndicated newswires. In addition to its two predominant sources, the Los Angeles Times/Washington Post material includes lesser amounts from the following:
- Newsday
- The Baltimore Sun
- The Hartford Courant
New York Times articles predominate in the New York Times material, which also contains lesser amounts from the following sources:
- Bloomberg Business News
- The Boston Globe
- Los Angeles Daily News
- Fort Worth Star-Telegram
- Newsweek
- Cox News Service
- The Arizona Republic
- Seattle Post-Intelligencer
- San Francisco Examiner
- Houston Chronicle
- San Francisco Chronicle
- Economist Newspaper Ltd.
- Hearst Newspapers
These newswire services also include small numbers of articles from a larger set of miscellaneous sources. The sources listed above are the ones that appear with some frequency each day.
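As a rough illustration of working with TIPSTER-style SGML, the Python sketch below splits a file into documents and counts the words in each. It rests on assumptions: the tag names (DOC, DOCNO, TEXT, and an unclosed p paragraph marker) follow the common TIPSTER convention and are not confirmed by this page, and the sample text, the identifiers, and the helper names iter_documents and count_words are invented for the example. The corpus documentation remains the authority on the actual markup.

```python
import re

# Hypothetical sample, NOT taken from the corpus. The tag names follow the
# common TIPSTER convention and are an assumption about this corpus.
SAMPLE = """<DOC>
<DOCNO> EXAMPLE-0001 </DOCNO>
<TEXT>
<p>
First illustrative paragraph.
<p>
Second illustrative paragraph.
</TEXT>
</DOC>
<DOC>
<DOCNO> EXAMPLE-0002 </DOCNO>
<TEXT>
Another illustrative story.
</TEXT>
</DOC>
"""

DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL)
DOCNO_RE = re.compile(r"<DOCNO>\s*(.*?)\s*</DOCNO>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")


def iter_documents(raw):
    """Yield (docno, plain_text) for each DOC element found in raw."""
    for match in DOC_RE.finditer(raw):
        body = match.group(1)
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        # Replace any remaining tags (such as paragraph markers) with spaces,
        # then collapse runs of whitespace.
        plain = TAG_RE.sub(" ", text.group(1)) if text else ""
        yield (docno.group(1) if docno else None, " ".join(plain.split()))


def count_words(raw):
    """Map each document identifier to its whitespace-delimited word count."""
    return {docno: len(text.split()) for docno, text in iter_documents(raw)}


if __name__ == "__main__":
    for docno, n_words in count_words(SAMPLE).items():
        print(docno, n_words)
```

Regular expressions are used here instead of an XML parser because SGML of this style may leave some tags unclosed, which a strict XML parser would reject. Whitespace-delimited counts are only a rough measure and need not match the word totals quoted above.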
Additional Licensing Instructions
This 'members-only' corpus is available to current LDC members, who can request the data at the listed reduced-license fee.