North American News Text Corpus
| Item Name: | North American News Text Corpus |
| Author(s): | David Graff |
| LDC Catalog No.: | LDC95T21 |
| ISBN: | 1-58563-053-5 |
| ISLRN: | 667-148-284-023-7 |
| DOI: | https://doi.org/10.35111/56ty-0638 |
| Member Year(s): | 1995, 1996, 1997 |
| DCMI Type(s): | Text |
| Data Source(s): | newswire |
| Project(s): | TIDES, MUC, Hub4, GALE, EARS |
| Application(s): | language modeling, information retrieval |
| Language(s): | English |
| Language ID(s): | eng |
| License(s): | North American News Text Agreement |
| Online Documentation: | LDC95T21 Documents |
| Licensing Instructions: | Subscription & Standard Members, and Non-Members |
| Citation: | Graff, David. North American News Text Corpus LDC95T21. Web Download. Philadelphia: Linguistic Data Consortium, 1995. |
North American News Text Corpus is composed of English newswire text formatted using TIPSTER-style SGML markup from the following sources:
- Los Angeles Times/Washington Post Service, 05/94-08/97: 52 million words
- New York Times News, 07/94-12/96: 173 million words
- Reuters News Service, 04/94-12/96: 85 million words
- Wall Street Journal, 07/94-12/96: 40 million words
The New York Times and Los Angeles Times/Washington Post services also carry material from a range of other newspapers in their syndicated newswires. In addition to its two predominant sources, the Los Angeles Times/Washington Post material includes smaller amounts from the following sources:
- Newsday
- The Baltimore Sun
- The Hartford Courant
New York Times articles predominate in the New York Times material, which also contains smaller amounts from the following sources:
- Bloomberg Business News
- The Boston Globe
- Los Angeles Daily News
- Fort Worth Star-Telegram
- Newsweek
- Cox News Service
- The Arizona Republic
- Seattle Post-Intelligencer
- San Francisco Examiner
- Houston Chronicle
- San Francisco Chronicle
- Economist Newspaper Ltd.
- Hearst Newspapers
These newswire services also include small numbers of articles from a larger set of miscellaneous sources. The sources listed above are those that appear with some frequency on a daily basis.
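Because the text is distributed in TIPSTER-style SGML, a first processing step is usually to split each file into individual articles. The following is a minimal sketch of that step, not the corpus's documented procedure. It assumes the tag names `DOC`, `DOCNO`, and `TEXT`, which are common in TIPSTER-style collections but are not confirmed by this description; the actual tag set should be checked against the LDC95T21 documentation. The `SAMPLE` string, the `Article` type, and the `iter_articles` function are hypothetical and exist only for illustration.

```python
import re
from typing import Iterator, NamedTuple


class Article(NamedTuple):
    doc_id: str
    text: str


# Assumed tag names (DOC, DOCNO, TEXT); verify against the corpus documentation.
DOC_RE = re.compile(r"<DOC>(.*?)</DOC>", re.DOTALL | re.IGNORECASE)
DOCNO_RE = re.compile(r"<DOCNO>(.*?)</DOCNO>", re.DOTALL | re.IGNORECASE)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL | re.IGNORECASE)
TAG_RE = re.compile(r"<[^>]+>")


def iter_articles(sgml: str) -> Iterator[Article]:
    """Yield one Article per <DOC>...</DOC> element found in the input string."""
    for doc in DOC_RE.finditer(sgml):
        body = doc.group(1)
        docno = DOCNO_RE.search(body)
        text = TEXT_RE.search(body)
        doc_id = docno.group(1).strip() if docno else ""
        raw = text.group(1) if text else ""
        # Drop any remaining inline tags (e.g. paragraph markers) and
        # collapse runs of whitespace into single spaces.
        plain = " ".join(TAG_RE.sub(" ", raw).split())
        yield Article(doc_id, plain)


# Made-up sample input; it is not taken from the corpus.
SAMPLE = """<DOC>
<DOCNO> SAMPLE-0001 </DOCNO>
<TEXT>
<p>
First paragraph of a made-up story.
<p>
Second paragraph.
</TEXT>
</DOC>
<DOC>
<DOCNO> SAMPLE-0002 </DOCNO>
<TEXT>
Another made-up story.
</TEXT>
</DOC>"""

articles = list(iter_articles(SAMPLE))
for article in articles:
    print(article.doc_id, "|", article.text)
```

A regular-expression splitter is used here because SGML of this kind often leaves tags such as paragraph markers unclosed, which a strict XML parser would reject. For files too large to hold in memory, the same idea could be applied while reading line by line.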
Additional Licensing Instructions
This "members-only" corpus is available to current LDC members, who can request the data at the listed reduced-license fee.