English Gigaword Second Edition
|Item Name:||English Gigaword Second Edition|
|Author(s):||David Graff, Junbo Kong, Ke Chen, Kazuaki Maeda|
|LDC Catalog No.:||LDC2005T12|
|Release Date:||July 15, 2005|
|Project(s):||TIDES, GALE, EARS|
|Application(s):||natural language processing, language modeling, information retrieval|
LDC User Agreement for Non-Members
|Online Documentation:||LDC2005T12 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Graff, David, et al. English Gigaword Second Edition LDC2005T12. Web Download. Philadelphia: Linguistic Data Consortium, 2005.|
English Gigaword Second Edition was produced by the Linguistic Data Consortium (LDC), with catalog number LDC2005T12 and ISBN 1-58563-350-X. The English Gigaword corpus is a comprehensive archive of English newswire text data that the LDC has acquired over several years. This is the second edition of the English Gigaword corpus.
This edition includes all of the contents of the first edition of the English Gigaword corpus (LDC2003T05) as well as new data from July 2002 through December 2004. In addition, a new newswire source (the Central News Agency of Taiwan, English Service) has been added in this edition.
The five distinct international sources of English newswire included in this release are the following:
|Agence France-Presse, English Service||(afp_eng)|
|Associated Press Worldstream, English Service||(apw_eng)|
|Central News Agency of Taiwan, English Service||(cna_eng)|
|The New York Times Newswire Service||(nyt_eng)|
|The Xinhua News Agency, English Service||(xin_eng)|
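For readers who process the corpus programmatically, the source identifiers listed above can be kept in a small lookup table. The following is only a minimal sketch: the names `GIGAWORD_SOURCES` and `source_name` are hypothetical and are not part of the corpus distribution; only the five identifiers and source names are taken from the list above.

```python
# Source identifiers and names as listed in the table above.
# GIGAWORD_SOURCES and source_name are illustrative names, not part of the corpus.
GIGAWORD_SOURCES = {
    "afp_eng": "Agence France-Presse, English Service",
    "apw_eng": "Associated Press Worldstream, English Service",
    "cna_eng": "Central News Agency of Taiwan, English Service",
    "nyt_eng": "The New York Times Newswire Service",
    "xin_eng": "The Xinhua News Agency, English Service",
}


def source_name(source_id: str) -> str:
    """Return the full source name for an identifier such as 'nyt_eng'."""
    key = source_id.strip().lower()  # tolerate stray whitespace and case differences
    if key not in GIGAWORD_SOURCES:
        raise KeyError(f"unknown source identifier: {source_id!r}")
    return GIGAWORD_SOURCES[key]
```

Calling `source_name("cna_eng")` would return the name of the source added in this edition.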
What's New In The Second Edition
The Reduced Licensing Fee for this corpus is US$400.