English News Text Treebank: Penn Treebank Revised
|Item Name:||English News Text Treebank: Penn Treebank Revised|
|Author(s):||Ann Bies, Justin Mott, Colin Warner|
|LDC Catalog No.:||LDC2015T13|
|Release Date:||July 15, 2015|
|Application(s):||parsing, tagging, part of speech tagging, natural language processing|
|License(s):||LDC User Agreement for Non-Members|
|Online Documentation:||LDC2015T13 Documents|
|Licensing Instructions:||Subscription & Standard Members, and Non-Members|
|Citation:||Bies, Ann, Justin Mott, and Colin Warner. English News Text Treebank: Penn Treebank Revised LDC2015T13. Web Download. Philadelphia: Linguistic Data Consortium, 2015.|
English News Text Treebank: Penn Treebank Revised was developed by the Linguistic Data Consortium (LDC) with funding through a gift from Google Inc. It consists of a combination of automated and manual revisions of the Penn Treebank annotation of Wall Street Journal (WSJ) stories. The data comprises 1,203,648 word-level tokens in 49,191 sentence-level tokens, covering all 2,312 of the original Penn Treebank WSJ files.
This release includes revised tokenization, part-of-speech, and syntactic treebank annotation intended to bring the full WSJ treebank section into compliance with the agreed-upon policies and updates implemented for current English treebank annotation specifications at LDC. Examples of treebanks annotated under those specifications include English Web Treebank (LDC2012T13), OntoNotes (LDC2013T19), and English translation treebanks such as English Translation Treebank: An-Nahar Newswire (LDC2012T02). English Treebank Supplemental Guidelines are included in this release.
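To see how word-level and sentence-level counts like those above might be tallied, the following is a minimal Python sketch. It assumes the trees are stored in the familiar Penn Treebank bracketed format, which this page does not confirm for the release files. The sample tree, the function names `parse_tree` and `leaves`, and the choice to skip `-NONE-` (empty category) leaves are all illustrative assumptions, so the result need not match how LDC arrived at its published counts.

```python
def parse_tree(text):
    """Parse one bracketed tree into nested (label, children) tuples."""
    # Literal parentheses in the text are assumed to be escaped
    # (e.g. -LRB- / -RRB-), so every "(" and ")" is structural.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def node():
        nonlocal pos
        assert tokens[pos] == "(", "expected an opening bracket"
        pos += 1
        label = ""
        # The outermost wrapper bracket often has no label.
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a leaf string
                pos += 1
        pos += 1  # consume the closing bracket
        return (label, children)

    return node()


def leaves(tree):
    """Yield (pos_tag, token) pairs for every leaf, left to right."""
    label, children = tree
    for child in children:
        if isinstance(child, str):
            yield (label, child)
        else:
            yield from leaves(child)


# Hypothetical sample sentence, not taken from the release.
sample = "( (S (NP-SBJ (-NONE- *PRO*)) (VP (VB Buy) (NP (NNS stocks))) (. .)))"

tagged = list(leaves(parse_tree(sample)))
# Assumption: empty categories tagged -NONE- are not counted as words.
words = [tok for tag, tok in tagged if tag != "-NONE-"]
print(len(tagged), len(words))
```

Summing the word list length over every tree in a file, and counting one sentence-level token per top-level tree, would give per-file totals under these assumptions.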