English Web Treebank Propbank

Item Name: English Web Treebank Propbank
Author(s): Tim O'Gorman, Katherine Conger, Martha Palmer
LDC Catalog No.: LDC2017T15
ISBN: 1-58563-818-8
ISLRN: 385-163-116-259-0
Release Date: October 18, 2017
Member Year(s): 2017
DCMI Type(s): Text
Data Source(s): weblogs, email, newsgroups, question-answers, reviews
Application(s): question-answering, entity extraction, part of speech tagging, semantic role labelling
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2017T15 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: O'Gorman, Tim, Katherine Conger, and Martha Palmer. English Web Treebank Propbank LDC2017T15. Web Download. Philadelphia: Linguistic Data Consortium, 2017.

Introduction

English Web Treebank Propbank, LDC Catalog Number LDC2017T15 and ISBN 1-58563-818-8, was developed by the University of Colorado Boulder - CLEAR (Computational Language and Education Research) and provides predicate-argument structure annotation for English Web Treebank (LDC2012T13).

The goal of Propbank (proposition bank) annotation is to capture information about basic semantic propositions. English Web Treebank Propbank provides semantic role annotation and predicate sense disambiguation for roughly 50,000 predicates, corresponding to all verbs, all adjectives in equational clauses and all nouns considered to be predicative. Mark-up is in the "unified" Propbank annotation format, which combines the representations for nouns, verbs and adjectives.

Data

The source data consists of weblogs, newsgroups, email, reviews and question-answers. Human annotators followed the guidelines included with this release. Annotated propositions were automatically validated to ensure that (1) pointers to the tree nodes were valid, (2) Propbank labels were valid, and (3) Propbank annotation was consistent with the associated frameset.
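The three validation checks can be illustrated with a minimal sketch. It is not the validator used for this release: the "terminal:height-LABEL" argument syntax, the label inventory and the function name `validate_argument` are all assumptions made for illustration, and the guidelines included with the release define the actual format.

```python
import re

# Hypothetical "terminal:height-LABEL" argument syntax, assumed for illustration.
ARG_PATTERN = re.compile(r"^(\d+):(\d+)-(.+)$")

# Illustrative label inventory; the release guidelines define the real one.
LABEL_PATTERN = re.compile(r"^(rel|ARG[0-6A]|ARGM-[A-Z]{3})$")


def validate_argument(arg, num_terminals, roleset_args):
    """Return a list of problems found in one argument string."""
    match = ARG_PATTERN.match(arg)
    if match is None:
        return [f"malformed argument: {arg!r}"]
    terminal, _height, label = match.groups()
    problems = []
    # (1) The pointer must address a terminal that exists in the tree.
    #     Checking the height would need the parse tree itself, so it is skipped.
    if int(terminal) >= num_terminals:
        problems.append(f"pointer out of range: {arg!r}")
    # (2) The label must belong to the label inventory.
    if LABEL_PATTERN.match(label) is None:
        problems.append(f"unknown label: {label!r}")
    # (3) Numbered arguments must be licensed by the predicate's roleset.
    elif (label.startswith("ARG") and not label.startswith("ARGM")
          and label not in roleset_args):
        problems.append(f"label not in roleset: {label!r}")
    return problems
```

Calling `validate_argument("0:1-ARG0", 5, {"ARG0", "ARG1"})` returns an empty list, while an argument that points past the last terminal or uses a role the roleset does not define returns a description of the problem.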

Additionally, XML frame files were validated against the included DTD and were checked for frame-internal consistency (e.g. misspellings, extraneous characters, general correctness). Data is presented in UTF-8 XML files.
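A frame-internal consistency check of this kind might look like the sketch below. The element and attribute names (`predicate`, `roleset`, `role`, `id`, `n`) and the sample content are assumptions, not taken from the release. The sketch tests only well-formedness and duplicate identifiers; validation against the DTD itself is not covered, since the Python standard library does not provide it and a library such as lxml would be needed.

```python
import xml.etree.ElementTree as ET

# Hypothetical frame file content; element and attribute names are assumptions.
SAMPLE_FRAME = """<frameset>
  <predicate lemma="give">
    <roleset id="give.01" name="transfer">
      <roles>
        <role n="0" descr="giver"/>
        <role n="1" descr="thing given"/>
      </roles>
    </roleset>
  </predicate>
</frameset>"""


def check_frame(xml_text):
    """Return a list of internal-consistency problems in one frame file."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return [f"not well-formed: {err}"]
    problems = []
    seen_ids = set()
    for roleset in root.iter("roleset"):
        roleset_id = roleset.get("id", "")
        # Every roleset needs a non-empty identifier that is unique in the file.
        if not roleset_id:
            problems.append("roleset without id")
        elif roleset_id in seen_ids:
            problems.append(f"duplicate roleset id: {roleset_id!r}")
        seen_ids.add(roleset_id)
        # Role numbers must not repeat within a single roleset.
        seen_roles = set()
        for role in roleset.iter("role"):
            number = role.get("n", "")
            if number in seen_roles:
                problems.append(f"duplicate role {number!r} in {roleset_id!r}")
            seen_roles.add(number)
    return problems
```

Running `check_frame(SAMPLE_FRAME)` returns an empty list; a truncated file or a roleset that declares the same role number twice yields one entry per problem.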

Updates

None at this time.
