COMNOM v 1.0

Item Name: COMNOM v 1.0
Author(s): Adam Meyers, Ruth Reeves, Catherine Macleod
LDC Catalog No.: LDC2008T24
ISBN: 1-58563-493-X
ISLRN: 419-167-670-549-0
Release Date: September 15, 2007
Member Year(s): 2008
DCMI Type(s): Text
Data Source(s): newswire
Language(s): English
Language ID(s): eng
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2008T24 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Meyers, Adam, Ruth Reeves, and Catherine Macleod. COMNOM v 1.0 LDC2008T24. Web Download. Philadelphia: Linguistic Data Consortium, 2008.

Introduction

COMNOM is an automatically enriched version of COMLEX Syntax that was created at New York University as part of the NomBank annotation project. COMLEX resources are distributed by the Linguistic Data Consortium (LDC) and consist of the following: COMLEX English Syntax Lexicon (LDC98L21), an English dictionary consisting of approximately 38,000 lemmas with detailed information about the syntactic characteristics of each lexical item and subcategorization (complement structures); and COMLEX Syntax Text Corpus Version 2.0 (LDC96T11).

COMNOM adds classes to COMLEX Syntax lexical entries using NOMLEX-PLUS, a dictionary with approximately 8,000 entries. To build COMNOM, prepositions were collected from the NOMLEX-PLUS subcategorization fields (:VERB-SUBC, :OBJECT, :SUBJECT, etc.), essential complements were deduced from those prepositions, and the complements were added to the existing COMLEX entries.

Further information about the methodology used in COMNOM can be found in Meyers, "Those Other NomBank Dictionaries -- Manual for Dictionaries that Come with NomBank". Related resources and further information about COMNOM and NomBank are available from the NomBank project website.

A license to COMLEX English Syntax Lexicon (LDC98L21) or COMLEX Syntax Text Corpus Version 2.0 (LDC96T11) is required to obtain COMNOM v 1.0.

Data

This release includes three versions of COMNOM which correspond to the three versions of NOMLEX-PLUS and are characterized by the amount of corpus training that influenced their creation. The data used for training are the Wall Street Journal materials in the Penn Treebanks (Treebank-2 and Treebank-3), with annotations from Proposition Bank I and NomBank 1.0.

The three versions are:

  • COMNOM-clean.1.0 -- contains no information derived from annotated data
  • COMNOM.1.0 -- contains information from the entire annotated corpus
  • COMNOM-training.1.0 -- contains information from annotated data in sections 02-21 of the corpus only