COMLEX Syntax Text Corpus Version 2.0

Item Name:	COMLEX Syntax Text Corpus Version 2.0
Author(s):	Catherine Macleod, Adam Meyers, Ralph Grishman
LDC Catalog No.:	LDC96T11
ISBN:	1-58563-148-5
ISLRN:	184-170-097-975-5
DOI:	https://doi.org/10.35111/ryhw-kn17
Member Year(s):	1996, 1998
DCMI Type(s):	Text
Data Source(s):	newswire, varied
Application(s):	natural language processing
Language(s):	English
Language ID(s):	eng
License(s):	COMLEX For-Profit Agreement, COMLEX Non-member Agreement, COMLEX Non-Profit Agreement
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Macleod, Catherine, Adam Meyers, and Ralph Grishman. COMLEX Syntax Text Corpus Version 2.0 LDC96T11. Web Download. Philadelphia: Linguistic Data Consortium, 1996.
Related Works:	hasAnnotation: LDC2008T24 COMNOM v 1.0; relatesTo: LDC98L21 COMLEX English Syntax Lexicon

Introduction

COMLEX Syntax Text Corpus Version 2.0 was developed by the Linguistic Data Consortium (LDC) and consists of approximately 30,000 newswire documents in English.

The corpus was built to support a tagging task for the COMLEX English Syntax Lexicon (LDC98L21), in which 750 of the most common verbs in the corpus were tagged with COMLEX complements. The task differs from conventional corpus tagging in that the tags appear in the dictionary, not in the corpus. Each tag in a dictionary entry consists of the byte number at which the text example can be located in the corpus, the source, and the complement name.
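
Because each tag points into the corpus by byte number, an example can in principle be retrieved by seeking to that position in the text file. The following is a minimal sketch of that idea, not the lexicon's actual tag format: the `CorpusTag` class, the `read_example` helper, the source label "WSJ", the 0-based offset, and the Latin-1 decoding are all assumptions made for illustration, and a small in-memory buffer stands in for the corpus file.

```python
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class CorpusTag:
    """Hypothetical container for the three parts of a dictionary tag."""
    byte_offset: int  # byte number of the text example (assumed 0-based)
    source: str       # source label; the spelling "WSJ" is an assumption
    complement: str   # complement name, used here only as an illustration


def read_example(corpus, tag, width=80):
    """Return up to `width` bytes of text starting at the tag's byte offset."""
    corpus.seek(tag.byte_offset)
    # Latin-1 maps every byte to a character, so decoding cannot fail.
    return corpus.read(width).decode("latin-1")


# Tiny in-memory stand-in for the corpus text file (invented content).
corpus = io.BytesIO(b"<DOC> The firm put the money into bonds. </DOC>")
tag = CorpusTag(byte_offset=6, source="WSJ", complement="NP-PP")

print(read_example(corpus, tag, width=34))
```

With the real corpus, the same helper could be passed a file opened in binary mode (`open(path, "rb")`), provided the byte numbers in the lexicon follow the same counting convention assumed here.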

Data

The corpus totals about 100 MB of text, including parts of the Brown Corpus (7 MB), Wall Street Journal (27 MB), San Jose Mercury (30 MB), and Associated Press (29.5 MB). Much of the text retains SGML and other tags from its original sources. In addition to the text file, the corpus contains a TABLE file that lists the start byte, length, and end byte of each individual source document, as well as of each source overall (e.g., Wall Street Journal, San Jose Mercury, Brown Corpus).
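
Given start and length values of the kind the TABLE file records, a byte number can be mapped back to the source document that contains it. The sketch below shows one way to do this with a binary search; the row layout, the document names, and the byte values are invented for illustration and do not reflect the actual TABLE file format, which would need to be parsed separately.

```python
from bisect import bisect_right

# Hypothetical rows: (document name, start byte, length in bytes).
# Rows are assumed to be sorted by start byte and non-overlapping.
TABLE = [
    ("brown-0001", 0, 1200),
    ("brown-0002", 1200, 800),
    ("wsj-0001", 2000, 1500),
]
STARTS = [start for _, start, _ in TABLE]


def find_document(offset):
    """Return the name of the document containing `offset`, or None."""
    # Index of the last row whose start byte is <= offset.
    i = bisect_right(STARTS, offset) - 1
    if i < 0:
        return None
    name, start, length = TABLE[i]
    # The offset must fall before the first byte after the document.
    return name if offset < start + length else None


print(find_document(1500))
```

Binary search keeps each lookup logarithmic in the number of documents, which matters when resolving many tags against a table with tens of thousands of rows.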
