Information about the sense-tagged corpus

Hwee Tou Ng
Dec 1996

This directory contains the sense-tagged corpus, hand-tagged by 12 undergraduates from the Linguistics Program of the National University of Singapore. The corpus contains sentences in which about 192,800 word occurrences have been tagged with WordNet senses. These sentences are taken from the Brown corpus and the Wall Street Journal corpus. The sense-tagged word occurrences cover 121 nouns and 70 verbs, which are the most frequently occurring and ambiguous words of English.

These 121 nouns and 70 verbs are listed in nlist.txt and vlist.txt respectively. The WordNet 1.5 sense definitions of these nouns and verbs that were used in preparing the sense-tagged corpus appear in ndefs.txt and vdefs.txt respectively.

All sentences that contain occurrences of a particular "word" as a noun (or verb) are collected together in one file "word.n" (or "word.v"). Each file holds one sentence per line, with sentences separated by a blank line.

We illustrate the format of a sentence with the following example, taken from the first line of the file "action.n":

    ca01.db #020 `` These >> actions 8 << should serve to protect in fact and in effect the court 's wards from undue costs and its appointed and elected servants from unmeritorious criticisms '' , the jury said .

Each sentence starts with a file identification (ca01.db in this example) and a sentence number (#020 in this example), followed by the actual sentence. The word that is sense-tagged is delimited by ">>" and "<<", and the tagged WordNet sense number appears after the word and before the closing "<<". So in the example sentence shown here, the word occurrence "actions" is tagged with sense 8 of the noun "action" in WordNet. This sentence is taken from file "ca01.db" of the Brown corpus, and it is the 20th sentence in this file.
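
The sentence format described above can be read with a few lines of Python. The following is only a minimal sketch, not part of the corpus distribution: the names TaggedSentence and parse_line are hypothetical, and the code assumes that each line carries exactly one ">> word sense <<" group, that the tagged word is a single whitespace-delimited token, and that the sense number may be negative.

```python
import re
from typing import List, NamedTuple


class TaggedSentence(NamedTuple):
    """One corpus line (hypothetical container, not defined by the corpus)."""
    file_id: str        # e.g. "ca01.db"
    sentence_no: int    # e.g. 20 for "#020"
    word: str           # tagged surface form, e.g. "actions"
    sense: int          # sense number exactly as written in the corpus
    tokens: List[str]   # sentence tokens with ">>", the sense and "<<" removed
    target_index: int   # position of the tagged word within tokens


# File identification, "#" plus sentence number, then the sentence itself.
LINE_RE = re.compile(r"^(\S+)\s+#(\d+)\s+(.*)$")
# ">>", the tagged word, its sense number, then "<<".
TAG_RE = re.compile(r">>\s+(\S+)\s+(-?\d+)\s+<<")


def parse_line(line: str) -> TaggedSentence:
    """Parse one non-blank line of a word.n or word.v file."""
    header = LINE_RE.match(line.strip())
    if header is None:
        raise ValueError("missing file identification or sentence number")
    file_id, number, text = header.groups()

    tag = TAG_RE.search(text)
    if tag is None:
        raise ValueError("no '>> word sense <<' group found")
    word, sense = tag.group(1), int(tag.group(2))

    # Rebuild the plain token sequence around the tagged word.
    before = text[:tag.start()].split()
    after = text[tag.end():].split()
    return TaggedSentence(file_id, int(number), word, sense,
                          before + [word] + after, len(before))


example = ("ca01.db #020 `` These >> actions 8 << should serve to protect "
           "in fact and in effect the court 's wards from undue costs and "
           "its appointed and elected servants from unmeritorious "
           "criticisms '' , the jury said .")
parsed = parse_line(example)
print(parsed.file_id, parsed.sentence_no, parsed.word, parsed.sense)
# prints: ca01.db 20 actions 8
```

When reading a whole file, blank separator lines would need to be skipped before calling parse_line.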
The file identification of a Brown corpus file starts with "c???.db", while that of a Wall Street Journal file starts with "dj??.db".

Sometimes a verb is followed by a particle such that none of the WordNet sense definitions for the verb apply. The verb together with the particle is more appropriately treated as a collocation. For example, in the sentence:

    "He put off the meeting to next month."

"put off" (as in delay) is more appropriately treated as a collocation, and none of the given WordNet senses of "put" apply here. In such a situation, sense -1 is assigned to the verb occurrence "put".

A word is assigned sense 0 if
(1) none of the given WordNet senses is appropriate and it is not part of a verb collocation; or
(2) it is not possible to assign it a unique sense based on the context of the given sentence; or
(3) there is a genuine mistake in the sentence (e.g., the word to be tagged appears in the wrong part of speech).

This sense-tagged corpus was first reported in the following paper at ACL-96:

    Integrating Multiple Knowledge Sources to Disambiguate Word Sense: An Exemplar-Based Approach
    Hwee Tou Ng and Hian Beng Lee
    In Proceedings of the 34th Annual Meeting of the Association for Computational Linguistics, pages 40-47, Santa Cruz, California, USA, June 1996.
    http://xxx.lanl.gov/abs/cmp-lg/9606032

For further questions about the corpus, please contact:

    Dr. Hwee Tou Ng
    Defence Science Organisation
    20 Science Park Drive
    Singapore 118230
    Republic of Singapore
    email: nhweetou@dso.gov.sg
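
As a small illustration of the file identification prefixes and the special sense values -1 and 0 described above, the sketch below classifies both. The function names corpus_of and sense_kind and the returned labels are hypothetical choices made for this example. The file identification "dj05.db" is made up to show the pattern and is not taken from the corpus.

```python
from fnmatch import fnmatchcase


def corpus_of(file_id: str) -> str:
    """Guess the source corpus from a file identification.

    Follows the stated convention: Brown identifications start with
    "c???.db", Wall Street Journal ones start with "dj??.db".
    """
    if fnmatchcase(file_id, "c???.db*"):
        return "brown"
    if fnmatchcase(file_id, "dj??.db*"):
        return "wsj"
    return "unknown"


def sense_kind(sense: int) -> str:
    """Classify a sense number as written in the corpus."""
    if sense == -1:
        # Verb plus particle treated as a collocation, e.g. "put off".
        return "collocation"
    if sense == 0:
        # No appropriate sense, no unique sense, or an error in the sentence.
        return "no-sense"
    if sense > 0:
        # An ordinary WordNet 1.5 sense number.
        return "wordnet"
    raise ValueError("unexpected sense value: %d" % sense)


print(corpus_of("ca01.db"), sense_kind(8))    # prints: brown wordnet
print(corpus_of("dj05.db"), sense_kind(-1))   # prints: wsj collocation
```

A filter such as sense_kind(s) == "wordnet" keeps only the occurrences that carry a usable WordNet sense.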