Title: Chinese Lexical Resources for Gender, Number, Animacy
Authors: Zhiyi Song, Jiahong Yuan, Xiaoyi Ma, Stephanie Strassel

1. Introduction

DARPA's Deep Exploration and Filtering of Text (DEFT) program aimed to address remaining capability gaps in state-of-the-art natural language processing technologies related to inference, causal relationships and anomaly detection. In support of DEFT, LDC provided source data and core resources for system development.

Gender, number, and animacy are lexical indicators that can be useful in the detection of person mentions. LDC created the Chinese Lexical Resources for Gender, Number, Animacy Corpus by extracting information from newswire texts in the Chinese Gigaword Corpus (LDC2011T13). The corpus includes dictionaries of Chinese animate nominals and names; Chinese nominals and names with gender and number predicted; and other dictionaries of Chinese nominals, names, verbs and pronouns. Each dictionary contains frequency information as well as the features in question.

The data was released to DEFT performers with the aim of improving the performance of core NLP capabilities such as entity tagging and coreference.

2. Contents

./docs/
  README.txt -- This file
  ChineseLexicalResources_v4.pdf -- Detailed description of how lexicons contained in this release were created
  designators/ -- All designator lists used in building gender and entity lexicons

./data/animacy

This directory contains dictionaries of Chinese animate nominals and names. The files contain listings of extracted nominal or name phrases with frequency of occurrence.

In each line of the following .lex files, the (possibly multi-word) noun phrase is followed by a tab and then its frequency from the query result.

  appositionPersonName.lex
  nameSubjectBa.lex
  nounSubjectBa.lex
  org_name_NRNN.lex

In each line of the following .lex files, the (possibly multi-word) noun phrase is followed by a tab and then its predicted entity type and frequency from the query result.
The three predicted entity types are Person (per), Organization (org) and Location (loc), where location covers GeoPolitical Entity, Location and Facility.

  named_entity_apposition.lex
  named_entity_NRNN.lex

./data/gender_number

This directory contains dictionaries of Chinese nominals and names with gender and number predicted (refer to ChineseLexicalResources_v4.pdf for details of queries). The files contain listings of extracted nominal or name phrases and their gender/number counts (as predicted based on surrounding text).

In each line, the (possibly multi-word) noun phrase is followed by a tab and then columns holding the counts for the corresponding gender/number. Gender and number are coded as:

  MS  male singular
  FS  female singular
  NP  neutral plural
  IS  inanimate singular
  NS  neutral singular
  IP  inanimate plural

  conjunctive_possessive_name.lex
  conjunctive_possessive_noun.lex
  nominative_predicate_name.lex
  nominative_predicate_noun.lex
  verb_nominative_name.lex
  verb_nominative_noun.lex

The following two dictionaries include person names, each of which is singular. The gender is coded as:

  F  female
  M  male
  N  neutral

  gender_designator_apposition_name.lex
  gender_designator_NRNN_name.lex

./data/other

This directory contains dictionaries of Chinese nominals, names, verbs and pronouns. The files contain listings of extracted words and their frequency.

In each line, the extracted word or phrase (possibly multi-word) is followed by a tab and then its frequency from the query result.

  name.lex
  noun.lex
  pronouns.lex
  verb.lex

3. Acknowledgement

This material is based on research sponsored by Air Force Research Laboratory and Defense Advanced Research Projects Agency under agreement number FA8750-13-2-0045. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon.
The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of Air Force Research Laboratory and Defense Advanced Research Projects Agency or the U.S. Government.

4. Copyright

5. Contact Information

  Zhiyi Song          Research Project Manager
  Stephanie Strassel  DEFT PI

---------------
README log

  Created by Zhiyi Song, July 14, 2014
  Updated by Zhiyi Song, July 17, 2014
  Updated by Zhiyi Song, July 21, 2014
  Updated by Zhiyi Song, November 20, 2014
  Updated by Zhiyi Song, February 9, 2015
  Updated by Zhiyi Song, February 10, 2015
  Updated by Zhiyi Song, October 7, 2015
  Updated by Zhiyi Song, November 16, 2015
  Updated by Ann Sawyer, January 12, 2017
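
---------------
Example: reading a .lex file

As an illustration of the .lex line format described in section 2, the following is a minimal Python sketch, not part of the original release. It relies only on what this README states: each line holds a phrase, a tab, and then one or more further columns. The function names read_lex and read_frequencies are hypothetical, the sample lines are made up rather than taken from the corpus, and the sketch assumes that the frequency is the last column and that any remaining columns are separated by whitespace. The column order of the ./data/gender_number files is not given here, so consult ChineseLexicalResources_v4.pdf before interpreting those counts.

```python
import io
from typing import Dict, Iterator, List, TextIO, Tuple


def read_lex(handle: TextIO) -> Iterator[Tuple[str, List[str]]]:
    """Yield (phrase, remaining_fields) for each non-empty line.

    The phrase is everything before the first tab, so multi-word
    phrases containing spaces stay intact.
    """
    for raw in handle:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # skip blank lines
        phrase, _, rest = line.partition("\t")
        # Assumption: remaining columns are whitespace-separated.
        yield phrase, rest.split()


def read_frequencies(handle: TextIO) -> Dict[str, int]:
    """Map each phrase to its frequency, summing repeated phrases.

    Assumption: the frequency is the last column on the line.
    """
    totals: Dict[str, int] = {}
    for phrase, fields in read_lex(handle):
        if not fields:
            continue  # no columns after the phrase; nothing to count
        totals[phrase] = totals.get(phrase, 0) + int(fields[-1])
    return totals


# Made-up sample lines in the "phrase<TAB>frequency" layout.
sample = io.StringIO("老师\t12\n北京 大学\t3\n")
freqs = read_frequencies(sample)

# Made-up sample line in the "phrase<TAB>type<TAB>frequency" layout.
entities = list(read_lex(io.StringIO("张三\tper\t5\n")))
```

To read a file from the release, pass an open text handle in place of the in-memory sample, for example open("data/other/noun.lex", encoding="utf-8"). The UTF-8 encoding is an assumption; this README does not state the encoding of the files.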