Ancient Chinese WordNet

Item Name: Ancient Chinese WordNet
Author(s): Bin Li, Feng Minxuan, Dai Junyang, Xu Huidan, Lu Xin, Tuo Xinyu, Wang Lezhi, Zhang Yuqin
LDC Catalog No.: LDC2026L03
ISLRN: 662-487-315-741-3
DOI: https://doi.org/10.35111/m3h4-rm10
Release Date: March 16, 2026
Member Year(s): 2026
DCMI Type(s): Text
Data Source(s): dictionaries
Application(s): cross-lingual information retrieval, language learning, language teaching
Language(s): Literary Chinese, Old Chinese
Language ID(s): lzh, och
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2026L03 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Li, Bin, et al. Ancient Chinese WordNet LDC2026L03. Web Download. Philadelphia: Linguistic Data Consortium, 2026.

Introduction

Ancient Chinese WordNet was developed by Nanjing Normal University and contains lexical and semantic information for Ancient Chinese vocabulary dating back to the Pre-Qin period (before 221 BCE). The WordNet comprises 38,781 word forms and 55,100 senses, each manually linked to a corresponding synset in Princeton WordNet 1.6.

The Ancient Chinese WordNet (ACWN) project began in 2012 with the goal of creating a structured lexical database to support linguistic research and natural language processing applications involving historical Chinese language materials. ACWN organizes vocabulary using WordNet's noun, verb, adjective, and adverb hierarchies and provides WordNet definitions, semantic relations, and categorization for each sense.

Data

Ancient Chinese WordNet contains 55,100 records, each representing a single Ancient Chinese lexical item mapped to one WordNet synset. The database follows the WordNet 1.6 organizational structure, including 22 noun categories, 15 verb categories, and additional adjective and adverb categories.

Each entry includes the following fields:

  • ID - The serial number of the ACWN entry
  • Word - Ancient Chinese word form
  • wn_offset - 8-digit WordNet 1.6 synset offset with trailing POS (n/v/a/s/r)
  • senseid - Sense number for this word form (ordinal among that word's senses)
  • pos - Part of speech: noun (n), verb (v), adjective (a) or adjective satellite (s), adverb (r)
  • wn_category - Numeric code for the WordNet 1.6 lexicographer file (category)
  • wn_synset - Synset headword(s) in WordNet 1.6
  • wn_definition - WordNet gloss for the synset
  • wn_similar to - Synset with similar meaning
  • wn_pertainym - Pertainym synset offset(s)
  • wn_attribute - Attribute synset offset(s)
  • wn_hypernym - Hypernym synset offset(s)
  • wn_hyponym - Hyponym synset offset(s)

The data is provided in two formats: CSV (UTF-8 encoded) and XLSX.
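As an illustration of how the fields above might be used, the following Python sketch reads the CSV with the standard library, splits each wn_offset into its 8-digit offset and trailing POS letter, and groups senses by word form. It assumes the CSV header row uses the field names exactly as listed above; the function names (split_offset, load_senses) and the sample rows are hypothetical placeholders and are not taken from the corpus.

```python
import csv
import io
from collections import defaultdict


def split_offset(wn_offset):
    """Split a value such as '00000001n' into ('00000001', 'n')."""
    value = wn_offset.strip()
    # Expect exactly 8 digits followed by one POS letter (n/v/a/s/r).
    if len(value) != 9 or not value[:8].isdigit() or value[8] not in "nvasr":
        raise ValueError(f"unexpected wn_offset: {wn_offset!r}")
    return value[:8], value[8]


def load_senses(handle):
    """Group records by word form; senses are sorted by senseid."""
    senses = defaultdict(list)
    for row in csv.DictReader(handle):
        offset, pos = split_offset(row["wn_offset"])
        senses[row["Word"]].append((int(row["senseid"]), offset, pos))
    for entries in senses.values():
        entries.sort()
    return dict(senses)


# Placeholder rows for illustration only; these are NOT real ACWN records.
SAMPLE = """ID,Word,wn_offset,senseid,pos
1,W1,00000002n,2,n
2,W1,00000001v,1,v
3,W2,00000003a,1,a
"""

senses = load_senses(io.StringIO(SAMPLE))
for word, entries in senses.items():
    print(word, entries)
```

To read the released file instead of the placeholder rows, a handle opened with open(path, encoding="utf-8", newline="") can be passed to load_senses, where path is wherever the CSV was saved locally.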

Updates

No updates at this time.
