CZECH ACADEMIC CORPUS 1.0 GUIDE

C. Description of tags

Table C.1. Part of speech

Value Description
A Adjective
C Numeral
D Adverb
I Interjection
J Conjunction
N Noun
P Pronoun
V Verb
R Preposition
T Particle
X Unknown, Not Determined, Unclassifiable
Z Punctuation (also used for the Sentence Boundary token)
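
The values in Table C.1 map naturally onto a lookup table. The following is a minimal Python sketch; the names POS_VALUES and describe_pos are illustrative assumptions and are not part of the corpus distribution.

```python
# Part-of-speech values copied from Table C.1.
POS_VALUES = {
    "A": "Adjective",
    "C": "Numeral",
    "D": "Adverb",
    "I": "Interjection",
    "J": "Conjunction",
    "N": "Noun",
    "P": "Pronoun",
    "V": "Verb",
    "R": "Preposition",
    "T": "Particle",
    "X": "Unknown, Not Determined, Unclassifiable",
    "Z": "Punctuation",
}


def describe_pos(value: str) -> str:
    """Return the Table C.1 description for a one-character POS value."""
    try:
        return POS_VALUES[value]
    except KeyError:
        # Anything outside Table C.1 is rejected rather than guessed at.
        raise ValueError(f"not a Table C.1 value: {value!r}") from None


print(describe_pos("N"))  # Noun
```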

Table C.2. Sub-part of speech

Value Description POS
# Sentence boundary Z – punctuation
% Author’s signature, e.g. haš-99_:B_;S N – noun
* Word krát (lit.: “times”) C – numeral
, Conjunction subordinate (incl. “aby”, “kdyby” in all forms) J – conjunction
} Numeral, written using Roman numerals (XIV) C – numeral
: Punctuation (except for the virtual sentence boundary word ###, which uses the C.2 #) Z – punctuation
= Number written using digits C – numeral
? Numeral “kolik” (lit. “how many”/“how much”) C – numeral
@ Unrecognized word form X – unknown
^ Conjunction (connecting main clauses, not subordinate) J – conjunction
4 Relative/interrogative pronoun with adjectival declension of both types (soft and hard) (“jaký”, “který”, “čí”, ..., lit. “what”, “which”, “whose”, ...) P – pronoun
5 The pronoun he in forms required after any preposition (with prefix n-: “něj”, “něho”, ..., lit. “him” in various cases) P – pronoun
6 Reflexive pronoun se in long forms (“sebe”, “sobě”, “sebou”, lit. “myself” / “yourself” / “herself” / “himself” in various cases; “se” is personless) P – pronoun
7 Reflexive pronouns “se” (C.5 = 4), “si” (C.5 = 3), plus the same two forms with contracted -s: “ses”, “sis” (distinguished by C.8 = 2; also number is singular only). This should be done somehow more consistently, virtually any word can have this contracted -s (“cos”, “polívkus”, ...) P – pronoun
8 Possessive reflexive pronoun “svůj” (lit. “my”/“your”/“her”/“his” when the possessor is the subject of the sentence) P – pronoun
9 Relative pronoun “jenž”, “již”, ... after a preposition (n-: “něhož”, “niž”, ..., lit. “who”) P – pronoun
A Adjective, general A – adjective
B Verb, present or future form V – verb
C Adjective, nominal (short, participial) form “rád”, “schopen”, ... A – adjective
D Pronoun, demonstrative (“ten”, “onen”, ..., lit. “this”, “that”, “that”, ... “over there”, ...) P – pronoun
E Relative pronoun “což” (corresponding to English which in subordinate clauses referring to a part of the preceding text) P – pronoun
F Preposition, part of; never appears isolated, always in a phrase (“nehledě (na)”, “vzhledem (k)”, ..., lit. “regardless”, “because of”) R – preposition
G Adjective derived from present transgressive form of a verb A – adjective
H Personal pronoun, clitical (short) form (“mě”, “mi”, “ti”, “mu”, ...); these forms are used in the second position in a clause (lit. “me”, “you”, “her”, “him”), even though some of them (“mě”) might be regularly used anywhere as well P – pronoun
I Interjections I – interjection
J Relative pronoun “jenž”, “již”, ... not after a preposition (lit. “who”, “whom”) P – pronoun
K Relative/interrogative pronoun “kdo” (lit. “who”), incl. forms with affixes -ž and -s (affixes are distinguished by the category C.15 (for -ž) and C.8 (for -s)) P – pronoun
L Pronoun, indefinite “všechen”, “sám” (lit. “all”, “alone”) P – pronoun
M Adjective derived from verbal past transgressive form A – adjective
N Noun (general) N – noun
O Pronoun “svůj”, “nesvůj”, “tentam” alone (lit. “own self”, “not-in-mood”, “gone”) P – pronoun
P Personal pronoun “já”, “ty”, “on” (lit. “I”, “you”, “he”) (incl. forms with the enclitic -s, e.g. “tys”, lit. “you’re”); gender position is used for third person to distinguish “on”/“ona”/“ono” (lit. “he”/“she”/“it”), and number for all three persons P – pronoun
Q Pronoun relative/interrogative “co”, “copak”, “cožpak” (lit. “what”, “isn’t-it-true-that”) P – pronoun
R Preposition (general, without vocalization) R – preposition
S Pronoun possessive “můj”, “tvůj”, “jeho” (lit. “my”, “your”, “his”); gender position used for third person to distinguish “jeho”, “její”, “jeho” (lit. “his”, “her”, “its”), and number for all three pronouns P – pronoun
T Particle T – particle
U Adjective possessive (with the masculine ending -ův as well as feminine -in) A – adjective
V Preposition (with vocalization -e or -u): (“ve”, “pode”, “ku”, ..., lit. “in”, “under”, “to”) R – preposition
W Pronoun negative (“nic”, “nikdo”, “nijaký”, “žádný”, ..., lit. “nothing”, “nobody”, “not-worth-mentioning”, “no”/“none”) P – pronoun
X (temporary) Word form recognized, but tag is missing in dictionary due to delays in (asynchronous) dictionary creation  
Y Pronoun relative/interrogative co as an enclitic (after a preposition) (“oč”, “nač”, “zač”, lit. “about what”, “on”/“onto” “what”, “after”/“for what”) P – pronoun
Z Pronoun indefinite (“nějaký”, “některý”, “číkoli”, “cosi”, ..., lit. “some”, “some”, “anybody’s”, “something”) P – pronoun
a Numeral, indefinite (“mnoho”, “málo”, “tolik”, “několik”, “kdovíkolik”, ..., lit. “much”/“many”, “little”/“few”, “that much”/“many”, “some” (“number of”), “who-knows-how-much/many”) C – numeral
b Adverb (without a possibility to form negation and degrees of comparison, e.g. “pozadu”, “naplocho”, ..., lit. “behind”, “flatly”); i.e. both the C.11 as well as the C.10 attributes in the same tag are marked by – (Not applicable) D – adverb
c Conditional (of the verb “být” (lit. “to be”) only) (“by”, “bych”, “bys”, “bychom”, “byste”, lit. “would”) V – verb
d Numeral, generic with adjectival declension (“dvojí”, “desaterý”, ..., lit. “two-kinds”/..., “ten-...”) C – numeral
e Verb, transgressive present (endings -e/-ě, -íc, -íce) V – verb
f Verb, infinitive V – verb
g Adverb (forming negation (C.11 set to A/N) and degrees of comparison (C.10 set to 1/2/3, i.e. positive/comparative/superlative), e.g. “velký”, “zajímavý”, ..., lit. “big”, “interesting”) D – adverb
h Numeral, generic: only “jedny” and “nejedny” (lit. “one-kind”/“sort-of”, “not-only-one-kind”/“sort-of”) C – numeral
i Verb, imperative form V – verb
j Numeral, generic greater than or equal to 4 used as a syntactic noun (“čtvero”, “desatero”, ..., lit. “four-kinds”/“sorts-of”, “ten-...”) C – numeral
k Numeral, generic greater than or equal to 4 used as a syntactic adjective, short form (“čtvery”, ..., lit. “four-kinds”/“sorts-of”) C – numeral
l Numeral, cardinal “jeden”, “dva”, “tři”, “čtyři”, “půl”, ... (lit. “one”, “two”, “three”, “four”); also “sto” and “tisíc” (lit. “hundred”, “thousand”) if noun declension is not used C – numeral
m Verb, past transgressive; also archaic present transgressive of perfective verbs (ex.: “udělav”, lit. “(he-)having-done”; arch. also “udělaje” (C.15 = 4), lit. “(he-)having-done”) V – verb
n Numeral, cardinal greater than or equal to 5 C – numeral
o Numeral, multiplicative indefinite (“-krát”, lit. (“times”): “mnohokrát”, “tolikrát”, ..., lit. “many times”, “that many times”) C – numeral
p Verb, past participle, active (including forms with the enclitic -s, lit. ’re (“are”)) V – verb
q Verb, past participle, active, with the enclitic -ť, lit. (“perhaps”) - “could-you-imagine-that?” or “but-because-” (both archaic) V – verb
r Numeral, ordinal (adjective declension without degrees of comparison) C – numeral
s Verb, past participle, passive (including forms with the enclitic -s, lit. ’re (“are”)) V – verb
t Verb, present or future tense, with the enclitic -ť, lit. (“perhaps”) “-could-you-imagine-that?” or “but-because-” (both archaic) V – verb
u Numeral, interrogative “kolikrát”, lit. “how many times?” C – numeral
v Numeral, multiplicative, definite (-krát, lit. “times”: “pětkrát”, ..., lit. “five times”) C – numeral
w Numeral, indefinite, adjectival declension (“nejeden”, “tolikátý”, ..., lit. “not-only-one”, “so-many-times-repeated”) C – numeral
y Numeral, fraction ending at -ina; used as a noun (“pětina”, lit. “one-fifth”) C – numeral
z Numeral, interrogative “kolikátý”, lit. “what” (“at-what-position-place-in-a-sequence”) C – numeral
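
Because every sub-part-of-speech value in Table C.2 belongs to exactly one part of speech, the first two tag attributes can be cross-checked against each other. The sketch below is only an illustration: SUBPOS_TO_POS is a hand-copied excerpt of Table C.2 rather than the full table, and the function name is_consistent is an assumption, not something the corpus tools provide.

```python
from typing import Optional

# Excerpt of Table C.2: sub-part-of-speech value -> part-of-speech value.
# Only a subset of rows is copied here to keep the example short.
SUBPOS_TO_POS = {
    "#": "Z", ":": "Z",            # sentence boundary, punctuation
    "%": "N", "N": "N",            # author's signature, general noun
    "=": "C", "}": "C",            # digits, Roman numerals
    ",": "J", "^": "J",            # subordinate / coordinating conjunction
    "A": "A",                      # general adjective
    "B": "V", "f": "V",            # present/future verb, infinitive
    "R": "R", "V": "R",            # preposition without / with vocalization
    "T": "T", "I": "I",            # particle, interjection
    "b": "D",                      # adverb without negation and grade
    "@": "X",                      # unrecognized word form
    "7": "P", "P": "P",            # reflexive "se"/"si", personal pronoun
}


def is_consistent(pos: str, subpos: str) -> Optional[bool]:
    """Check a (POS, sub-POS) pair against the excerpt.

    Returns None when the sub-POS value is not in the excerpt,
    so that a missing row is not mistaken for an error.
    """
    expected = SUBPOS_TO_POS.get(subpos)
    if expected is None:
        return None
    return expected == pos


print(is_consistent("J", ","))  # True
print(is_consistent("N", ","))  # False
```

Extending the dictionary to all rows of Table C.2 would turn this into a complete check; the three-valued return is a design choice that keeps the partial excerpt from producing false alarms.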

Table C.3. Gender

Value Description
F Feminine
H {F, N} – Feminine or Neuter
I Masculine inanimate
M Masculine animate
N Neuter
Q Feminine (with singular only) or Neuter (with plural only); used only with participles and nominal forms of adjectives
T Masculine inanimate or Feminine (plural only); used only with participles and nominal forms of adjectives
X Any
Y {M, I} – Masculine (either animate or inanimate)
Z {M, I, N} – Not feminine (i.e., Masculine animate/inanimate or Neuter); only for (some) pronoun forms and certain numerals
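
Several gender values in Table C.3 stand for sets of basic genders. A minimal sketch of how such values might be expanded when matching a tag against a concrete gender is given below. The names GENDER_SETS and gender_matches are hypothetical, and the sketch deliberately ignores the number restrictions attached to Q and T.

```python
# Each Table C.3 value expanded to the set of basic genders it covers
# (F, I, M, N). Number restrictions on Q and T are not modelled here.
GENDER_SETS = {
    "F": frozenset("F"),
    "I": frozenset("I"),
    "M": frozenset("M"),
    "N": frozenset("N"),
    "H": frozenset("FN"),    # Feminine or Neuter
    "Q": frozenset("FN"),    # Feminine (singular) or Neuter (plural)
    "T": frozenset("IF"),    # Masculine inanimate or Feminine
    "Y": frozenset("MI"),    # Masculine animate or inanimate
    "Z": frozenset("MIN"),   # Not feminine
    "X": frozenset("FIMN"),  # Any
}


def gender_matches(tag_value: str, gender: str) -> bool:
    """Return True if the basic gender is covered by the tag value."""
    return gender in GENDER_SETS[tag_value]


print(gender_matches("Y", "M"))  # True
print(gender_matches("Z", "F"))  # False
```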

Table C.4. Number

Value Description
D Dual, e.g. “nohama”
P Plural, e.g. “nohami”
S Singular, e.g. “noha”
W Singular for feminine gender, plural with neuter; can only appear in participle or nominal adjective form with gender value Q
X Any
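
The number value W is tied to the gender value Q: taken together they stand for "feminine singular or neuter plural". The following sketch shows one possible way to expand such a pair into concrete readings. The function name resolve_gender_number is an assumption made for illustration, and pairs other than Q with W are simply passed through unchanged.

```python
from typing import List, Tuple


def resolve_gender_number(gender: str, number: str) -> List[Tuple[str, str]]:
    """Expand a (gender, number) pair into concrete (gender, number) readings.

    Only the Q/W combination described in Tables C.3 and C.4 is expanded.
    """
    if number == "W":
        if gender != "Q":
            # Table C.4: W can only appear with gender value Q.
            raise ValueError("number W requires gender Q")
        return [("F", "S"), ("N", "P")]
    return [(gender, number)]


print(resolve_gender_number("Q", "W"))  # [('F', 'S'), ('N', 'P')]
print(resolve_gender_number("F", "S"))  # [('F', 'S')]
```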

Table C.5. Case

Value Description
1 Nominative, e.g. “žena”
2 Genitive, e.g. “ženy”
3 Dative, e.g. “ženě”
4 Accusative, e.g. “ženu”
5 Vocative, e.g. “ženo”
6 Locative, e.g. “ženě”
7 Instrumental, e.g. “ženou”
X Any

Table C.6. Possessive gender

Value Description
F Feminine, e.g. “matčin”, “její”
M Masculine animate (adjectives only), e.g. “otců”
X Any
Z {M, I, N} – Not feminine, e.g. “jeho”

Table C.7. Possessive number

Value Description
P Plural, e.g. “náš”
S Singular, e.g. “můj”
X Any, e.g. “your”

Table C.8. Person

Value Description
1 1st person, e.g. “píšu”, “píšeme”
2 2nd person, e.g. “píšeš”, “píšete”
3 3rd person, e.g. “píše”, “píšou”
X Any person

Table C.9. Tense

Value Description
F Future
H {R, P} – Past or Present
P Present
R Past
X Any

Table C.10. Grade

Value Description
1 Positive, e.g. “velký”
2 Comparative, e.g. “větší”
3 Superlative, e.g. “největší”

Table C.11. Negation

Value Description
A Affirmative (not negated), e.g. “možný”
N Negated, e.g. “nemožný”
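
Table C.2 ties the grade and negation attributes to the two adverb sub-part-of-speech values: with b both attributes are marked as not applicable (-), while with g both carry a value. A minimal sketch of that rule as a check follows; the function name adverb_values_ok and its argument order are assumptions made for this example.

```python
def adverb_values_ok(subpos: str, grade: str, negation: str) -> bool:
    """Check grade (C.10) and negation (C.11) against adverb sub-POS b or g."""
    if subpos == "b":
        # No negation and no degrees of comparison: both not applicable.
        return grade == "-" and negation == "-"
    if subpos == "g":
        # Grade must be one of 1/2/3 and negation one of A/N.
        return grade in {"1", "2", "3"} and negation in {"A", "N"}
    raise ValueError(f"not an adverb sub-POS value: {subpos!r}")


print(adverb_values_ok("b", "-", "-"))  # True
print(adverb_values_ok("g", "2", "A"))  # True
print(adverb_values_ok("g", "-", "-"))  # False
```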

Table C.12. Voice

Value Description
A Active, e.g. “píšící”
P Passive, e.g. “psaný”

Table C.13. Reserve 1

Value Description
- not applicable

Table C.14. Reserve 2

Value Description
- not applicable

Table C.15. Variant

Value Description
- Basic variant, standard contemporary style; also used for standard forms allowed for use in writing by the Czech Standard Orthography Rules despite being marked there as colloquial
1 Variant, second most used (less frequent), still standard
2 Variant, rarely used, bookish, or archaic
3 Very archaic, also archaic + colloquial
4 Very archaic or bookish, but standard at the time
5 Colloquial, but (almost) tolerated even in public
6 Colloquial (standard in spoken Czech)
7 Colloquial (standard in spoken Czech), less frequent variant
8 Abbreviations
9 Special uses, e.g. personal pronouns after prepositions etc.
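
Taken together, Tables C.1 to C.15 describe fifteen attributes. Assuming a tag is written as a string of fifteen characters, with the character at position i taken from Table C.i and - standing for "not applicable" (or, at position 15, for the basic variant), a tag can be split into named attributes as sketched below. The function name decode_tag and the example tag are hypothetical; the tag was assembled from the tables for illustration and is not quoted from the corpus.

```python
# Attribute names in table order, C.1 through C.15.
CATEGORIES = (
    "Part of speech", "Sub-part of speech", "Gender", "Number", "Case",
    "Possessive gender", "Possessive number", "Person", "Tense", "Grade",
    "Negation", "Voice", "Reserve 1", "Reserve 2", "Variant",
)


def decode_tag(tag: str) -> dict:
    """Split a 15-character positional tag into named attribute values."""
    if len(tag) != len(CATEGORIES):
        raise ValueError(f"expected {len(CATEGORIES)} characters, got {len(tag)}")
    return dict(zip(CATEGORIES, tag))


# Illustrative tag: noun, general noun, feminine, singular, nominative,
# affirmative, basic variant (compare "žena" in Table C.5).
decoded = decode_tag("NNFS1-----A----")
print(decoded["Gender"], decoded["Number"], decoded["Case"])  # F S 1
```

Keeping the attribute names in a single tuple means the position-to-table correspondence is stated in exactly one place, which makes the assumption easy to revise if the actual tag layout differs.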