Table C.1. Part of speech
Value | Description |
---|---|
A | Adjective |
C | Numeral |
D | Adverb |
I | Interjection |
J | Conjunction |
N | Noun |
P | Pronoun |
V | Verb |
R | Preposition |
T | Particle |
X | Unknown, Not Determined, Unclassifiable |
Z | Punctuation (also used for the Sentence Boundary token) |
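
As an illustration of how Table C.1 might be used programmatically, the following minimal sketch stores the table as a dictionary and looks up the first character of a tag. It assumes that a tag is a string whose first character holds the part-of-speech value; the name `describe_pos` and the sample tag are hypothetical and not part of the tagset definition.

```python
# Table C.1 as a lookup table: part-of-speech value -> description.
POS = {
    "A": "Adjective",
    "C": "Numeral",
    "D": "Adverb",
    "I": "Interjection",
    "J": "Conjunction",
    "N": "Noun",
    "P": "Pronoun",
    "V": "Verb",
    "R": "Preposition",
    "T": "Particle",
    "X": "Unknown, Not Determined, Unclassifiable",
    "Z": "Punctuation (also used for the Sentence Boundary token)",
}


def describe_pos(tag: str) -> str:
    """Return the Table C.1 description for the first character of `tag`.

    Assumption: the part-of-speech value is the first character of the tag.
    """
    if not tag:
        raise ValueError("empty tag")
    try:
        return POS[tag[0]]
    except KeyError:
        raise ValueError(f"unknown part-of-speech value: {tag[0]!r}") from None


# Hypothetical sample tag for a feminine singular nominative noun.
print(describe_pos("NNFS1-----A----"))  # prints "Noun"
```
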
Table C.2. Sub-part of speech
Value | Description | POS |
---|---|---|
# | Sentence boundary | Z – punctuation |
% | Author’s signature, e.g. haš-99_:B_;S | N – noun |
* | Word krát (lit.: “times”) | C – numeral |
, | Conjunction subordinate (incl. “aby”, “kdyby” in all forms) | J – conjunction |
} | Numeral, written using Roman numerals (XIV) | C – numeral |
: | Punctuation (except for the virtual sentence boundary word ###, which uses the C.2 #) | Z – punctuation |
= | Number written using digits | C – numeral |
? | Numeral “kolik” (lit. “how many”/“how much”) | C – numeral |
@ | Unrecognized word form | X – unknown |
^ | Conjunction (connecting main clauses, not subordinate) | J – conjunction |
4 | Relative/interrogative pronoun with adjectival declension of both types (soft and hard) (“jaký”, “který”, “čí”, ..., lit. “what”, “which”, “whose”, ...) | P – pronoun |
5 | The pronoun he in forms required after any preposition (with prefix n-: “něj”, “něho”, ..., lit. “him” in various cases) | P – pronoun |
6 | Reflexive pronoun se in long forms (“sebe”, “sobě”, “sebou”, lit. “myself” / “yourself” / “herself” / “himself” in various cases; “se” is personless) | P – pronoun |
7 | Reflexive pronouns “se” (C.5 = 4), “si” (C.5 = 3), plus the same two forms with contracted -s: “ses”, “sis” (distinguished by C.8 = 2; also number is singular only). Note: this should be handled more consistently, since virtually any word can take this contracted -s (“cos”, “polívkus”, ...) | P – pronoun |
8 | Possessive reflexive pronoun “svůj” (lit. “my”/“your”/“her”/“his” when the possessor is the subject of the sentence) | P – pronoun |
9 | Relative pronoun “jenž”, “již”, ... after a preposition (n-: “něhož”, “niž”, ..., lit. “who”) | P – pronoun |
A | Adjective, general | A – adjective |
B | Verb, present or future form | V – verb |
C | Adjective, nominal (short, participial) form “rád”, “schopen”, ... | A – adjective |
D | Pronoun, demonstrative (“ten”, “onen”, ..., lit. “this”, “that”, “that”, ... “over there”, ...) | P – pronoun |
E | Relative pronoun “což” (corresponding to English which in subordinate clauses referring to a part of the preceding text) | P – pronoun |
F | Preposition, part of; never appears isolated, always in a phrase (“nehledě (na)”, “vzhledem (k)”, ..., lit. “regardless”, “because of”) | R – preposition |
G | Adjective derived from present transgressive form of a verb | A – adjective |
H | Personal pronoun, clitical (short) form (“mě”, “mi”, “ti”, “mu”, ...); these forms are used in the second position in a clause (lit. “me”, “you”, “her”, “him”), even though some of them (“mě”) might be regularly used anywhere as well | P – pronoun |
I | Interjections | I – interjection |
J | Relative pronoun “jenž”, “již”, ... not after a preposition (lit. “who”, “whom”) | P – pronoun |
K | Relative/interrogative pronoun “kdo” (lit. “who”), incl. forms with affixes -ž and -s (affixes are distinguished by the category C.15 (for -ž) and C.8 (for -s)) | P – pronoun |
L | Pronoun, indefinite “všechen”, “sám” (lit. “all”, “alone”) | P – pronoun |
M | Adjective derived from verbal past transgressive form | A – adjective |
N | Noun (general) | N – noun |
O | Pronoun “svůj”, “nesvůj”, “tentam” alone (lit. “own self”, “not-in-mood”, “gone”) | P – pronoun |
P | Personal pronoun “já”, “ty”, “on” (lit. “I”, “you”, “he”) (incl. forms with the enclitic -s, e.g. “tys”, lit. “you’re”); gender position is used for third person to distinguish “on”/“ona”/“ono” (lit. “he”/“she”/“it”), and number for all three persons | P – pronoun |
Q | Pronoun relative/interrogative “co”, “copak”, “cožpak” (lit. “what”, “isn’t-it-true-that”) | P – pronoun |
R | Preposition (general, without vocalization) | R – preposition |
S | Pronoun possessive “můj”, “tvůj”, “jeho” (lit. “my”, “your”, “his”); gender position used for third person to distinguish “jeho”, “její”, “jeho” (lit. “his”, “her”, “its”), and number for all three pronouns | P – pronoun |
T | Particle | T – particle |
U | Adjective possessive (with the masculine ending -ův as well as feminine -in) | A – adjective |
V | Preposition (with vocalization -e or -u): (“ve”, “pode”, “ku”, ..., lit. “in”, “under”, “to”) | R – preposition |
W | Pronoun negative (“nic”, “nikdo”, “nijaký”, “žádný”, ..., lit. “nothing”, “nobody”, “not-worth-mentioning”, “no”/“none”) | P – pronoun |
X | (temporary) Word form recognized, but tag is missing in dictionary due to delays in (asynchronous) dictionary creation | |
Y | Pronoun relative/interrogative co as an enclitic (after a preposition) (“oč”, “nač”, “zač”, lit. “about what”, “on”/“onto” “what”, “after”/“for what”) | P – pronoun |
Z | Pronoun indefinite (“nějaký”, “některý”, “číkoli”, “cosi”, ..., lit. “some”, “some”, “anybody’s”, “something”) | P – pronoun |
a | Numeral, indefinite (“mnoho”, “málo”, “tolik”, “několik”, “kdovíkolik”, ..., lit. “much”/“many”, “little”/“few”, “that much”/“many”, “some” (“number of”), “who-knows-how-much/many”) | C – numeral |
b | Adverb (without a possibility to form negation and degrees of comparison, e.g. “pozadu”, “naplocho”, ..., lit. “behind”, “flatly”); i.e. both the C.11 as well as the C.10 attributes in the same tag are marked by – (Not applicable) | D – adverb |
c | Conditional (of the verb “být” (lit. “to be”) only) (“by”, “bych”, “bys”, “bychom”, “byste”, lit. “would”) | V – verb |
d | Numeral, generic with adjectival declension (“dvojí”, “desaterý”, ..., lit. “two-kinds”/..., “ten-...”) | C – numeral |
e | Verb, transgressive present (endings -e/-ě, -íc, -íce) | V – verb |
f | Verb, infinitive | V – verb |
g | Adverb (forming negation (C.11 set to A/N) and degrees of comparison (C.10 set to 1/2/3, i.e. positive/comparative/superlative), e.g. “velký”, “zajímavý”, ..., lit. “big”, “interesting”) | D – adverb |
h | Numeral, generic: only “jedny” and “nejedny” (lit. “one-kind”/“sort-of”, “not-only-one-kind”/“sort-of”) | C – numeral |
i | Verb, imperative form | V – verb |
j | Numeral, generic greater than or equal to 4 used as a syntactic noun (“čtvero”, “desatero”, ..., lit. “four-kinds”/“sorts-of”, “ten-...”) | C – numeral |
k | Numeral, generic greater than or equal to 4 used as a syntactic adjective, short form (“čtvery”, ..., lit. “four-kinds”/“sorts-of”) | C – numeral |
l | Numeral, cardinal “jeden”, “dva”, “tři”, “čtyři”, “půl”, ... (lit. “one”, “two”, “three”, “four”); also “sto” and “tisíc” (lit. “hundred”, “thousand”) if noun declension is not used | C – numeral |
m | Verb, past transgressive; also archaic present transgressive of perfective verbs (ex.: “udělav”, lit. “(he-)having-done”; arch. also “udělaje” (C.15 = 4), lit. “(he-)having-done”) | V – verb |
n | Numeral, cardinal greater than or equal to 5 | C – numeral |
o | Numeral, multiplicative indefinite (“-krát”, lit. “times”: “mnohokrát”, “tolikrát”, ..., lit. “many times”, “that many times”) | C – numeral |
p | Verb, past participle, active (including forms with the enclitic -s, lit. ’re (“are”)) | V – verb |
q | Verb, past participle, active, with the enclitic -ť, lit. (“perhaps”) - “could-you-imagine-that?” or “but-because-” (both archaic) | V – verb |
r | Numeral, ordinal (adjective declension without degrees of comparison) | C – numeral |
s | Verb, past participle, passive (including forms with the enclitic -s, lit. ’re (“are”)) | V – verb |
t | Verb, present or future tense, with the enclitic -ť, lit. (“perhaps”) “-could-you-imagine-that?” or “but-because-” (both archaic) | V – verb |
u | Numeral, interrogative “kolikrát”, lit. “how many times?” | C – numeral |
v | Numeral, multiplicative, definite (-krát, lit. “times”: “pětkrát”, ..., lit. “five times”) | C – numeral |
w | Numeral, indefinite, adjectival declension (“nejeden”, “tolikátý”, ..., lit. “not-only-one”, “so-many-times-repeated”) | C – numeral |
y | Numeral, fraction ending in -ina; used as a noun (“pětina”, lit. “one-fifth”) | C – numeral |
z | Numeral, interrogative “kolikátý”, lit. “what” (“at-what-position-place-in-a-sequence”) | C – numeral |
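
The POS column of Table C.2 implies a consistency constraint between the first two values of a tag. The sketch below is one possible way to encode it, not a definitive implementation: it groups the sub-part-of-speech values by the POS listed for them and checks a tag against that mapping. It assumes that the POS and sub-POS values are the first and second characters of a tag string. The entry for `g` is taken to be D (adverb) on the basis of its description, and `X` is left unmapped because the table lists no POS for it. The names `SUBPOS_BY_POS`, `POS_OF_SUBPOS` and `is_consistent` are hypothetical.

```python
# Sub-POS values from Table C.2, grouped by the POS given in its third column.
SUBPOS_BY_POS = {
    "A": "ACGMU",
    "C": "*}=?adhjklnoruvwyz",
    "D": "bg",  # assumption: "g" is an adverb, as its description says
    "I": "I",
    "J": ",^",
    "N": "%N",
    "P": "456789DEHJKLOPQSWYZ",
    "R": "FRV",
    "T": "T",
    "V": "Bcefimpqst",
    "X": "@",
    "Z": "#:",
}

# Inverted view: sub-POS value -> POS value.
# The temporary sub-POS "X" is deliberately absent (no POS listed for it).
POS_OF_SUBPOS = {
    sub: pos for pos, subs in SUBPOS_BY_POS.items() for sub in subs
}


def is_consistent(tag: str) -> bool:
    """Check that the sub-POS (2nd char) belongs to the POS (1st char)."""
    if len(tag) < 2:
        raise ValueError("tag must contain at least POS and sub-POS")
    pos, subpos = tag[0], tag[1]
    return POS_OF_SUBPOS.get(subpos) == pos


print(is_consistent("NN"))  # noun, general
print(is_consistent("VN"))  # verb combined with a noun sub-POS
```

With this mapping, the first call reports a consistent pair and the second an inconsistent one.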
Table C.3. Gender
Value | Description |
---|---|
F | Feminine |
H | {F, N} – Feminine or Neuter |
I | Masculine inanimate |
M | Masculine animate |
N | Neuter |
Q | Feminine (with singular only) or Neuter (with plural only); used only with participles and nominal forms of adjectives |
T | Masculine inanimate or Feminine (plural only); used only with participles and nominal forms of adjectives |
X | Any |
Y | {M, I} – Masculine (either animate or inanimate) |
Z | {M, I, N} – Not feminine (i.e., Masculine animate/inanimate or Neuter); only for (some) pronoun forms and certain numerals |
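
Several gender values in Table C.3 stand for sets of basic genders. A minimal sketch of how such values might be expanded and compared follows. It is a simplification under stated assumptions: `X` (Any) is expanded to all four basic genders, and the number restrictions attached to `Q` and `T` are ignored, so only the gender part of those two values is modelled. The names `GENDER_SETS` and `genders_agree` are hypothetical.

```python
# Each Table C.3 value expanded into the set of basic genders it covers
# (F = feminine, I = masculine inanimate, M = masculine animate, N = neuter).
GENDER_SETS = {
    "F": {"F"},
    "H": {"F", "N"},
    "I": {"I"},
    "M": {"M"},
    "N": {"N"},
    "Q": {"F", "N"},  # simplification: number restriction not modelled
    "T": {"I", "F"},  # simplification: number restriction not modelled
    "X": {"F", "I", "M", "N"},  # assumption: "Any" covers all four
    "Y": {"M", "I"},
    "Z": {"M", "I", "N"},
}


def genders_agree(a: str, b: str) -> bool:
    """Return True if two gender values share at least one basic gender."""
    return bool(GENDER_SETS[a] & GENDER_SETS[b])


print(genders_agree("Y", "M"))  # Y covers masculine animate
print(genders_agree("Z", "F"))  # Z excludes feminine
```
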
Table C.4. Number
Value | Description |
---|---|
D | Dual, e.g. “nohama” |
P | Plural, e.g. “nohami” |
S | Singular, e.g. “noha” |
W | Singular for feminine gender, plural with neuter; can only appear in participle or nominal adjective form with gender value Q |
X | Any |
Table C.5. Case
Value | Description |
---|---|
1 | Nominative, e.g. “žena” |
2 | Genitive, e.g. “ženy” |
3 | Dative, e.g. “ženě” |
4 | Accusative, e.g. “ženu” |
5 | Vocative, e.g. “ženo” |
6 | Locative, e.g. “ženě” |
7 | Instrumental, e.g. “ženou” |
X | Any |
Table C.6. Possessive gender
Value | Description |
---|---|
F | Feminine, e.g. “matčin”, “její” |
M | Masculine animate (adjectives only), e.g. “otcův” |
X | Any |
Z | {M, I, N} – Not feminine, e.g. “jeho” |
Table C.7. Possessive number
Value | Description |
---|---|
P | Plural, e.g. “náš” |
S | Singular, e.g. “můj” |
X | Any, e.g. “your” |
Table C.8. Person
Value | Description |
---|---|
1 | 1st person, e.g. “píšu”, “píšeme” |
2 | 2nd person, e.g. “píšeš”, “píšete” |
3 | 3rd person, e.g. “píše”, “píšou” |
X | Any person |
Table C.10. Grade
Value | Description |
---|---|
1 | Positive, e.g. “velký” |
2 | Comparative, e.g. “větší” |
3 | Superlative, e.g. “největší” |
Table C.11. Negation
Value | Description |
---|---|
A | Affirmative (not negated), e.g. “možný” |
N | Negated, e.g. “nemožný” |
Table C.15. Variant
Value | Description |
---|---|
- | Basic variant, standard contemporary style; also used for standard forms allowed for use in writing by the Czech Standard Orthography Rules despite being marked there as colloquial |
1 | Variant, second most used (less frequent), still standard |
2 | Variant, rarely used, bookish, or archaic |
3 | Very archaic, also archaic + colloquial |
4 | Very archaic or bookish, but standard at the time |
5 | Colloquial, but (almost) tolerated even in public |
6 | Colloquial (standard in spoken Czech) |
7 | Colloquial (standard in spoken Czech), less frequent variant |
8 | Abbreviations |
9 | Special uses, e.g. personal pronouns after prepositions etc. |
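
The tables above can be combined to split a whole tag into named categories. The sketch below assumes that a tag is a 15-character string in which character i holds the value of category C.i, and that a hyphen marks a category as not applicable, except in the variant position, where Table C.15 defines it as the basic variant. Only the categories tabulated in this section are named; the remaining positions (9, 12, 13, 14) are skipped. The field names and the function `split_tag` are hypothetical.

```python
# Position (1-based) -> field name, for the categories tabulated above.
FIELDS = {
    1: "pos",
    2: "subpos",
    3: "gender",
    4: "number",
    5: "case",
    6: "possgender",
    7: "possnumber",
    8: "person",
    10: "grade",
    11: "negation",
    15: "variant",
}


def split_tag(tag: str) -> dict:
    """Split a 15-character tag into the categories named in FIELDS.

    Assumption: character i of the tag corresponds to category C.i.
    """
    if len(tag) != 15:
        raise ValueError(f"expected 15 characters, got {len(tag)}")
    result = {}
    for position, value in enumerate(tag, start=1):
        name = FIELDS.get(position)
        if name is None:
            continue  # category not covered by the tables in this section
        if value == "-" and name != "variant":
            result[name] = None  # not applicable
        else:
            result[name] = value  # for "variant", "-" is the basic variant
    return result


# Hypothetical sample tag: noun, feminine, singular, nominative, affirmative.
for name, value in split_tag("NNFS1-----A----").items():
    print(name, value)
```

Keeping the hyphen in the variant field rather than mapping it to `None` reflects that Table C.15 assigns it a meaning of its own.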