PCEDT 1.0
Table of Contents
Documentation
    Overview
    Data
        Czech-English Penn Treebank
        Reader's Digest Parallel Corpus
        Czech Monolingual Corpus
        Dictionaries
        Data Sizes
    Tools
    References
Definitions of data types
    CSTS document type and csts.doctype
    FS format description
Licensing information
    PCEDT_license.html
Main README file
    README
Structure of the CD
PCEDT_CD_1.0/
|-- data
|   |-- PTB_corpus
|   |   |-- original [README]
|   |   |   |-- En_development (259)
|   |   |   |-- En_evaluation (256)
|   |   |   `-- En_training (48693)
|   |   |-- raw [README]
|   |   |   |-- Cz_development (259)
|   |   |   |-- Cz_evaluation (256)
|   |   |   |-- Cz_training (21113)
|   |   |   |-- En_development (259)
|   |   |   |-- En_evaluation (256)
|   |   |   `-- En_training (48693)
|   |   |-- reference_translations [README]
|   |   |   |-- En_development (4 x 259)
|   |   |   `-- En_evaluation (4 x 256)
|   |   |-- NIST_format [README]
|   |   |   |-- Cz_development (259)
|   |   |   |-- Cz_evaluation (256)
|   |   |   |-- Cz_training (21141)
|   |   |   |-- En_development (259)
|   |   |   |-- En_evaluation (256)
|   |   |   `-- En_training (21141)
|   |   |-- automatic_tagged [README]
|   |   |   |-- Cz_development (259)
|   |   |   |-- Cz_evaluation (256)
|   |   |   `-- Cz_training (21113)
|   |   |-- automatic_AR [README]
|   |   |   |-- Cz_development (259 / 256)
|   |   |   |-- Cz_evaluation (256 / 256)
|   |   |   |-- Cz_training (21113 / 21022)
|   |   |   |-- En_development (259)
|   |   |   |-- En_evaluation (256)
|   |   |   `-- En_training (48693)
|   |   |-- automatic_TR [README]
|   |   |   |-- Cz_development (259 / 256)
|   |   |   |-- Cz_evaluation (256 / 256)
|   |   |   |-- Cz_training (21113 / 21022)
|   |   |   |-- En_development (259)
|   |   |   |-- En_evaluation (256)
|   |   |   `-- En_training (48693)
|   |   `-- manual_TR [README]
|   |       |-- Cz_development (233)
|   |       |-- Cz_evaluation (239)
|   |       |-- En_development (248)
|   |       |-- En_evaluation (249)
|   |       `-- En_training (760)
|   |-- RD_corpus
|   |   `-- raw [README]
|   |       |-- Align (54091)
|   |       |-- Cz (59041)
|   |       `-- En (58656)
|   |-- Czech_raw_texts [README]
|   |   `-- PureData (2,385,000)
|   `-- Dictionaries [README]
|       |-- CzechEnglishProbDict.txt (46150 pairs)
|       |-- CzechEnglishFormsDict.txt (496673 pairs)
|       `-- slovnik_data.txt (115929 pairs)
|-- doc
|   |-- README
|   |-- PCEDT_main.html (this file)
|   |-- csts.html
|   |-- fs.html
|   `-- papers
|-- dtd
|   |-- csts.doctype
|   `-- mteval-v1.1.dtd
`-- tools [README]
    |-- SMT_QuickRun
    |   |-- SMT_QuickRun1.2.tgz
    |   `-- Doc
    |       `-- SMT_QuickRun.html
    |-- TrEd
    |   |-- tred-current.tar.gz
    |   |-- tred-dep-unix.tar.gz
    |   |-- tred_wininst_en.zip
    |   `-- Doc
    |       `-- TrEd.html
    |-- NetGraph
    |   |-- netgraph_client_application_bin_1.68.zip
    |   |-- netgraph_server_linux_i386.zip
    |   `-- Doc
    |       |-- netgraph_manual.html
    |       `-- netgraph_server_install.html
    `-- misc
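
A copy of the CD can be checked against the tree above with a few lines of Python. The sketch below is illustrative only and is not part of the PCEDT distribution: the function name missing_dirs and the list EXPECTED_DIRS are hypothetical, and the list covers only the upper-level directories named in the tree, not the individual data files or their counts.

```python
from pathlib import Path

# Upper-level directories copied from the tree above, relative to the
# CD root (PCEDT_CD_1.0/). The list and the names used here are
# illustrative assumptions, not part of the PCEDT tools.
EXPECTED_DIRS = [
    "data/PTB_corpus",
    "data/RD_corpus",
    "data/Czech_raw_texts",
    "data/Dictionaries",
    "doc",
    "dtd",
    "tools",
]


def missing_dirs(root):
    """Return the expected subdirectories that do not exist under root."""
    root = Path(root)
    return [d for d in EXPECTED_DIRS if not (root / d).is_dir()]
```

Calling missing_dirs with the path where the CD is mounted or copied (for example a local PCEDT_CD_1.0 directory) returns an empty list when all of the listed directories are present.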