Penn Parsed Corpora of Historical English (PPCHE) was developed at the University of Pennsylvania and consists of running texts and text samples of British English prose from the earliest Middle English documents (1100 CE) up to the First World War (1914 CE).

PPCHE contains three corpora covering traditionally recognized periods of English:

- Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2)
- Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME)
- Penn Parsed Corpus of Modern British English, second edition (PPCMBE2)

Each text comes in two forms: syntactically annotated (parsed) and part-of-speech tagged. The current release does not include unannotated versions of the texts. The annotations have been carefully reviewed over many years by expert human annotators for accuracy and consistency. (Please report remaining errors to beatrice DOT santorini AT gmail DOT com.) Each text also has an associated file with philological information.

PPCHE was originally intended to aid research in the history of English, especially the historical syntax of the language. More recently, computational linguists have begun to exploit PPCHE's great range of stylistic and orthographic variation for research in domain adaptation.

The 2025 release is a corrected, revised, and slightly augmented version of the 2020 release. The annotation guidelines have been streamlined across time periods and for consistency with other historical corpora using the same guidelines.

Each of the three subcorpora has its own directory and should be cited individually as follows:

Kroch, Anthony, and Ann Taylor. 2000-. Penn-Helsinki Parsed Corpus of Middle English, second edition (PPCME2), release 5. LDC2025XXXX. Web download file. Philadelphia, PA: Linguistic Data Consortium.

Kroch, Anthony, Beatrice Santorini, and Lauren Delfs. 2004-. Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME), release 4. LDC2025XXXX. Web download file. Philadelphia, PA: Linguistic Data Consortium.

Kroch, Anthony, Beatrice Santorini, and Ariel Diertani. 2016-. Penn-Helsinki Parsed Corpus of Modern British English, second edition (PPCMBE2), release 2. LDC2025XXXX. Web download file. Philadelphia, PA: Linguistic Data Consortium.

The directory for each subcorpus in turn has two directories: data and docs. The data directory contains the directories with the parsed and POS-tagged files. The docs directory contains a description of the subcorpus and a philological_info_files directory with detailed philological information for each text.

Finally, the release includes:

- the annotation guidelines, and
- the CorpusSearch 2 search program (which allows users to search the corpora for syntactic structures, word sequences, and words), along with its documentation.

Authors: Anthony Kroch, Beatrice Santorini, Ann Taylor, Ariel Diertani (Lauren Delfs)

Languages:

- Middle English (1100-1500) (enm), 20.2%
- Early Modern English (1500-1700) (eng), 31.3%
- Modern British English (1700-1914) (eng), 48.3%

Expected use of corpus: Linguistic research on historical English; domain adaptation for NLP

Collection procedure: PPCHE is based in part on the Helsinki Corpus of English Texts. PPCME2 (ca. 1.2M words) includes most of the Middle English texts from the Helsinki Corpus and adds some not included in that corpus. See the documentation for PPCME2 for details. PPCEME (over 1.7M words) includes all of the Early Modern English texts from the Helsinki Corpus as well as additional texts selected to give the same genre balance as the original Helsinki Corpus; the additional texts are twice the size of the original texts. PPCMBE2 (ca. 2.8M words) covers a later time period than that covered by the Helsinki Corpus, but the texts were selected to give the same genre balance as the Early Modern English part.

Data: All data is encoded in UTF-8. The data files are presented as plain text, and all philological information as HTML.
The parsed data are in Penn Treebank format.
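
For readers who want to process the parsed files outside CorpusSearch 2, the following is a minimal sketch of reading Penn Treebank-style bracketed text in Python. It is an illustration under stated assumptions, not part of the release: the function names (tokenize, parse_trees, leaves, read_trees) are hypothetical, and the sample tree and its labels are invented for the example rather than taken from the corpus or its annotation guidelines. The only facts it relies on from the description above are the UTF-8 encoding and the bracketed Penn Treebank layout.

```python
from pathlib import Path
from typing import Iterator, List, Tuple, Union

# A tree is either a string (label or word) or a list of subtrees.
Tree = Union[str, List["Tree"]]

# Invented sample in bracketed format; labels are illustrative only.
SAMPLE = "( (S (NP (PRO I)) (VP (VBD saw) (NP (PRO it))) (. .)))"


def tokenize(text: str) -> List[str]:
    """Split bracketed text into '(' , ')' and label/word tokens."""
    return text.replace("(", " ( ").replace(")", " ) ").split()


def parse_trees(text: str) -> List[Tree]:
    """Parse every top-level bracketed tree in text into nested lists."""
    stack: List[List[Tree]] = [[]]
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])          # open a new node
        elif tok == ")":
            if len(stack) == 1:
                raise ValueError("unbalanced ')'")
            node = stack.pop()        # close the node and attach it to its parent
            stack[-1].append(node)
        else:
            stack[-1].append(tok)     # label or word
    if len(stack) != 1:
        raise ValueError("unbalanced '('")
    return stack[0]


def leaves(tree: Tree) -> Iterator[Tuple[str, str]]:
    """Yield (tag, word) pairs from nodes of the form (TAG word)."""
    if isinstance(tree, str):
        return
    if len(tree) == 2 and all(isinstance(part, str) for part in tree):
        yield (tree[0], tree[1])
        return
    for child in tree:
        yield from leaves(child)


def read_trees(path: str) -> List[Tree]:
    """Read one parsed file (hypothetical path), decoding it as UTF-8."""
    return parse_trees(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    trees = parse_trees(SAMPLE)
    print([word for _, word in leaves(trees[0])])  # ['I', 'saw', 'it', '.']
```

This sketch treats every two-element node of the form (TAG word) as a leaf, so any identifier or metadata nodes that the corpus files carry would be returned alongside ordinary words; consult the annotation guidelines included in the release before relying on it for real queries.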