This is release 1.0 of the Oncology corpus of PennBioIE, the Biomedical Information Extraction Project at the University of Pennsylvania, supported by award EIA-0205448 from the National Science Foundation's Information Technology Research program, with assistance in specific areas from The Pew Cardiac Trusts and the David Lawrence Altschuler Chair in Genomics and Computational Biology. This release also includes the v0.9 release of December 2004.

The purpose of this project is to provide material for the development of better methods for information extraction from biomedical free text. To that end we have annotated PubMed abstracts in two biomedical domains:

    inhibition of the cytochrome P450 family of enzymes:
        name:         CYP450
        short name:   cyp
        abstracts:    1100
        approx. wds:  274,000

    cancer, concentrating on molecular genetics:
        name:         oncology
        short name:   onco
        abstracts:    1414
        approx. wds:  327,000

In addition, 642 abstracts (324 cyp, 318 onco) are also syntactically annotated (treebanked), and 601 abstracts (oncology only) have been annotated for relations between entities that are part of a single genetic variation.

The texts are annotated at the following layers:

    - Paragraph
    - Sentence
    - Biomedical entity
    - Token and part of speech
    - Syntax (treebanking) (some texts only)
    - Semantic relations (some oncology texts only)

The project and the Oncology corpus are described in more detail in index.html and in data/data.html.