ACL Anthology Reference Corpus (ACL ARC)
This is the home page of the ACL Anthology Reference Corpus, a
corpus of scholarly publications about Computational Linguistics.
This corpus is a canonicalized subset of the ACL Anthology, up to
February 2007, consisting of 10,921 articles. We hope this frozen
corpus will be used for benchmarking applications for scholarly and
bibliometric data processing.
Download the corpus
- Version 20080325: This is the version described in the LREC paper. It contains the canonical 10,921 computational linguistics papers as PDF and plain text files, with the associated metadata. (You can also email me to request a DVD copy of the corpus.)
[ Complete tgz file from NUS ] [ Complete tgz file from Macquarie Univ. (courtesy Robert Dale) ] Warning: huge! (4621149669 bytes, about 4.6 GB.) Expect retries, and use a client with resume capability.
[ tgz file (without PDFs) ] (111080365 bytes, about 111 MB)
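For readers without a resume-capable client, the following is a minimal sketch of how a resumable download could be scripted in Python using an HTTP Range request, with the byte counts listed above used as a completeness check. The function names and the URL in the usage comment are hypothetical, and the sketch assumes the server honours Range requests, which this page does not state.

```python
import os
import urllib.request

# Byte counts as stated on this page.
FULL_TGZ_BYTES = 4621149669
NO_PDF_TGZ_BYTES = 111080365


def bytes_on_disk(path):
    """Return the size of a partial download, or 0 if no file exists yet."""
    return os.path.getsize(path) if os.path.exists(path) else 0


def is_complete(path, expected_bytes):
    """True when the local file has exactly the expected number of bytes."""
    return bytes_on_disk(path) == expected_bytes


def resume_download(url, path, expected_bytes, chunk_size=1 << 20):
    """Fetch the missing tail of `url` into `path` and return the new size."""
    offset = bytes_on_disk(path)
    if offset > expected_bytes:
        raise ValueError("local file is larger than expected; delete it and retry")
    if offset == expected_bytes:
        return offset  # nothing left to fetch, so no request is made

    request = urllib.request.Request(url, headers={"Range": f"bytes={offset}-"})
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header, so append.
        # Any other success status means the full file is being sent again,
        # so overwrite the partial file from byte 0.
        mode = "ab" if response.status == 206 else "wb"
        with open(path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
    return bytes_on_disk(path)


# Hypothetical usage (the URL is a placeholder, not the real download link):
#
#   url = "https://example.org/acl-arc-20080325.tgz"
#   while not is_complete("acl-arc.tgz", FULL_TGZ_BYTES):
#       try:
#           resume_download(url, "acl-arc.tgz", FULL_TGZ_BYTES)
#       except OSError:
#           pass  # connection dropped; the next pass resumes from the new offset
```

A size match only shows that the transfer finished. It does not detect corrupted bytes, so testing the archive after download (for example by listing its contents) is still worthwhile.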
Publications
Refereed:
- Steven Bird, Robert Dale, Bonnie Dorr, Bryan Gibson, Mark Joseph, Min-Yen Kan, Dongwon Lee, Brett Powley, Dragomir Radev and Yee Fan Tan (2008) The ACL Anthology Reference Corpus: A Reference Dataset for Bibliographic Research in Computational Linguistics. In Proc. of Language Resources and Evaluation Conference (LREC 08). Marrakesh, Morocco, May.
[ .pdf pre-print ]
[ Slides (.htm) ]
Group Members
- Min-Yen Kan - Project leader, National University of Singapore
- Steven Bird, University of Melbourne
- Robert Dale, Macquarie University
- Bonnie Dorr, University of Maryland
- Bryan Gibson, University of Michigan
- Mark Joseph, University of Michigan
- Dongwon Lee, Pennsylvania State University
- Brett Powley, Macquarie University
- Dragomir Radev, University of Michigan
- Yee Fan Tan, National University of Singapore
Tools and Related Links
Here we list some related tools for bibliographic processing, and related sites for bibliographic research.
- ACL Anthology Network: A parallel initiative at the University of Michigan to construct a social network graph of researchers in computational linguistics.
- ACL Anthology: The current version of the ACL Anthology, from which the ACL ARC is derived.
- ParsCit: A tool to automatically perform reference string parsing.
Acknowledgments
Our efforts have been supported by the grassroots initiative call
made by the ACL Exec at the 2007 ACL annual meeting in Prague. We
would like to acknowledge the support of the ACL Exec in encouraging
this form of collaboration.
Min-Yen Kan <kanmy@comp.nus.edu.sg>
Created on: Wed May 5 16:07:15 2004 | Version: 1.0 | Last modified: Sat Mar 29 00:26:41 2008