LDC Catalog by Year
2024
2023
2022
2021
2020
2019
2018
2017
2016
2015
2014 2013 2012 2011 2010 2009 2008 2007 2006 2005
2004 2003 2002 2001 2000 1999 1998 1997 1996 1995
1994 1993
2024
LDC2024T11 | Abstract Meaning Representation 3.0 - Machine Translations | |
LDC2024T02 | AIDA Scenario 1 Practice Topic Annotation | |
LDC2024T06 | AIDA Scenario 2 Practice Topic Annotation | |
LDC2024T04 | AIDA Scenario 2 Practice Topic Source Data | |
LDC2024T05 | Automatic Content Extraction for Portuguese | |
LDC2024S04 | BabyEars Affective Vocalizations | |
LDC2024S05 | Call My Net 1 | |
LDC2024S08 | Dialogs Re-Enacted Across Languages | |
LDC2024S06 | Diaspora Tibetan Speech | |
LDC2024S01 | KASET - Kurmanji and Sorani Kurdish Speech and Transcripts | |
LDC2024S11 | L2-KSU Native and Non-Native Arabic Speech | |
LDC2024T03 | LoReHLT Hausa Representative Language Pack | |
LDC2024T01 | LORELEI Farsi Representative Language Pack | |
LDC2024T07 | LORELEI Uyghur Incident Language Pack | |
LDC2024T10 | LORELEI Yoruba Representative Language Pack | |
LDC2024S07 | MATERIAL Bulgarian-English Language Pack | |
LDC2024S13 | MATERIAL Farsi-English Language Pack | |
LDC2024S10 | MATERIAL Somali-English Language Pack | |
LDC2024T09 | MultiTACRED | |
LDC2024S03 | RATS Low Speech Density | |
LDC2024S09 | Ravnursson Faroese Speech and Transcripts | |
LDC2024T08 | RST Continuity Corpus | |
LDC2024S12 | Samrómur Synthetic | |
LDC2024S02 | Second Language University Speech Intelligibility Corpus |
2023
LDC2023V01 | 2019 NIST Speaker Recognition Evaluation Test Set -- Audio-Visual | |
LDC2023S03 | 2019 NIST Speaker Recognition Evaluation Test Set -- CTS Challenge | |
LDC2023S06 | 2019 OpenSAT Public Safety Communications Simulation | |
LDC2023T10 | AIDA Scenario 1 and 2 Reference Knowledge Base | |
LDC2023T11 | AIDA Scenario 1 Practice Topic Source Data | |
LDC2023S01 | AIDA Ukrainian Broadcast and Telephone Speech Audio and Transcripts | |
LDC2023S08 | CALLFRIEND Russian Speech | |
LDC2023T09 | CALLFRIEND Russian Text | |
LDC2023T04 | DEFT English Light and Rich ERE Annotation | |
LDC2023S10 | Kasdi-Merbah University Emotional Database in Arabic Speech | |
LDC2023S07 | LDC Spoken Language Sampler - Sixth Release | |
LDC2023T07 | LORELEI Indonesian Representative Language Pack | |
LDC2023T01 | LORELEI Swahili Representative Language Pack | |
LDC2023T02 | LORELEI Tagalog Representative Language Pack | |
LDC2023T03 | LORELEI Tamil Representative Language Pack | |
LDC2023T08 | LORELEI Thai Representative Language Pack | |
LDC2023T06 | LORELEI Zulu Representative Language Pack | |
LDC2023S02 | Mixer 3 Speech | |
LDC2023S04 | Mixer 7 Spanish Speech | |
LDC2023L01 | Moroccan Arabic - English Lexical Database | |
LDC2023T05 | Penn Korean Universal Dependency Treebank | |
LDC2023S09 | REMIX Telephone Collection | |
LDC2023S05 | Samrómur Queries Icelandic Speech 1.0 | |
LDC2023T13 | TAC KBP Belief and Sentiment - Comprehensive Training and Evaluation Data 2016-2017 |
2022
LDC2022S10 | 2017 NIST Language Recognition Evaluation Training and Development Sets | |
LDC2022S01 | 2017 NIST OpenSAT Pilot - SSSF | |
LDC2022T02 | AttImam | |
LDC2022T06 | BOLT English Translation Treebank - Egyptian Arabic SMS/Chat | |
LDC2022T07 | CAMIO Transcription Languages | |
LDC2022S13 | Global TIMIT Thai | |
LDC2022V01 | HAVIC MED Novel 1 Test -- Videos, Metadata and Annotation | |
LDC2022V02 | HAVIC MED Novel 2 Test -- Videos, Metadata and Annotation | |
LDC2022T05 | LORELEI Bengali Representative Language Pack | |
LDC2022T01 | LORELEI Kinyarwanda Incident Language Pack | |
LDC2022T03 | LORELEI Wolof Representative Language Pack | |
LDC2022S08 | MASRI Synthetic | |
LDC2022S04 | NUBUC | |
LDC2022T04 | Qatari Corpus of Argumentative Writing | |
LDC2022L01 | Rime-Cantonese: A Normalized Cantonese Jyutping Lexicon | |
LDC2022S11 | Samrómur Children Icelandic Speech 1.0 | |
LDC2022S05 | Samrómur Icelandic Speech 1.0 | |
LDC2022S06 | Second DIHARD Challenge Evaluation - Eleven Sources | |
LDC2022S07 | Second DIHARD Challenge Evaluation - SEEDLingS | |
LDC2022S03 | Spoken Digits in Hindi and Indian English | |
LDC2022S02 | The Child Subglottal Resonances Database | |
LDC2022S12 | Third DIHARD Challenge Development | |
LDC2022S14 | Third DIHARD Challenge Evaluation | |
LDC2022S09 | Xi'an Guanzhong Object Naming |
2021
LDC2021S01 | Althingi Parliamentary Speech | |
LDC2021T04 | ATIS - Seven Languages | |
LDC2021T07 | BOLT Chinese Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
LDC2021T11 | BOLT Chinese SMS/Chat Parallel Training Data | |
LDC2021T14 | BOLT Egyptian Arabic Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
LDC2021T18 | BOLT Egyptian Arabic PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
LDC2021T15 | BOLT Egyptian Arabic SMS/Chat Parallel Training Data | |
LDC2021T12 | BOLT Egyptian Arabic Treebank - Conversational Telephone Speech | |
LDC2021T17 | BOLT Egyptian Arabic Treebank - SMS/Chat | |
LDC2021T19 | BOLT English Translation Treebank - Chinese SMS/Chat | |
LDC2021T03 | BOLT English Treebank - SMS/Chat | |
LDC2021T13 | Chinese Abstract Meaning Representation 2.0 | |
LDC2021L01 | Classical Arabic Dictionary | |
LDC2021S02 | Columbia Games Corpus | |
LDC2021T16 | DiscAlign for Penn and RST Discourse Treebanks | |
LDC2021T10 | ESPADA | |
LDC2021S06 | Ethnobotanical Research and Language Documentation of Nahuatl | |
LDC2021S03 | Global TIMIT Mandarin Chinese | |
LDC2021V01 | HAVIC MED Training Data -- Videos, Metadata and Annotation | |
LDC2021T02 | LORELEI Akan Representative Language Pack | |
LDC2021S05 | MyST Children's Conversational Speech | |
LDC2021T05 | Penn Discourse Treebank Version 2.0 - German Translation | |
LDC2021S08 | RATS Speaker Identification | |
LDC2021S10 | Second DIHARD Challenge Development - Eleven Sources | |
LDC2021S11 | Second DIHARD Challenge Development - SEEDLingS | |
LDC2021T08 | TAC KBP English Sentiment Slot Filling -- Comprehensive Training and Evaluation Data 2013-2014 | |
LDC2021T06 | TAC KBP English Surprise Slot Filling -- Comprehensive Training and Evaluation Data 2010 | |
LDC2021S04 | The SSNCE Database of Tamil Dysarthric Speech | |
LDC2021S09 | UCLA Speaker Variability Database | |
LDC2021S07 | Wikipedia Spanish Speech and Transcripts | |
LDC2021T09 | X-SRL: Parallel Cross-lingual Semantic Role Labeling |
2020
LDC2020S04 | 2018 NIST Speaker Recognition Evaluation Test Set | |
LDC2020T02 | Abstract Meaning Representation (AMR) Annotation Release 3.0 | |
LDC2020T07 | Abstract Meaning Representation 2.0 - Four Translations | |
LDC2020T15 | BOLT Chinese-English Word Alignment and Tagging -- Conversational Telephone Speech Training | |
LDC2020T05 | BOLT Egyptian Arabic-English Word Alignment -- Conversational Telephone Speech Training | |
LDC2020T20 | BOLT English Co-reference -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
LDC2020T21 | BOLT English PropBank and Sense -- Discussion Forum, SMS/Chat, and Conversational Telephone Speech | |
LDC2020T09 | BOLT English Translation Treebank - Chinese Discussion Forum | |
LDC2020S08 | CALLFRIEND American English-Southern Dialect Second Edition | |
LDC2020S06 | CALLFRIEND Mandarin Chinese-Taiwan Dialect Second Edition | |
LDC2020T01 | Chinese CogBank | |
LDC2020L02 | Chinese Lexical Resources for Gender, Number, Animacy | |
LDC2020T23 | Corpus of Law, Academic, and News | |
LDC2020L01 | Database of Word Level Statistics - Mandarin | |
LDC2020T19 | DEFT Chinese Light and Rich ERE Annotation | |
LDC2020T06 | EVALution | |
LDC2020S11 | Global TIMIT Learner Simple English | |
LDC2020S09 | Global TIMIT Learner Treebank English | |
LDC2020S12 | Global TIMIT Mandarin Chinese-Guanzhong Dialect | |
LDC2020S02 | IARPA Babel Dholuo Language Pack IARPA-babel403b-v1.0b | |
LDC2020S07 | IARPA Babel Javanese Language Pack IARPA-babel402b-v1.0b | |
LDC2020S10 | IARPA Babel Mongolian Language Pack IARPA-babel401b-v2.0b | |
LDC2020S01 | LibriVox Spanish | |
LDC2020T10 | LORELEI Entity Detection and Linking Knowledge Base | |
LDC2020T11 | LORELEI Oromo Incident Language Pack | |
LDC2020T22 | LORELEI Tigrinya Incident Language Pack | |
LDC2020T24 | LORELEI Ukrainian Representative Language Pack | |
LDC2020T17 | LORELEI Vietnamese Representative Language Pack | |
LDC2020T04 | Machine Reading Phase 1 IC Training Data | |
LDC2020S03 | Mixer 4 and 5 Speech | |
LDC2020S05 | Multi-Language Conversational Telephone Speech 2011 -- Mandarin Chinese | |
LDC2020T16 | Penn Parsed Corpora of Historical English | |
LDC2020S13 | Phonemes of Arabic | |
LDC2020T12 | SemTransCNC | |
LDC2020T14 | Speech Sentiment Annotations | |
LDC2020T03 | TAC KBP English Event Argument - Training and Evaluation Data 2014-2015 | |
LDC2020T13 | TAC KBP English Event Nugget Detection and Coreference - Comprehensive Training and Evaluation Data 2014-2015 | |
LDC2020T08 | TAC KBP English Temporal Slot Filling - Comprehensive Training and Evaluation Data 2011 and 2013 | |
LDC2020T18 | TAC KBP Event Argument - Comprehensive Training and Evaluation Data 2016-2017 |
2019
LDC2019S20 | 2016 NIST Speaker Recognition Evaluation Test Set | |
LDC2019T01 | BOLT Arabic Discussion Forum Parallel Training Data | |
LDC2019T13 | BOLT Chinese-English Word Alignment and Tagging -- SMS/Chat Training | |
LDC2019T18 | BOLT Egyptian Arabic-English Word Alignment -- SMS/Chat Training | |
LDC2019T06 | BOLT Egyptian-English Word Alignment -- Discussion Forum Training | |
LDC2019T15 | BOLT English Treebank - Discussion Forum | |
LDC2019S21 | CALLFRIEND American English-Non-Southern Dialect Second Edition | |
LDC2019S18 | CALLFRIEND Canadian French Second Edition | |
LDC2019S04 | CALLFRIEND Egyptian Arabic Second Edition | |
LDC2019T07 | Chinese Abstract Meaning Representation 1.0 | |
LDC2019S07 | CIEMPIESS Experimentation | |
LDC2019T11 | Corpus of Conversational Persian Transcripts | |
LDC2019T03 | DEFT Chinese Committed Belief Annotation | |
LDC2019T16 | DEFT English Committed Belief Annotation | |
LDC2019T09 | DEFT Spanish Committed Belief Annotation | |
LDC2019S09 | First DIHARD Challenge Development - Eight Sources | |
LDC2019S10 | First DIHARD Challenge Development - SEEDLingS | |
LDC2019S12 | First DIHARD Challenge Evaluation - Nine Sources | |
LDC2019S13 | First DIHARD Challenge Evaluation - SEEDLingS | |
LDC2019V01 | HAVIC MED Progress Test -- Videos, Metadata and Annotation | |
LDC2019S22 | IARPA Babel Amharic Language Pack IARPA-babel307b-v1.0b | |
LDC2019S08 | IARPA Babel Guarani Language Pack IARPA-babel305b-v1.0c | |
LDC2019S16 | IARPA Babel Igbo Language Pack IARPA-babel306b-v2.0c | |
LDC2019S03 | IARPA Babel Lithuanian Language Pack IARPA-babel304b-v1.0b | |
LDC2019S17 | LDC Spoken Language Sampler - Fifth Release | |
LDC2019T14 | Machine Reading Phase 1 NFL Scoring Training Data | |
LDC2019S23 | Magic Data Chinese Mandarin Conversational Speech | |
LDC2019S02 | Multi-Language Conversational Telephone Speech 2011 -- Arabic Group | |
LDC2019S15 | Multi-Language Conversational Telephone Speech 2011 -- East Asian | |
LDC2019S06 | Multi-Language Conversational Telephone Speech 2011 -- English Group | |
LDC2019T04 | Multilingual ATIS | |
LDC2019T05 | Penn Discourse Treebank Version 3.0 | |
LDC2019T10 | Phrase Detectives Corpus Version 2 | |
LDC2019S19 | Polish Speech Database | |
LDC2019S01 | SRI Speech-Based Collaborative Learning Corpus | |
LDC2019T08 | TAC KBP Chinese Regular Slot Filling - Comprehensive Training and Evaluation Data 2014 | |
LDC2019T17 | TAC KBP Cold Start - Comprehensive Evaluation Data 2012-2017 | |
LDC2019T19 | TAC KBP Entity Discovery and Linking - Comprehensive Evaluation Data 2016-2017 | |
LDC2019T02 | TAC KBP Entity Discovery and Linking - Comprehensive Training and Evaluation Data 2014-2015 | |
LDC2019T12 | TAC KBP Evaluation Source Corpora 2016-2017 | |
LDC2019S14 | The DKU-JNU-EMA Electromagnetic Articulography Database | |
LDC2019S11 | USC-SFI MALACH Interviews and Transcripts English – Speech Recognition Edition | |
LDC2019S05 | VAST Chinese Speech and Transcripts |
2018
LDC2018T08 | 2007 CoNLL Shared Task - Arabic & English | |
LDC2018T06 | 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish | |
LDC2018T07 | 2007 CoNLL Shared Task - Greek, Hungarian & Italian | |
LDC2018S06 | 2011 NIST Language Recognition Evaluation Test Set | |
LDC2018S14 | AISHELL-1 | |
LDC2018S15 | Avatar Education Portuguese | |
LDC2018T10 | BOLT Arabic Discussion Forums | |
LDC2018T15 | BOLT Chinese SMS/Chat | |
LDC2018T23 | BOLT Egyptian Arabic Treebank - Discussion Forum | |
LDC2018T19 | BOLT English SMS/Chat | |
LDC2018T18 | BOLT Information Retrieval Comprehensive Training and Evaluation | |
LDC2018S09 | CALLFRIEND Mandarin Chinese-Mainland Dialect Second Edition | |
LDC2018S11 | CIEMPIESS Balance | |
LDC2018T20 | Concretely Annotated English Gigaword | |
LDC2018T01 | DEFT Spanish Treebank | |
LDC2018S01 | DIRHA English WSJ Audio | |
LDC2018S05 | GALE Phase 4 Arabic Broadcast News Speech | |
LDC2018T14 | GALE Phase 4 Arabic Broadcast News Transcripts | |
LDC2018T05 | H2, E2, ERK1 Children's Writing | |
LDC2018V01 | HAVIC MED Event E051-E060 -- Videos, Metadata and Annotation | |
LDC2018S18 | HUB5 Mandarin Telephone Speech and Transcripts Second Edition | |
LDC2018S07 | IARPA Babel Cebuano Language Pack IARPA-babel301b-v2.0b | |
LDC2018S13 | IARPA Babel Kazakh Language Pack IARPA-babel302b-v1.0a | |
LDC2018S16 | IARPA Babel Telugu Language Pack IARPA-babel303b-v1.0a | |
LDC2018S02 | IARPA Babel Tok Pisin Language Pack IARPA-babel207b-v1.0e | |
LDC2018T04 | LORELEI Amharic Representative Language Pack - Monolingual and Parallel Text | |
LDC2018T11 | LORELEI Somali Representative Language Pack - Monolingual and Parallel Text | |
LDC2018S03 | Multi-Language Conversational Telephone Speech 2011 -- Central Asian | |
LDC2018S08 | Multi-Language Conversational Telephone Speech 2011 -- Central European | |
LDC2018S12 | Multi-Language Conversational Telephone Speech 2011 -- Spanish | |
LDC2018S17 | Nautilus Speaker Characterization | |
LDC2018S10 | RATS Language Identification | |
LDC2018S04 | Rhythm and Pitch | |
LDC2018T09 | SPADE | |
LDC2018T03 | TAC KBP Comprehensive English Source Corpora 2009-2014 | |
LDC2018T16 | TAC KBP English Entity Linking - Comprehensive Training and Evaluation Data 2009-2013 | |
LDC2018T22 | TAC KBP English Regular Slot Filling - Comprehensive Training and Evaluation Data 2009-2014 | |
LDC2018T24 | TAC Relation Extraction Dataset | |
LDC2018T13 | TRAD Arabic-French Parallel Text -- Newsgroup | |
LDC2018T21 | TRAD Arabic-French Parallel Text -- Newswire | |
LDC2018T02 | TRAD Chinese-French Parallel Text -- Blog | |
LDC2018T17 | TRAD Chinese-French Parallel Text -- Broadcast News |
2017
LDC2017S06 | 2010 NIST Speaker Recognition Evaluation Test Set | |
LDC2017T13 | 2015-2016 CoNLL Shared Task | |
LDC2017T10 | Abstract Meaning Representation (AMR) Annotation Release 2.0 | |
LDC2017T14 | Ancient Chinese Corpus | |
LDC2017L01 | Arabic Speech Recognition Pronunciation Dictionary | |
LDC2017S21 | ASpIRE Development and Development Test Sets | |
LDC2017T05 | BOLT Chinese Discussion Forum Parallel Training Data | |
LDC2017T07 | BOLT Egyptian Arabic SMS/Chat and Transliteration | |
LDC2017T11 | BOLT English Discussion Forums | |
LDC2017S07 | CHiME2 Grid | |
LDC2017S10 | CHiME2 WSJ0 | |
LDC2017S24 | CHiME3 | |
LDC2017S23 | CIEMPIESS Light | |
LDC2017T15 | English Web Treebank Propbank | |
LDC2017T03 | First-Year Law Students' Court Memoranda | |
LDC2017T06 | GALE English-Chinese Parallel Aligned Treebank -- Training | |
LDC2017T02 | GALE Phase 3 and 4 Chinese Web Parallel Text | |
LDC2017S02 | GALE Phase 3 Arabic Broadcast News Speech Part 2 | |
LDC2017T04 | GALE Phase 3 Arabic Broadcast News Transcripts Part 2 | |
LDC2017S15 | GALE Phase 4 Arabic Broadcast Conversation Speech | |
LDC2017T12 | GALE Phase 4 Arabic Broadcast Conversation Transcripts | |
LDC2017S25 | GALE Phase 4 Chinese Broadcast News Speech | |
LDC2017T18 | GALE Phase 4 Chinese Broadcast News Transcripts | |
LDC2017S03 | IARPA Babel Haitian Creole Language Pack IARPA-babel201b-v0.2b | |
LDC2017S22 | IARPA Babel Kurmanji Kurdish Language Pack IARPA-babel205b-v1.0a | |
LDC2017S08 | IARPA Babel Lao Language Pack IARPA-babel203b-v3.1a | |
LDC2017S05 | IARPA Babel Swahili Language Pack IARPA-babel202b-v1.0d | |
LDC2017S13 | IARPA Babel Tamil Language Pack IARPA-babel204b-v1.1b | |
LDC2017S01 | IARPA Babel Vietnamese Language Pack IARPA-babel107b-v0.7 | |
LDC2017S19 | IARPA Babel Zulu Language Pack IARPA-babel206b-v0.1e | |
LDC2017S12 | KSUEmotions | |
LDC2017S16 | LDC Spoken Language Sampler - Fourth Release | |
LDC2017S11 | Metalogue Multi-Issue Bargaining Dialogue | |
LDC2017S14 | Multi-Language Conversational Telephone Speech 2011 -- South Asian | |
LDC2017S09 | Multi-Language Conversational Telephone Speech 2011 -- Turkish | |
LDC2017T01 | MWE-Aware English Dependency Corpus | |
LDC2017T16 | MWE-Aware English Dependency Corpus 2.0 | |
LDC2017S04 | Noisy TIMIT Speech | |
LDC2017T08 | Phrase Detectives Corpus | |
LDC2017S20 | RATS Keyword Spotting | |
LDC2017S18 | SRI-FRTIV | |
LDC2017T17 | TAC KBP Chinese Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2011-2014 | |
LDC2017T09 | The EventStatus Corpus | |
LDC2017V01 | UCLA High-Speed Laryngeal Video and Audio | |
LDC2017S17 | Vehicle City Voices Corpus – Part I |
2016
LDC2016T02 | Arabic Treebank - Weblog | |
LDC2016T18 | ARL Arabic Dependency Treebank | |
LDC2016L01 | Bamanankan Lexicon | |
LDC2016T05 | BOLT Chinese Discussion Forums | |
LDC2016T19 | BOLT Chinese-English Word Alignment and Tagging -- Discussion Forum Training | |
LDC2016T13 | Chinese Treebank 9.0 | |
LDC2016T22 | Chinese-English Parallel Sentences Extracted from Patents | |
LDC2016S04 | CHM150 | |
LDC2016T07 | DEFT Narrative Text | |
LDC2016S05 | Digital Archive of Southern Speech - NLP Version | |
LDC2016T16 | English Speed Networking Conversational Transcripts | |
LDC2016T08 | GALE Phase 3 and 4 Arabic Web Parallel Text | |
LDC2016T09 | GALE Phase 3 and 4 Chinese Broadcast Conversation Parallel Text | |
LDC2016T15 | GALE Phase 3 and 4 Chinese Broadcast News Parallel Text | |
LDC2016T25 | GALE Phase 3 and 4 Chinese Newswire Parallel Text | |
LDC2016S01 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 2 | |
LDC2016T06 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 2 | |
LDC2016S07 | GALE Phase 3 Arabic Broadcast News Speech Part 1 | |
LDC2016T17 | GALE Phase 3 Arabic Broadcast News Transcripts Part 1 | |
LDC2016T11 | GALE Phase 4 Arabic Broadcast Conversation Parallel Sentences | |
LDC2016T20 | GALE Phase 4 Arabic Broadcast News Parallel Sentences | |
LDC2016T27 | GALE Phase 4 Arabic Newswire Parallel Sentences | |
LDC2016T14 | GALE Phase 4 Arabic Weblog Parallel Sentences | |
LDC2016S03 | GALE Phase 4 Chinese Broadcast Conversation Speech | |
LDC2016T12 | GALE Phase 4 Chinese Broadcast Conversation Transcripts | |
LDC2016T04 | GALE Phase 4 Chinese Weblog Parallel Sentences | |
LDC2016T01 | H1 Children's Writing | |
LDC2016V01 | HAVIC Pilot Transcription | |
LDC2016S06 | IARPA Babel Assamese Language Pack IARPA-babel102b-v0.5a | |
LDC2016S08 | IARPA Babel Bengali Language Pack IARPA-babel103b-v0.4b | |
LDC2016S02 | IARPA Babel Cantonese Language Pack IARPA-babel101b-v0.4c | |
LDC2016S12 | IARPA Babel Georgian Language Pack IARPA-babel404b-v1.0a | |
LDC2016S09 | IARPA Babel Pashto Language Pack IARPA-babel104b-v0.4bY | |
LDC2016S13 | IARPA Babel Tagalog Language Pack IARPA-babel106-v0.2g | |
LDC2016S10 | IARPA Babel Turkish Language Pack IARPA-babel105b-v0.5 | |
LDC2016T24 | JANA: A Human-Human Dialogues Corpus for Egyptian Dialect | |
LDC2016T21 | KAFD: Arabic Font Database | |
LDC2016S11 | Multi-Language Conversational Telephone Speech 2011 -- Slavic Group | |
LDC2016T03 | NewSoMe Corpus of Opinion in Blogs | |
LDC2016T23 | Richer Event Description | |
LDC2016T10 | SDP 2014 & 2015: Broad Coverage Semantic Dependency Parsing | |
LDC2016T26 | TAC KBP Spanish Cross-lingual Entity Linking - Comprehensive Training and Evaluation Data 2012-2014 |
2015
LDC2015T12 | 2006 CoNLL Shared Task - Arabic & Czech | |
LDC2015T11 | 2006 CoNLL Shared Task - Ten Languages | |
LDC2015T20 | ACE 2007 Spanish DevTest - Pilot Evaluation | |
LDC2015S10 | Arabic Learner Corpus | |
LDC2015S12 | Articulation Index LSCP | |
LDC2015T03 | Avocado Research Email Collection | |
LDC2015S07 | CIEMPIESS | |
LDC2015T08 | Coordination Annotation for the Penn Treebank | |
LDC2015T13 | English News Text Treebank: Penn Treebank Revised | |
LDC2015T06 | GALE Chinese-English Parallel Aligned Treebank -- Training | |
LDC2015T04 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3 | |
LDC2015T18 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 4 | |
LDC2015S01 | GALE Phase 2 Arabic Broadcast News Speech Part 2 | |
LDC2015T01 | GALE Phase 2 Arabic Broadcast News Transcripts Part 2 | |
LDC2015T05 | GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text | |
LDC2015T07 | GALE Phase 3 and 4 Arabic Broadcast News Parallel Text | |
LDC2015T19 | GALE Phase 3 and 4 Arabic Newswire Parallel Text | |
LDC2015S11 | GALE Phase 3 Arabic Broadcast Conversation Speech Part 1 | |
LDC2015T16 | GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1 | |
LDC2015S06 | GALE Phase 3 Chinese Broadcast Conversation Speech Part 2 | |
LDC2015T09 | GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2 | |
LDC2015S13 | GALE Phase 3 Chinese Broadcast News Speech | |
LDC2015T25 | GALE Phase 3 Chinese Broadcast News Transcripts | |
LDC2015T14 | GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences | |
LDC2015T21 | GALE Phase 4 Chinese Broadcast News Parallel Sentences | |
LDC2015T24 | GALE Phase 4 Chinese Newswire Parallel Sentences | |
LDC2015T22 | Karlsruhe Children's Text | |
LDC2015T23 | KHATT: Handwritten Arabic Text | |
LDC2015S09 | LDC Spoken Language Sampler - Third Release | |
LDC2015S05 | Mandarin Chinese Phonetic Segmentation and Tone | |
LDC2015S04 | Mandarin-English Code-Switching in South-East Asia | |
LDC2015T17 | NewSoMe Corpus of Opinion in News Reports | |
LDC2015S02 | RATS Speech Activity Detection | |
LDC2015T10 | RST Signalling Corpus | |
LDC2015T02 | SenSem Databank | |
LDC2015L01 | SenSem Lexicons | |
LDC2015S03 | The Subglottal Resonances Database | |
LDC2015S08 | The Walking Around Corpus | |
LDC2015T15 | TS Wikipedia |
2014
LDC2014S06 | 2009 NIST Language Recognition Evaluation Test Set | |
LDC2014T12 | Abstract Meaning Representation (AMR) Annotation Release 1.0 | |
LDC2014T18 | ACE 2007 Multilingual Training Corpus | |
LDC2014T24 | Boulder Lies and Truth | |
LDC2014S01 | CALLFRIEND Farsi Second Edition Speech | |
LDC2014T01 | CALLFRIEND Farsi Second Edition Transcripts | |
LDC2014T21 | Chinese Discourse Treebank 0.5 | |
LDC2014T07 | Domain-Specific Hyponym Relations | |
LDC2014T06 | ETS Corpus of Non-Native Written English | |
LDC2014T23 | Fisher and CALLHOME Spanish--English Speech Translation | |
LDC2014T03 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2 | |
LDC2014T08 | GALE Arabic-English Parallel Aligned Treebank -- Web Training | |
LDC2014T19 | GALE Arabic-English Word Alignment -- Broadcast Training Part 1 | |
LDC2014T22 | GALE Arabic-English Word Alignment -- Broadcast Training Part 2 | |
LDC2014T05 | GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web | |
LDC2014T10 | GALE Arabic-English Word Alignment Training Part 2 -- Newswire | |
LDC2014T14 | GALE Arabic-English Word Alignment Training Part 3 -- Web | |
LDC2014T25 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2 | |
LDC2014S07 | GALE Phase 2 Arabic Broadcast News Speech Part 1 | |
LDC2014T17 | GALE Phase 2 Arabic Broadcast News Transcripts Part 1 | |
LDC2014T04 | GALE Phase 2 Chinese Broadcast News Parallel Text Part 1 | |
LDC2014T11 | GALE Phase 2 Chinese Broadcast News Parallel Text Part 2 | |
LDC2014T15 | GALE Phase 2 Chinese Newswire Parallel Text Part 1 | |
LDC2014T20 | GALE Phase 2 Chinese Newswire Parallel Text Part 2 | |
LDC2014T26 | GALE Phase 2 Chinese Web Parallel Text | |
LDC2014S09 | GALE Phase 3 Chinese Broadcast Conversation Speech Part 1 | |
LDC2014T28 | GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1 | |
LDC2014S05 | Hispanic-English Database | |
LDC2014T09 | HyTER Networks of Selected OpenMT08/09 Sentences | |
LDC2014S02 | King Saud University Arabic Speech Database | |
LDC2014T13 | MADCAT Chinese Pilot Training Set | |
LDC2014S03 | Multi-Channel WSJ Audio | |
LDC2014T02 | NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source | |
LDC2014T16 | TAC KBP Reference Knowledge Base | |
LDC2014S08 | United Nations Proceedings Speech | |
LDC2014S04 | USC-SFI MALACH Interviews and Transcripts Czech |
2013
LDC2013T06 | 1993-2007 United Nations Parallel Text | |
LDC2013T13 | Chinese Proposition Bank 3.0 | |
LDC2013T21 | Chinese Treebank 8.0 | |
LDC2013T02 | Chinese-English Biology and Chemistry Abstract Parallel Text | |
LDC2013S09 | CSC Deceptive Speech | |
LDC2013T14 | GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1 | |
LDC2013T10 | GALE Arabic-English Parallel Aligned Treebank -- Newswire | |
LDC2013T23 | GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1 | |
LDC2013T05 | GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web | |
LDC2013S02 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 1 | |
LDC2013S07 | GALE Phase 2 Arabic Broadcast Conversation Speech Part 2 | |
LDC2013T04 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1 | |
LDC2013T17 | GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2 | |
LDC2013T01 | GALE Phase 2 Arabic Web Parallel Text | |
LDC2013T11 | GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1 | |
LDC2013T16 | GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2 | |
LDC2013S04 | GALE Phase 2 Chinese Broadcast Conversation Speech | |
LDC2013T08 | GALE Phase 2 Chinese Broadcast Conversation Transcripts | |
LDC2013S08 | GALE Phase 2 Chinese Broadcast News Speech | |
LDC2013T20 | GALE Phase 2 Chinese Broadcast News Transcripts | |
LDC2013S05 | Greybeard | |
LDC2013S06 | LDC Spoken Language Sampler - Second Release | |
LDC2013T09 | MADCAT Phase 2 Training Set | |
LDC2013T15 | MADCAT Phase 3 Training Set | |
LDC2013L01 | Maninkakan Lexicon | |
LDC2013T12 | Manually Annotated Sub-Corpus Third Release | |
LDC2013S03 | Mixer 6 Speech | |
LDC2013T07 | NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets | |
LDC2013T03 | NIST 2012 Open Machine Translation (OpenMT) Evaluation | |
LDC2013T19 | OntoNotes Release 5.0 | |
LDC2013T18 | Semantic Textual Similarity (STS) 2013 Machine Translation | |
LDC2013T22 | The ARRAU Corpus of Anaphoric Information |
2012
LDC2012V01 | 2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News | |
LDC2012S01 | 2006 NIST Speaker Recognition Evaluation Test Set Part 2 | |
LDC2012T03 | 2009 CoNLL Shared Task Part 1 | |
LDC2012T04 | 2009 CoNLL Shared Task Part 2 | |
LDC2012T11 | American English Nickname Collection | |
LDC2012T21 | Annotated English Gigaword | |
LDC2012T07 | Arabic Treebank - Broadcast News v1.0 | |
LDC2012T09 | Arabic-Dialect/English Parallel Text | |
LDC2012T10 | Catalan TimeBank 1.0 | |
LDC2012T05 | Chinese Dependency Treebank 1.0 | |
LDC2012T22 | Chinese-English Semiconductor Parallel Text | |
LDC2012S03 | Digital Archive of Southern Speech | |
LDC2012T02 | English Translation Treebank: An-Nahar Newswire | |
LDC2012T13 | English Web Treebank | |
LDC2012T16 | GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web | |
LDC2012T20 | GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire | |
LDC2012T24 | GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web | |
LDC2012T06 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1 | |
LDC2012T14 | GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2 | |
LDC2012T18 | GALE Phase 2 Arabic Broadcast News Parallel Text | |
LDC2012T17 | GALE Phase 2 Arabic Newswire Parallel Text | |
LDC2012T15 | MADCAT Phase 1 Training Set | |
LDC2012S04 | Malto Speech and Transcripts | |
LDC2012T01 | ModeS TimeBank 1.0 | |
LDC2012T08 | Prague Czech-English Dependency Treebank 2.0 | |
LDC2012T23 | Russian-English Computer Security Parallel Text | |
LDC2012T12 | Spanish TimeBank 1.0 | |
LDC2012S02 | TORGO Database of Dysarthric Articulation | |
LDC2012S06 | Turkish Broadcast News Speech and Transcripts | |
LDC2012S05 | USC-SFI MALACH Interviews and Transcripts English |
2011
LDC2011S04 | 2005 NIST Speaker Recognition Evaluation Test Data | |
LDC2011S01 | 2005 NIST Speaker Recognition Evaluation Training Data | |
LDC2011S06 | 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set | |
LDC2011S10 | 2006 NIST Speaker Recognition Evaluation Test Set Part 1 | |
LDC2011S09 | 2006 NIST Speaker Recognition Evaluation Training Set | |
LDC2011S02 | 2006 NIST Spoken Term Detection Development Set | |
LDC2011S03 | 2006 NIST Spoken Term Detection Evaluation Set | |
LDC2011V05 | 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 | |
LDC2011V06 | 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2 | |
LDC2011S11 | 2008 NIST Speaker Recognition Evaluation Supplemental Set | |
LDC2011S08 | 2008 NIST Speaker Recognition Evaluation Test Set | |
LDC2011S05 | 2008 NIST Speaker Recognition Evaluation Training Set Part 1 | |
LDC2011S07 | 2008 NIST Speaker Recognition Evaluation Training Set Part 2 | |
LDC2011T05 | 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set | |
LDC2011T02 | ACE 2005 English SpatialML Annotations Version 2 | |
LDC2011T11 | Arabic Gigaword Fifth Edition | |
LDC2011T09 | Arabic Treebank: Part 2 v 3.1 | |
LDC2011T06 | Broadcast News Lattices | |
LDC2011T13 | Chinese Gigaword Fifth Edition | |
LDC2011T08 | Datasets for Generic Relation Extraction (reACE) | |
LDC2011T07 | English Gigaword Fifth Edition | |
LDC2011T10 | French Gigaword Third Edition | |
LDC2011T04 | Indian Language Part-of-Speech Tagset: Sanskrit | |
LDC2011V03 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1 | |
LDC2011V04 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2 | |
LDC2011V01 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1 | |
LDC2011V02 | NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2 | |
LDC2011T03 | OntoNotes Release 4.0 | |
LDC2011T01 | SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages | |
LDC2011T12 | Spanish Gigaword Third Edition |
2010
LDC2010S03 | 2003 NIST Speaker Recognition Evaluation | |
LDC2010T09 | ACE 2005 Mandarin SpatialML Annotations | |
LDC2010T18 | ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0 | |
LDC2010T13 | Arabic Treebank: Part 1 v 4.1 | |
LDC2010T08 | Arabic Treebank: Part 3 v 3.2 | |
LDC2010S05 | Asian Elephant Vocalizations | |
LDC2010S07 | Asian Spoken Language Sampler | |
LDC2010T07 | Chinese Treebank 7.0 | |
LDC2010T06 | Chinese Web 5-gram Version 1 | |
LDC2010T02 | Czech Broadcast News MDE Transcripts | |
LDC2010T04 | Fisher Spanish - Transcripts | |
LDC2010S01 | Fisher Spanish Speech | |
LDC2010T03 | GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2 | |
LDC2010T16 | Indian Language Part-of-Speech Tagset: Bengali | |
LDC2010T24 | Indian Language Part-of-Speech Tagset: Hindi | |
LDC2010T19 | Korean Newswire Second Edition | |
LDC2010L01 | LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1 | |
LDC2010T22 | Manually Annotated Sub-Corpus First Release | |
LDC2010T15 | Message Understanding Conference 7 Timed (MUC7_T) | |
LDC2010T10 | NIST 2002 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T11 | NIST 2003 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T12 | NIST 2004 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T14 | NIST 2005 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T17 | NIST 2006 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T21 | NIST 2008 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T23 | NIST 2009 Open Machine Translation (OpenMT) Evaluation | |
LDC2010T01 | NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations | |
LDC2010T05 | NPS Internet Chatroom Conversations, Release 1.0 | |
LDC2010V01 | TRECVID 2004 Keyframes & Transcripts | |
LDC2010V02 | TRECVID 2006 Keyframes | |
LDC2010S02 | WTIMIT 1.0 |
2009
LDC2009S05 | 2007 NIST Language Recognition Evaluation Supplemental Training Set | |
LDC2009S04 | 2007 NIST Language Recognition Evaluation Test Set | |
LDC2009T12 | 2008 CoNLL Shared Task Data | |
LDC2009T05 | 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data | |
LDC2009T29 | ACL Anthology Reference Corpus | |
LDC2009L01 | An English Dictionary of the Tamil Verb Second Edition | |
LDC2009T30 | Arabic Gigaword Fourth Edition | |
LDC2009T22 | Arabic Newswire English Translation Collection | |
LDC2009V01 | Audiovisual Database of Spoken American English | |
LDC2009T04 | BioProp Version 1.0 | |
LDC2009T27 | Chinese Gigaword Fourth Edition | |
LDC2009S01 | CSLU: Numbers Version 1.3 | |
LDC2009S03 | CSLU: S4X Release 1.2 | |
LDC2009T20 | Czech Broadcast Conversation MDE Transcripts | |
LDC2009S02 | Czech Broadcast Conversation Speech | |
LDC2009T01 | English CTS Treebank with Structural Metadata | |
LDC2009T13 | English Gigaword Fourth Edition | |
LDC2009T23 | FactBank 1.0 | |
LDC2009T28 | French Gigaword Second Edition | |
LDC2009T03 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1 | |
LDC2009T09 | GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2 | |
LDC2009T02 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1 | |
LDC2009T06 | GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2 | |
LDC2009T15 | GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1 | |
LDC2009T08 | Japanese Web N-gram Version 1 | |
LDC2009T10 | Language Understanding Annotation Corpus | |
LDC2009T26 | NXT Switchboard Annotations | |
LDC2009T24 | OntoNotes Release 3.0 | |
LDC2009T11 | REFLEX Entity Translation Training/DevTest | |
LDC2009T21 | Spanish Gigaword Second Edition | |
LDC2009T14 | Tagged Chinese Gigaword Version 2.0 | |
LDC2009T07 | Unified Linguistic Annotation Text Collection | |
LDC2009T25 | Web 1T 5-gram, 10 European Languages Version 1 |
2008
LDC2008S05 | 2005 NIST Language Recognition Evaluation | |
LDC2008T03 | ACE 2005 English SpatialML Annotations | |
LDC2008L01 | An English Dictionary of the Tamil Verb | |
LDC2008T25 | AQUAINT-2 Information-Retrieval Text Research Collection | |
LDC2008T13 | BLLIP North American News Text, Complete | |
LDC2008T14 | BLLIP North American News Text, General Release | |
LDC2008T17 | CALLHOME Mandarin Chinese Transcripts - XML version | |
LDC2008S09 | CHAracterizing INdividual Speakers (CHAINS) | |
LDC2008T07 | Chinese Proposition Bank 2.0 | |
LDC2008T24 | COMNOM v 1.0 | |
LDC2008S06 | CSLU: Alphadigit Version 1.3 | |
LDC2008S07 | CSLU: ISOLET Spoken Letter Database Version 1.3 | |
LDC2008S02 | CSLU: National Cellular Telephone Speech Release 2.3 | |
LDC2008S01 | CSLU: Portland Cellular Telephone Speech Version 1.3 | |
LDC2008T22 | Czech Academic Corpus 2.0 | |
LDC2008T02 | GALE Phase 1 Arabic Blog Parallel Text | |
LDC2008T09 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 2 | |
LDC2008T06 | GALE Phase 1 Chinese Blog Parallel Text | |
LDC2008T08 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 2 | |
LDC2008T18 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 3 | |
LDC2008L03 | Global Yoruba Lexical Database v. 1.0 | |
LDC2008L02 | Hindi WordNet | |
LDC2008T01 | Hungarian-English Parallel Text, Version 1.0 | |
LDC2008S08 | LDC Spoken Language Sampler | |
LDC2008T23 | NomBank v 1.0 | |
LDC2008T15 | North American News Text, Complete | |
LDC2008T16 | North American News Text, General Release | |
LDC2008T04 | OntoNotes Release 2.0 | |
LDC2008T05 | Penn Discourse Treebank Version 2.0 | |
LDC2008T20 | PennBioIE CYP 1.0 | |
LDC2008T21 | PennBioIE Oncology 1.0 | |
LDC2008S03 | STC-TIMIT 1.0 | |
LDC2008S04 | West Point Brazilian Portuguese Speech |
2007
LDC2007T22 | 2001 Topic Annotated Enron Email Data Set | |
LDC2007S10 | 2003 NIST Rich Transcription Evaluation Data | |
LDC2007S12 | 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data | |
LDC2007S11 | 2004 Spring NIST Rich Transcription (RT-04S) Development Data | |
LDC2007T40 | Arabic Gigaword Third Edition | |
LDC2007S03 | ARL Urdu Speech Database, Training Data | |
LDC2007T38 | Chinese Gigaword Third Edition | |
LDC2007T36 | Chinese Treebank 6.0 | |
LDC2007S08 | CSLU: Foreign Accented English Release 1.2 | |
LDC2007S18 | CSLU: Kids' Speech Version 1.1 | |
LDC2007S13 | CSLU: Apple Words and Phrases | |
LDC2007S05 | CSLU: Yes/No Version 1.2 | |
LDC2007T02 | English Chinese Translation Treebank v 1.0 | |
LDC2007T07 | English Gigaword Third Edition | |
LDC2007S02 | Fisher Levantine Arabic Conversational Telephone Speech | |
LDC2007T04 | Fisher Levantine Arabic Conversational Telephone Speech, Transcripts | |
LDC2007T24 | GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1 | |
LDC2007T23 | GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1 | |
LDC2007T20 | GALE Phase 1 Distillation Training | |
LDC2007T08 | ISI Arabic-English Automatically Extracted Parallel Text | |
LDC2007T09 | ISI Chinese-English Automatically Extracted Parallel Text | |
LDC2007S01 | Levantine Arabic Conversational Telephone Speech | |
LDC2007T01 | Levantine Arabic Conversational Telephone Speech, Transcripts | |
LDC2007S09 | Mandarin Affective Speech | |
LDC2007T19 | MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE) | |
LDC2007S15 | Nationwide Speech Project | |
LDC2007T21 | OntoNotes Release 1.0 | |
LDC2007T03 | Tagged Chinese Gigaword | |
LDC2007V02 | TRECVID 2003 Keyframes & Transcripts | |
LDC2007V01 | TRECVID 2005 Keyframes & Transcripts |
2006
LDC2006S31 | 2003 NIST Language Recognition Evaluation | |
LDC2006S44 | 2004 NIST Speaker Recognition Evaluation | |
LDC2006T06 | ACE 2005 Multilingual Training Corpus | |
LDC2006S46 | Arabic Broadcast News Speech | |
LDC2006T20 | Arabic Broadcast News Transcripts | |
LDC2006T02 | Arabic Gigaword Second Edition | |
LDC2006S15 | CSLU: Spelled and Spoken Words | |
LDC2006S14 | CSLU: Stories v 1.2 | |
LDC2006S35 | CSLU: Multilanguage Telephone Speech Version 1.2 | |
LDC2006S39 | CSLU: Names Release 1.3 | |
LDC2006S26 | CSLU: Speaker Recognition Version 1.1 | |
LDC2006S16 | CSLU: Spoltech Brazilian Portuguese Version 1.0 | |
LDC2006S01 | CSLU: Voices | |
LDC2006T10 | English-Arabic Treebank v 1.0 | |
LDC2006T17 | French Gigaword First Edition | |
LDC2006S43 | Gulf Arabic Conversational Telephone Speech | |
LDC2006T15 | Gulf Arabic Conversational Telephone Speech, Transcripts | |
LDC2006S45 | Iraqi Arabic Conversational Telephone Speech | |
LDC2006T16 | Iraqi Arabic Conversational Telephone Speech, Transcripts | |
LDC2006S42 | Korean Broadcast News Speech | |
LDC2006T14 | Korean Broadcast News Transcripts | |
LDC2006T03 | Korean Propbank | |
LDC2006T09 | Korean Treebank Annotations Version 2.0 | |
LDC2006S29 | Levantine Arabic QT Training Data Set 5, Speech | |
LDC2006T07 | Levantine Arabic QT Training Data Set 5, Transcripts | |
LDC2006S33 | Middle East Technical University Turkish Microphone Speech v 1.0 | |
LDC2006T04 | Multiple-Translation Chinese (MTC) Part 4 | |
LDC2006S13 | N4 NATO Native and Non-Native Speech | |
LDC2006T01 | Prague Dependency Treebank 2.0 | |
LDC2006S34 | Russian through Switched Telephone Network (RuSTeN) | |
LDC2006T12 | Spanish Gigaword First Edition | |
LDC2006S30 | Speech Controlled Computing | |
LDC2006T18 | TDT5 Multilingual Text | |
LDC2006T19 | TDT5 Topics and Annotations | |
LDC2006T08 | TimeBank 1.2 | |
LDC2006T13 | Web 1T 5-gram Version 1 | |
LDC2006S37 | West Point Heroico Spanish Speech | |
LDC2006S36 | West Point Korean Speech |
2005
LDC2005T09 | ACE 2004 Multilingual Training Corpus | |
LDC2005T07 | ACE Time Normalization (TERN) 2004 English Training Data v 1.0 | |
LDC2005T35 | American National Corpus (ANC) Second Release | |
LDC2005S07 | Arabic CTS Levantine Fisher Training Data Set 3, Speech | |
LDC2005T03 | Arabic CTS Levantine Fisher Training Data Set 3, Transcripts | |
LDC2005T02 | Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis) | |
LDC2005T20 | Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis) | |
LDC2005T30 | Arabic Treebank: Part 4 v 1.0 (MPG Annotation) | |
LDC2005S22 | Articulation Index | |
LDC2005T33 | BBN Pronoun Coreference and Entity Type Corpus | |
LDC2005S08 | BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts | |
LDC2005T13 | CCGbank | |
LDC2005T34 | Chinese <-> English Name Entity Lists v 1.0 | |
LDC2005T10 | Chinese English News Magazine Parallel Text | |
LDC2005T14 | Chinese Gigaword Second Edition | |
LDC2005T06 | Chinese News Translation Text Part 1 | |
LDC2005T23 | Chinese Proposition Bank 1.0 | |
LDC2005T01 | Chinese Treebank 5.0 | |
LDC2005S26 | CSLU: 22 Languages Corpus | |
LDC2005T08 | Discourse Graphbank | |
LDC2005T12 | English Gigaword Second Edition | |
LDC2005S13 | Fisher English Training Part 2, Speech | |
LDC2005T19 | Fisher English Training Part 2, Transcripts | |
LDC2005T28 | HARD 2004 Text | |
LDC2005T29 | HARD 2004 Topics and Annotations | |
LDC2005S15 | HKUST Mandarin Telephone Speech, Part 1 | |
LDC2005T32 | HKUST Mandarin Telephone Transcript Data, Part 1 | |
LDC2005S14 | Levantine Arabic QT Training Data Set 4 (Speech + Transcripts) | |
LDC2005L01 | Mawukakan Lexicon | |
LDC2005T05 | Multiple-Translation Arabic (MTA) Part 2 | |
LDC2005S16 | RT-04 MDE Training Data Speech | |
LDC2005T24 | RT-04 MDE Training Data Text/Annotations | |
LDC2005S25 | Santa Barbara Corpus of Spoken American English Part IV | |
LDC2005S11 | TDT4 Multilingual Broadcast News Speech Corpus | |
LDC2005T16 | TDT4 Multilingual Text and Annotations | |
LDC2005S30 | West Point Company G3 American English Speech | |
LDC2005S28 | West Point Croatian Speech |
2004
LDC2004T15 | 2000 Communicator Dialogue Act Tagged | |
LDC2004T16 | 2001 Communicator Dialogue Act Tagged | |
LDC2004S04 | 2002 NIST Speaker Recognition Evaluation | |
LDC2004S11 | 2002 Rich Transcription Broadcast News and Conversational Telephone Speech | |
LDC2004T18 | Arabic English Parallel News Part 1 | |
LDC2004T17 | Arabic News Translation Text Part 1 | |
LDC2004T02 | Arabic Treebank: Part 2 v 2.0 | |
LDC2004T11 | Arabic Treebank: Part 3 v 1.0 | |
LDC2004L02 | Buckwalter Arabic Morphological Analyzer Version 2.0 | |
LDC2004T05 | Chinese Treebank 4.0 | |
LDC2004S01 | Czech Broadcast News Speech | |
LDC2004T01 | Czech Broadcast News Transcripts | |
LDC2004S13 | Fisher English Training Speech Part 1 Speech | |
LDC2004T19 | Fisher English Training Speech Part 1 Transcripts | |
LDC2004V01 | FORM1 Kinematic Gesture | |
LDC2004T08 | Hong Kong Parallel Text | |
LDC2004S02 | ICSI Meeting Speech | |
LDC2004T04 | ICSI Meeting Transcripts | |
LDC2004S05 | ISL Meeting Speech Part 1 | |
LDC2004T10 | ISL Meeting Transcripts Part 1 | |
LDC2004L01 | Klex: Finite-State Lexical Transducer for Korean | |
LDC2004T03 | Morphologically Annotated Korean Text | |
LDC2004T07 | Multiple-Translation Chinese (MTC) Part 3 | |
LDC2004S09 | NIST Meeting Pilot Corpus Speech | |
LDC2004T13 | NIST Meeting Pilot Corpus Transcripts and Metadata | |
LDC2004T23 | Prague Arabic Dependency Treebank 1.0 | |
LDC2004T25 | Prague Czech-English Dependency Treebank 1.0 | |
LDC2004T14 | Proposition Bank I | |
LDC2004S08 | RT-03 MDE Training Data Speech | |
LDC2004T12 | RT-03 MDE Training Data Text and Annotations | |
LDC2004S10 | Santa Barbara Corpus of Spoken American English Part III | |
LDC2004S07 | Switchboard Cellular Part 2 Audio | |
LDC2004S12 | TalkBank Ethology Data: Field Recordings of Vervet Monkey Calls | |
LDC2004T09 | TIDES Extraction (ACE) 2003 Multilingual Training Data |
2003
LDC2003T03 | 1997 HUB5 German Transcripts | |
LDC2003T04 | 1997 HUB5 Spanish Transcripts | |
LDC2003T02 | 1998 HUB5 English Transcripts | |
LDC2003S01 | 2001 Communicator Evaluation | |
LDC2003T01 | 2001 HUB5 Mandarin Transcripts | |
LDC2003T11 | ACE-2 Version 1.0 | |
LDC2003T12 | Arabic Gigaword | |
LDC2003T07 | Arabic Treebank: Part 1 - 10K-word English Translation | |
LDC2003T06 | Arabic Treebank: Part 1 v 2.0 | |
LDC2003T09 | Chinese Gigaword | |
LDC2003T05 | English Gigaword | |
LDC2003V01 | FORM2 Kinematic Gesture | |
LDC2003L01 | Grassfields Bantu Fieldwork: Dschang Lexicon | |
LDC2003S02 | Grassfields Bantu Fieldwork: Dschang Tone Paradigms | |
LDC2003S07 | Korean Telephone Conversations Complete Set | |
LDC2003L02 | Korean Telephone Conversations Lexicon | |
LDC2003S03 | Korean Telephone Conversations Speech | |
LDC2003T08 | Korean Telephone Conversations Transcripts | |
LDC2003T13 | Message Understanding Conference (MUC) 6 | |
LDC2003T18 | Multiple-Translation Arabic (MTA) Part 1 | |
LDC2003T17 | Multiple-Translation Chinese (MTC) Part 2 | |
LDC2003T10 | SAID | |
LDC2003S06 | Santa Barbara Corpus of Spoken American English Part II | |
LDC2003T15 | SLX Corpus of Classic Sociolinguistic Interviews | |
LDC2003T16 | SummBank 1.0 | |
LDC2003S05 | West Point Russian Speech |
2002
LDC2002S11 | 1997 HUB4 English Evaluation Speech and Transcripts | |
LDC2002S22 | 1997 HUB5 Arabic Evaluation | |
LDC2002T39 | 1997 HUB5 Arabic Transcripts | |
LDC2002S23 | 1997 HUB5 English Evaluation | |
LDC2002S24 | 1997 HUB5 German Evaluation | |
LDC2003T03 | 1997 HUB5 German Transcripts | |
LDC2002S25 | 1997 HUB5 Spanish Evaluation | |
LDC2003T04 | 1997 HUB5 Spanish Transcripts | |
LDC2002S10 | 1998 HUB5 English Evaluation | |
LDC2003T02 | 1998 HUB5 English Transcripts | |
LDC2002S56 | 2000 Communicator Evaluation | |
LDC2002S09 | 2000 HUB5 English Evaluation Speech | |
LDC2002T43 | 2000 HUB5 English Evaluation Transcripts | |
LDC2002S13 | 2001 HUB5 English Evaluation | |
LDC2002S12 | 2001 HUB5 Mandarin Evaluation | |
LDC2003T01 | 2001 HUB5 Mandarin Transcripts | |
LDC2002S34 | 2001 NIST Speaker Recognition Evaluation Corpus | |
LDC2002L49 | Buckwalter Arabic Morphological Analyzer Version 1.0 | |
LDC2002S37 | CALLHOME Egyptian Arabic Speech Supplement | |
LDC2002T38 | CALLHOME Egyptian Arabic Transcripts Supplement | |
LDC2002L27 | Chinese-English Translation Lexicon Version 3.0 | |
LDC2002S28 | Emotional Prosody Speech and Transcripts | |
LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | |
LDC2002T26 | Korean English Treebank Annotations | |
LDC2002T01 | Multiple-Translation Chinese Corpus | |
LDC2002T07 | RST Discourse Treebank | |
LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio | |
LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts | |
LDC2002S06 | Switchboard-2 Phase III Audio | |
LDC2002T31 | The AQUAINT Corpus of English News Text | |
LDC2002S04 | Translanguage English Database (TED) Speech | |
LDC2002T03 | Translanguage English Database (TED) Transcripts | |
LDC2002S35 | Voicemail Corpus Part II | |
LDC2002S02 | West Point Arabic Speech |
2001
LDC2001S91 | 1997 HUB4 Broadcast News Evaluation Non-English Test Material | |
LDC2001S97 | 2000 NIST Speaker Recognition Evaluation | |
LDC2001T55 | Arabic Newswire Part 1 | |
LDC2001T61 | CALLHOME Spanish Dialogue Act Annotation | |
LDC2001T62 | CETEMpublico | |
LDC2001T11 | Chinese Treebank 2.0 | |
LDC2001S16 | Grassfields Bantu Fieldwork: Ngomba Tone Paradigms | |
LDC2001T02 | Message Understanding Conference (MUC) 7 | |
LDC2001T10 | Prague Dependency Treebank 1.0 | |
LDC2001S04 | Speech in Noisy Environments (SPINE2) Part 1 Audio | |
LDC2001T05 | Speech in Noisy Environments (SPINE2) Part 1 Transcripts | |
LDC2001S06 | Speech in Noisy Environments (SPINE2) Part 2 Audio | |
LDC2001T07 | Speech in Noisy Environments (SPINE2) Part 2 Transcripts | |
LDC2001S08 | Speech in Noisy Environments (SPINE2) Part 3 Audio | |
LDC2001T09 | Speech in Noisy Environments (SPINE2) Part 3 Transcripts | |
LDC2001S99 | Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio | |
LDC2001S13 | Switchboard Cellular Part 1 Audio | |
LDC2001S15 | Switchboard Cellular Part 1 Transcribed Audio | |
LDC2001T14 | Switchboard Cellular Part 1 Transcription | |
LDC2001T60 | Syllable-Final /s/ Lenition | |
LDC2001S93 | TDT2 Mandarin Audio Corpus | |
LDC2001T57 | TDT2 Multilanguage Text Version 4.0 | |
LDC2001S94 | TDT3 English Audio | |
LDC2001S95 | TDT3 Mandarin Audio | |
LDC2001T58 | TDT3 Multilanguage Text Version 2.0 |
2000
LDC2000S86 | 1998 HUB4 Broadcast News Evaluation English Test Material | |
LDC2000S88 | 1999 HUB4 Broadcast News Evaluation English Test Material | |
LDC2000T43 | BLLIP 1987-89 WSJ Corpus Release 1 | |
LDC2000T50 | Hong Kong Hansards Parallel Text | |
LDC2000T47 | Hong Kong Laws Parallel Text | |
LDC2000T46 | Hong Kong News Parallel Text | |
LDC2000T45 | Korean Newswire | |
LDC2000S85 | Santa Barbara Corpus of Spoken American English Part I | |
LDC2000S96 | Speech in Noisy Environments (SPINE) Evaluation Audio | |
LDC2000T54 | Speech in Noisy Environments (SPINE) Evaluation Transcripts | |
LDC2000S87 | Speech in Noisy Environments (SPINE) Training Audio | |
LDC2000T49 | Speech in Noisy Environments (SPINE) Training Transcripts | |
LDC2000S92 | TDT2 Careful Transcription Audio | |
LDC2000T44 | TDT2 Careful Transcription Text | |
LDC2000T52 | TREC Mandarin | |
LDC2000T51 | TREC Spanish | |
LDC2000S89 | Voice of America (VOA) Czech Broadcast News Audio | |
LDC2000T53 | Voice of America (VOA) Czech Broadcast News Transcripts |
1999
LDC99S80 | 1997 Speaker Recognition Benchmark | |
LDC99S81 | 1999 Speaker Recognition Benchmark | |
LDC99L23 | American English Spoken Lexicon | |
LDC99L22 | Egyptian Colloquial Arabic Lexicon | |
LDC99T34 | Japanese Business News Text Supplement | |
LDC99T40 | Portuguese Newswire Text | |
LDC99T41 | Spanish Newswire Text, Volume 2 | |
LDC99S78 | SUSAS | |
LDC99T33 | SUSAS Transcripts | |
LDC99S79 | Switchboard-2 Phase II | |
LDC99S83 | Tactical Speaker Identification Speech Corpus (TSID) | |
LDC99S84 | TDT2 English Audio | |
LDC99T42 | Treebank-3 | |
LDC99S82 | USC Marketplace Broadcast News Speech | |
LDC99T36 | USC Marketplace Broadcast News Transcripts |
1998
LDC98T31 | 1996 CSR HUB4 Language Model | |
LDC97S66 | 1996 English Broadcast News Dev and Eval (HUB4) | |
LDC97S44 | 1996 English Broadcast News Speech (HUB4) | |
LDC97T22 | 1996 English Broadcast News Transcripts (HUB4) | |
LDC98S71 | 1997 English Broadcast News Speech (HUB4) | |
LDC98T28 | 1997 English Broadcast News Transcripts (HUB4) | |
LDC98S73 | 1997 Mandarin Broadcast News Speech (HUB4-NE) | |
LDC98T24 | 1997 Mandarin Broadcast News Transcripts (HUB4-NE) | |
LDC98S74 | 1997 Spanish Broadcast News Speech (HUB4-NE) | |
LDC98T29 | 1997 Spanish Broadcast News Transcripts (HUB4-NE) | |
LDC98S76 | 1998 Speaker Recognition Benchmark | |
LDC98L21 | COMLEX English Syntax Lexicon | |
LDC96T11 | COMLEX Syntax Text Corpus Version 2.0 | |
LDC95S23 | CSR-III Speech | |
LDC95T6 | CSR-III Text | |
LDC98S67 | HTIMIT | |
LDC98S69 | HUB5 Mandarin Telephone Speech Corpus | |
LDC98T26 | HUB5 Mandarin Transcripts | |
LDC98S70 | HUB5 Spanish Telephone Speech Corpus | |
LDC98T27 | HUB5 Spanish Transcripts | |
LDC98T32 | JURIS | |
LDC95S22 | KING Speaker Verification | |
LDC98S68 | LLHDB | |
LDC98T30 | North American News Text Supplement | |
LDC98S75 | Switchboard-2 Phase I | |
LDC98S72 | Taiwanese Putonghua Speech and Transcripts | |
LDC98T25 | TDT Pilot Study Corpus | |
LDC98S77 | Voicemail Corpus Part I | |
LDC94S16 | YOHO Speaker Verification |
1997
LDC97S66 | 1996 English Broadcast News Dev and Eval (HUB4) | |
LDC97S44 | 1996 English Broadcast News Speech (HUB4) | |
LDC97T22 | 1996 English Broadcast News Transcripts (HUB4) | |
LDC96S61 | 1996 Speaker Recognition Benchmark | |
LDC94S14A | Air Traffic Control Complete | |
LDC96S36 | Boston University Radio Speech Corpus | |
LDC96S46 | CALLFRIEND American English-Non-Southern Dialect | |
LDC96S47 | CALLFRIEND American English-Southern Dialect | |
LDC96S48 | CALLFRIEND Canadian French | |
LDC96S49 | CALLFRIEND Egyptian Arabic | |
LDC96S50 | CALLFRIEND Farsi | |
LDC96S51 | CALLFRIEND German | |
LDC96S52 | CALLFRIEND Hindi | |
LDC96S53 | CALLFRIEND Japanese | |
LDC96S54 | CALLFRIEND Korean | |
LDC96S55 | CALLFRIEND Mandarin Chinese-Mainland Dialect | |
LDC96S56 | CALLFRIEND Mandarin Chinese-Taiwan Dialect | |
LDC96S57 | CALLFRIEND Spanish-Caribbean Dialect | |
LDC96S58 | CALLFRIEND Spanish-Non-Caribbean Dialect | |
LDC96S59 | CALLFRIEND Tamil | |
LDC96S60 | CALLFRIEND Vietnamese | |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) | |
LDC97S42 | CALLHOME American English Speech | |
LDC97T14 | CALLHOME American English Transcripts | |
LDC97S45 | CALLHOME Egyptian Arabic Speech | |
LDC97T19 | CALLHOME Egyptian Arabic Transcripts | |
LDC97L18 | CALLHOME German Lexicon | |
LDC97S43 | CALLHOME German Speech | |
LDC97T15 | CALLHOME German Transcripts | |
LDC96L17 | CALLHOME Japanese Lexicon | |
LDC96S37 | CALLHOME Japanese Speech | |
LDC96T18 | CALLHOME Japanese Transcripts | |
LDC96L15 | CALLHOME Mandarin Chinese Lexicon | |
LDC96S34 | CALLHOME Mandarin Chinese Speech | |
LDC96T16 | CALLHOME Mandarin Chinese Transcripts | |
LDC96L16 | CALLHOME Spanish Lexicon | |
LDC96S35 | CALLHOME Spanish Speech | |
LDC96T17 | CALLHOME Spanish Transcripts | |
LDC94S13A | CSR-II (WSJ1) Complete | |
LDC94S13B | CSR-II (WSJ1) Sennheiser | |
LDC97T12 | DSO Corpus of Sense-Tagged English | |
LDC99L22 | Egyptian Colloquial Arabic Lexicon | |
LDC95T20 | Hansard French/English | |
LDC96S64-1 | JEIDA/JCSD-Channel 0 City Names | |
LDC96S64 | JEIDA/JCSD-Channel 0 Complete | |
LDC96S64-2 | JEIDA/JCSD-Channel 0 Control Words | |
LDC96S64-4 | JEIDA/JCSD-Channel 0 Four Digit Sequences | |
LDC96S64-3 | JEIDA/JCSD-Channel 0 Isolated Digits | |
LDC96S64-5 | JEIDA/JCSD-Channel 0 Mono Syllables | |
LDC96S65-1 | JEIDA/JCSD-Channel 1 City Names | |
LDC96S65 | JEIDA/JCSD-Channel 1 Complete | |
LDC96S65-2 | JEIDA/JCSD-Channel 1 Control Words | |
LDC96S65-4 | JEIDA/JCSD-Channel 1 Four Digit Sequences | |
LDC96S65-3 | JEIDA/JCSD-Channel 1 Isolated Digits | |
LDC96S65-5 | JEIDA/JCSD-Channel 1 Mono Syllables | |
LDC95T13 | Mandarin Chinese News Text | |
LDC95T21 | North American News Text Corpus | |
LDC94S15 | SPIDRE | |
LDC97S62 | Switchboard-1 Release 2 | |
LDC97S63 | The CMU Kids Corpus |
1996
LDC96S61 | 1996 Speaker Recognition Benchmark | |
LDC96S36 | Boston University Radio Speech Corpus | |
LDC94S20 | BRAMSHILL | |
LDC96S46 | CALLFRIEND American English-Non-Southern Dialect | |
LDC96S47 | CALLFRIEND American English-Southern Dialect | |
LDC96S48 | CALLFRIEND Canadian French | |
LDC96S49 | CALLFRIEND Egyptian Arabic | |
LDC96S50 | CALLFRIEND Farsi | |
LDC96S51 | CALLFRIEND German | |
LDC96S52 | CALLFRIEND Hindi | |
LDC96S53 | CALLFRIEND Japanese | |
LDC96S54 | CALLFRIEND Korean | |
LDC96S55 | CALLFRIEND Mandarin Chinese-Mainland Dialect | |
LDC96S56 | CALLFRIEND Mandarin Chinese-Taiwan Dialect | |
LDC96S57 | CALLFRIEND Spanish-Caribbean Dialect | |
LDC96S58 | CALLFRIEND Spanish-Non-Caribbean Dialect | |
LDC96S59 | CALLFRIEND Tamil | |
LDC96S60 | CALLFRIEND Vietnamese | |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) | |
LDC96L17 | CALLHOME Japanese Lexicon | |
LDC96S37 | CALLHOME Japanese Speech | |
LDC96T18 | CALLHOME Japanese Transcripts | |
LDC96L15 | CALLHOME Mandarin Chinese Lexicon | |
LDC96S34 | CALLHOME Mandarin Chinese Speech | |
LDC96T16 | CALLHOME Mandarin Chinese Transcripts | |
LDC96L16 | CALLHOME Spanish Lexicon | |
LDC96S35 | CALLHOME Spanish Speech | |
LDC96T17 | CALLHOME Spanish Transcripts | |
LDC96L14 | CELEX2 | |
LDC98L21 | COMLEX English Syntax Lexicon | |
LDC96T11 | COMLEX Syntax Text Corpus Version 2.0 | |
LDC93S6A | CSR-I (WSJ0) Complete | |
LDC93S6C | CSR-I (WSJ0) Other | |
LDC93S6B | CSR-I (WSJ0) Sennheiser | |
LDC96S33 | CSR-IV HUB3 | |
LDC96S31 | CSR-IV HUB4 | |
LDC96S30 | CTIMIT | |
LDC96S38 | DCIEM/HCRC | |
LDC95T11 | European Language Newspaper Text | |
LDC96S32 | FFMTIMIT | |
LDC96S29 | Frontiers in Speech Processing 93 | |
LDC96S40 | Frontiers in Speech Processing 94 | |
LDC95T20 | Hansard French/English | |
LDC93S12 | HCRC Map Task Corpus | |
LDC96S64-1 | JEIDA/JCSD-Channel 0 City Names | |
LDC96S64 | JEIDA/JCSD-Channel 0 Complete | |
LDC96S64-2 | JEIDA/JCSD-Channel 0 Control Words | |
LDC96S64-4 | JEIDA/JCSD-Channel 0 Four Digit Sequences | |
LDC96S64-3 | JEIDA/JCSD-Channel 0 Isolated Digits | |
LDC96S64-5 | JEIDA/JCSD-Channel 0 Mono Syllables | |
LDC96S65-1 | JEIDA/JCSD-Channel 1 City Names | |
LDC96S65 | JEIDA/JCSD-Channel 1 Complete | |
LDC96S65-2 | JEIDA/JCSD-Channel 1 Control Words | |
LDC96S65-4 | JEIDA/JCSD-Channel 1 Four Digit Sequences | |
LDC96S65-3 | JEIDA/JCSD-Channel 1 Isolated Digits | |
LDC96S65-5 | JEIDA/JCSD-Channel 1 Mono Syllables | |
LDC95T13 | Mandarin Chinese News Text | |
LDC96T10 | Message Understanding Conference (MUC) 6 Additional News Text | |
LDC95T21 | North American News Text Corpus | |
LDC93S3A | Resource Management Complete Set 2.0 | |
LDC93S3B | Resource Management RM1 2.0 | |
LDC93S3C | Resource Management RM2 2.0 | |
LDC96S39 | RM Isolated and Spelled Word Data | |
LDC95T9 | Spanish News Text | |
LDC96S41 | VAHA (POLYPHONE II) |
1995
LDC95S26 | ATIS3 Test Data | |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) | |
LDC96L14 | CELEX2 | |
LDC98L21 | COMLEX English Syntax Lexicon | |
LDC95S23 | CSR-III Speech | |
LDC95T6 | CSR-III Text | |
LDC95T11 | European Language Newspaper Text | |
LDC95T20 | Hansard French/English | |
LDC95T8 | Japanese Business News Text | |
LDC95S22 | KING Speaker Verification | |
LDC95S28 | LATINO-40 Spanish Read News | |
LDC95T13 | Mandarin Chinese News Text | |
LDC95T21 | North American News Text Corpus | |
LDC95S27 | PhoneBook: NYNEX Isolated Words | |
LDC95T9 | Spanish News Text | |
LDC95S25 | TRAINS Spoken Dialog Corpus | |
LDC95T7 | Treebank-2 | |
LDC95S24 | WSJCAM0 Cambridge Read News |
1994
LDC94S14B | Air Traffic Control BOS | |
LDC94S14A | Air Traffic Control Complete | |
LDC94S14C | Air Traffic Control DCA | |
LDC94S14D | Air Traffic Control DFW | |
LDC94S19 | ATIS3 Training Data | |
LDC94S20 | BRAMSHILL | |
LDC97L20 | CALLHOME American English Lexicon (PRONLEX) | |
LDC98L21 | COMLEX English Syntax Lexicon | |
LDC94S13A | CSR-II (WSJ1) Complete | |
LDC94S13C | CSR-II (WSJ1) Other | |
LDC94S13B | CSR-II (WSJ1) Sennheiser | |
LDC94T5 | ECI Multilingual Text | |
LDC94S21 | MACROPHONE | |
LDC94S17 | OGI Multilanguage Corpus | |
LDC94S18 | OGI Spelled and Spoken Word | |
LDC94S15 | SPIDRE | |
LDC94T4A | UN Parallel Text (Complete) | |
LDC94T4B-1 | UN Parallel Text (English) | |
LDC94T4B-2 | UN Parallel Text (French) | |
LDC94T4B-3 | UN Parallel Text (Spanish) | |
LDC94S16 | YOHO Speaker Verification |
1993
LDC93T1 | ACL/DCI | |
LDC93S4A | ATIS0 Complete | |
LDC93S4B | ATIS0 Pilot | |
LDC93S4B-2 | ATIS0 Read | |
LDC93S4B-3 | ATIS0 SD Read | |
LDC93S5 | ATIS2 | |
LDC93S6A | CSR-I (WSJ0) Complete | |
LDC93S6C | CSR-I (WSJ0) Other | |
LDC93S6B | CSR-I (WSJ0) Sennheiser | |
LDC93S12 | HCRC Map Task Corpus | |
LDC93S2 | NTIMIT | |
LDC93S3A | Resource Management Complete Set 2.0 | |
LDC93S3B | Resource Management RM1 2.0 | |
LDC93S3C | Resource Management RM2 2.0 | |
LDC93S11 | Road Rally | |
LDC93S8 | Switchboard Credit Card | |
LDC97S62 | Switchboard-1 Release 2 | |
LDC93S9 | TI 46-Word | |
LDC93S10 | TIDIGITS | |
LDC93S1W | TIMIT Acoustic-Phonetic Continuous Speech (MS-WAV version) | |
LDC93S1 | TIMIT Acoustic-Phonetic Continuous Speech Corpus | |
LDC93T3A | TIPSTER Complete | |
LDC93T3B | TIPSTER Volume 1 | |
LDC93T3C | TIPSTER Volume 2 | |
LDC93T3D | TIPSTER Volume 3 |