LDC Catalog by Year

2015

  • LDC2015S10 Arabic Learner Corpus
  • LDC2015T03 Avocado Research Email Collection
  • LDC2015S07 CIEMPIESS
  • LDC2015T08 Coordination Annotation for the Penn Treebank
  • LDC2015T13 English News Text Treebank: Penn Treebank Revised
  • LDC2015T06 GALE Chinese-English Parallel Aligned Treebank -- Training
  • LDC2015T04 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 3
  • LDC2015S01 GALE Phase 2 Arabic Broadcast News Speech Part 2
  • LDC2015T01 GALE Phase 2 Arabic Broadcast News Transcripts Part 2
  • LDC2015T05 GALE Phase 3 and 4 Arabic Broadcast Conversation Parallel Text
  • LDC2015T07 GALE Phase 3 and 4 Arabic Broadcast News Parallel Text
  • LDC2015S11 GALE Phase 3 Arabic Broadcast Conversation Speech Part 1
  • LDC2015T16 GALE Phase 3 Arabic Broadcast Conversation Transcripts Part 1
  • LDC2015S06 GALE Phase 3 Chinese Broadcast Conversation Speech Part 2
  • LDC2015T09 GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 2
  • LDC2015T14 GALE Phase 4 Chinese Broadcast Conversation Parallel Sentences
  • LDC2015S05 Mandarin Chinese Phonetic Segmentation and Tone
  • LDC2015S04 Mandarin-English Code-Switching in South-East Asia
  • LDC2015S02 RATS Speech Activity Detection
  • LDC2015T10 RST Signalling Corpus
  • LDC2015T02 SenSem Databank
  • LDC2015L01 SenSem Lexicons
  • LDC2015S03 The Subglottal Resonances Database
  • LDC2015S08 The Walking Around Corpus
  • LDC2015T15 TS Wikipedia

2014

  • LDC2014S06 2009 NIST Language Recognition Evaluation Test Set
  • LDC2014T12 Abstract Meaning Representation (AMR) Annotation Release 1.0
  • LDC2014T18 ACE 2007 Multilingual Training Corpus
  • LDC2014T27 Benchmarks for Open Relation Extraction
  • LDC2014T24 Boulder Lies and Truth
  • LDC2014S01 CALLFRIEND Farsi Second Edition Speech
  • LDC2014T01 CALLFRIEND Farsi Second Edition Transcripts
  • LDC2014T21 Chinese Discourse Treebank 0.5
  • LDC2014T07 Domain-Specific Hyponym Relations
  • LDC2014T06 ETS Corpus of Non-Native Written English
  • LDC2014T23 Fisher and CALLHOME Spanish--English Speech Translation
  • LDC2014T03 GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 2
  • LDC2014T08 GALE Arabic-English Parallel Aligned Treebank -- Web Training
  • LDC2014T19 GALE Arabic-English Word Alignment -- Broadcast Training Part 1
  • LDC2014T22 GALE Arabic-English Word Alignment -- Broadcast Training Part 2
  • LDC2014T05 GALE Arabic-English Word Alignment Training Part 1 -- Newswire and Web
  • LDC2014T10 GALE Arabic-English Word Alignment Training Part 2 -- Newswire
  • LDC2014T14 GALE Arabic-English Word Alignment Training Part 3 -- Web
  • LDC2014T25 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 2
  • LDC2014S07 GALE Phase 2 Arabic Broadcast News Speech Part 1
  • LDC2014T17 GALE Phase 2 Arabic Broadcast News Transcripts Part 1
  • LDC2014T04 GALE Phase 2 Chinese Broadcast News Parallel Text Part 1
  • LDC2014T11 GALE Phase 2 Chinese Broadcast News Parallel Text Part 2
  • LDC2014T15 GALE Phase 2 Chinese Newswire Parallel Text Part 1
  • LDC2014T20 GALE Phase 2 Chinese Newswire Parallel Text Part 2
  • LDC2014T26 GALE Phase 2 Chinese Web Parallel Text
  • LDC2014S09 GALE Phase 3 Chinese Broadcast Conversation Speech Part 1
  • LDC2014T28 GALE Phase 3 Chinese Broadcast Conversation Transcripts Part 1
  • LDC2014S05 Hispanic-English Database
  • LDC2014T09 HyTER Networks of Selected OpenMT08/09 Sentences
  • LDC2014S02 King Saud University Arabic Speech Database
  • LDC2014T13 MADCAT Chinese Pilot Training Set
  • LDC2014S03 Multi-Channel WSJ Audio
  • LDC2014T02 NIST 2012 Open Machine Translation (OpenMT) Progress Test Five Language Source
  • LDC2014T16 TAC KBP Reference Knowledge Base
  • LDC2014S08 United Nations Proceedings Speech
  • LDC2014S04 USC-SFI MALACH Interviews and Transcripts Czech

2013

  • LDC2013T06 1993-2007 United Nations Parallel Text
  • LDC2013T13 Chinese Proposition Bank 3.0
  • LDC2013T21 Chinese Treebank 8.0
  • LDC2013T02 Chinese-English Biology and Chemistry Abstract Parallel Text
  • LDC2013S09 CSC Deceptive Speech
  • LDC2013T14 GALE Arabic-English Parallel Aligned Treebank -- Broadcast News Part 1
  • LDC2013T10 GALE Arabic-English Parallel Aligned Treebank -- Newswire
  • LDC2013T23 GALE Chinese-English Word Alignment and Tagging -- Broadcast Training Part 1
  • LDC2013T05 GALE Chinese-English Word Alignment and Tagging Training Part 4 -- Web
  • LDC2013S02 GALE Phase 2 Arabic Broadcast Conversation Speech Part 1
  • LDC2013S07 GALE Phase 2 Arabic Broadcast Conversation Speech Part 2
  • LDC2013T04 GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 1
  • LDC2013T17 GALE Phase 2 Arabic Broadcast Conversation Transcripts Part 2
  • LDC2013T01 GALE Phase 2 Arabic Web Parallel Text
  • LDC2013T11 GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 1
  • LDC2013T16 GALE Phase 2 Chinese Broadcast Conversation Parallel Text Part 2
  • LDC2013S04 GALE Phase 2 Chinese Broadcast Conversation Speech
  • LDC2013T08 GALE Phase 2 Chinese Broadcast Conversation Transcripts
  • LDC2013S08 GALE Phase 2 Chinese Broadcast News Speech
  • LDC2013T20 GALE Phase 2 Chinese Broadcast News Transcripts
  • LDC2013S05 Greybeard
  • LDC2013S06 LDC Spoken Language Sampler - Second Release
  • LDC2013T09 MADCAT Phase 2 Training Set
  • LDC2013T15 MADCAT Phase 3 Training Set
  • LDC2013L01 Maninkakan Lexicon
  • LDC2013T12 Manually Annotated Sub-Corpus Third Release
  • LDC2013S03 Mixer 6 Speech
  • LDC2013T07 NIST 2008-2012 Open Machine Translation (OpenMT) Progress Test Sets
  • LDC2013T03 NIST 2012 Open Machine Translation (OpenMT) Evaluation
  • LDC2013T19 OntoNotes Release 5.0
  • LDC2013T18 Semantic Textual Similarity (STS) 2013 Machine Translation
  • LDC2013T22 The ARRAU Corpus of Anaphoric Information

2012

  • LDC2012V01 2005 NIST/USF Evaluation Resources for the VACE Program - Broadcast News
  • LDC2012S01 2006 NIST Speaker Recognition Evaluation Test Set Part 2
  • LDC2012T03 2009 CoNLL Shared Task Part 1
  • LDC2012T04 2009 CoNLL Shared Task Part 2
  • LDC2012T11 American English Nickname Collection
  • LDC2012T21 Annotated English Gigaword
  • LDC2012T07 Arabic Treebank - Broadcast News v1.0
  • LDC2012T09 Arabic-Dialect/English Parallel Text
  • LDC2012T10 Catalan TimeBank 1.0
  • LDC2012T05 Chinese Dependency Treebank 1.0
  • LDC2012T22 Chinese-English Semiconductor Parallel Text
  • LDC2012S03 Digital Archive of Southern Speech
  • LDC2012T02 English Translation Treebank: An-Nahar Newswire
  • LDC2012T13 English Web Treebank
  • LDC2012T16 GALE Chinese-English Word Alignment and Tagging Training Part 1 -- Newswire and Web
  • LDC2012T20 GALE Chinese-English Word Alignment and Tagging Training Part 2 -- Newswire
  • LDC2012T24 GALE Chinese-English Word Alignment and Tagging Training Part 3 -- Web
  • LDC2012T06 GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 1
  • LDC2012T14 GALE Phase 2 Arabic Broadcast Conversation Parallel Text Part 2
  • LDC2012T18 GALE Phase 2 Arabic Broadcast News Parallel Text
  • LDC2012T17 GALE Phase 2 Arabic Newswire Parallel Text
  • LDC2012T15 MADCAT Phase 1 Training Set
  • LDC2012S04 Malto Speech and Transcripts
  • LDC2012T01 ModeS TimeBank 1.0
  • LDC2012T08 Prague Czech-English Dependency Treebank 2.0
  • LDC2012T23 Russian-English Computer Security Parallel Text
  • LDC2012T12 Spanish TimeBank 1.0
  • LDC2012S02 TORGO Database of Dysarthric Articulation
  • LDC2012S06 Turkish Broadcast News Speech and Transcripts
  • LDC2012S05 USC-SFI MALACH Interviews and Transcripts English

2011

  • LDC2011S04 2005 NIST Speaker Recognition Evaluation Test Data
  • LDC2011S01 2005 NIST Speaker Recognition Evaluation Training Data
  • LDC2011S06 2005 Spring NIST Rich Transcription (RT-05S) Evaluation Set
  • LDC2011S10 2006 NIST Speaker Recognition Evaluation Test Set Part 1
  • LDC2011S09 2006 NIST Speaker Recognition Evaluation Training Set
  • LDC2011S02 2006 NIST Spoken Term Detection Development Set
  • LDC2011S03 2006 NIST Spoken Term Detection Evaluation Set
  • LDC2011V05 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
  • LDC2011V06 2006 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
  • LDC2011S11 2008 NIST Speaker Recognition Evaluation Supplemental Set
  • LDC2011S08 2008 NIST Speaker Recognition Evaluation Test Set
  • LDC2011S05 2008 NIST Speaker Recognition Evaluation Training Set Part 1
  • LDC2011S07 2008 NIST Speaker Recognition Evaluation Training Set Part 2
  • LDC2011T05 2008/2010 NIST Metrics for Machine Translation (MetricsMaTr) GALE Evaluation Set
  • LDC2011T02 ACE 2005 English SpatialML Annotations Version 2
  • LDC2011T11 Arabic Gigaword Fifth Edition
  • LDC2011T09 Arabic Treebank: Part 2 v 3.1
  • LDC2011T06 Broadcast News Lattices
  • LDC2011T13 Chinese Gigaword Fifth Edition
  • LDC2011T08 Datasets for Generic Relation Extraction (reACE)
  • LDC2011T07 English Gigaword Fifth Edition
  • LDC2011T10 French Gigaword Third Edition
  • LDC2011T04 Indian Language Part-of-Speech Tagset: Sanskrit
  • LDC2011V03 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 1
  • LDC2011V04 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Test Set Part 2
  • LDC2011V01 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 1
  • LDC2011V02 NIST/USF Evaluation Resources for the VACE Program - Meeting Data Training Set Part 2
  • LDC2011T03 OntoNotes Release 4.0
  • LDC2011T01 SemEval-2010 Task 1 OntoNotes English: Coreference Resolution in Multiple Languages
  • LDC2011T12 Spanish Gigaword Third Edition

2010

  • LDC2010S03 2003 NIST Speaker Recognition Evaluation
  • LDC2010T09 ACE 2005 Mandarin SpatialML Annotations
  • LDC2010T18 ACE Time Normalization (TERN) 2004 English Evaluation Data V1.0
  • LDC2010T13 Arabic Treebank: Part 1 v 4.1
  • LDC2010T08 Arabic Treebank: Part 3 v 3.2
  • LDC2010S05 Asian Elephant Vocalizations
  • LDC2010S07 Asian Spoken Language Sampler
  • LDC2010T07 Chinese Treebank 7.0
  • LDC2010T06 Chinese Web 5-gram Version 1
  • LDC2010T02 Czech Broadcast News MDE Transcripts
  • LDC2010T04 Fisher Spanish - Transcripts
  • LDC2010S01 Fisher Spanish Speech
  • LDC2010T03 GALE Phase 1 Chinese Newsgroup Parallel Text - Part 2
  • LDC2010T16 Indian Language Part-of-Speech Tagset: Bengali
  • LDC2010T24 Indian Language Part-of-Speech Tagset: Hindi
  • LDC2010T19 Korean Newswire Second Edition
  • LDC2010L01 LDC Standard Arabic Morphological Analyzer (SAMA) Version 3.1
  • LDC2010T22 Manually Annotated Sub-Corpus First Release
  • LDC2010T15 Message Understanding Conference 7 Timed (MUC7_T)
  • LDC2010T10 NIST 2002 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T11 NIST 2003 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T12 NIST 2004 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T14 NIST 2005 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T17 NIST 2006 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T21 NIST 2008 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T23 NIST 2009 Open Machine Translation (OpenMT) Evaluation
  • LDC2010T01 NIST Open MT 2008 Evaluation (MT08) Selected References and System Translations
  • LDC2010T05 NPS Internet Chatroom Conversations, Release 1.0
  • LDC2010V01 TRECVID 2004 Keyframes & Transcripts
  • LDC2010V02 TRECVID 2006 Keyframes
  • LDC2010S02 WTIMIT 1.0

2009

  • LDC2009S05 2007 NIST Language Recognition Evaluation Supplemental Training Set
  • LDC2009S04 2007 NIST Language Recognition Evaluation Test Set
  • LDC2009T12 2008 CoNLL Shared Task Data
  • LDC2009T05 2008 NIST Metrics for Machine Translation (MetricsMATR08) Development Data
  • LDC2009T29 ACL Anthology Reference Corpus
  • LDC2009L01 An English Dictionary of the Tamil Verb Second Edition
  • LDC2009T30 Arabic Gigaword Fourth Edition
  • LDC2009T22 Arabic Newswire English Translation Collection
  • LDC2009V01 Audiovisual Database of Spoken American English
  • LDC2009T04 BioProp Version 1.0
  • LDC2009T27 Chinese Gigaword Fourth Edition
  • LDC2009S01 CSLU: Numbers Version 1.3
  • LDC2009S03 CSLU: S4X Release 1.2
  • LDC2009T20 Czech Broadcast Conversation MDE Transcripts
  • LDC2009S02 Czech Broadcast Conversation Speech
  • LDC2009T01 English CTS Treebank with Structural Metadata
  • LDC2009T13 English Gigaword Fourth Edition
  • LDC2009T23 FactBank 1.0
  • LDC2009T28 French Gigaword Second Edition
  • LDC2009T03 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 1
  • LDC2009T09 GALE Phase 1 Arabic Newsgroup Parallel Text - Part 2
  • LDC2009T02 GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 1
  • LDC2009T06 GALE Phase 1 Chinese Broadcast Conversation Parallel Text - Part 2
  • LDC2009T15 GALE Phase 1 Chinese Newsgroup Parallel Text - Part 1
  • LDC2009T08 Japanese Web N-gram Version 1
  • LDC2009T10 Language Understanding Annotation Corpus
  • LDC2009T26 NXT Switchboard Annotations
  • LDC2009T24 OntoNotes Release 3.0
  • LDC2009T11 REFLEX Entity Translation Training/DevTest
  • LDC2009T21 Spanish Gigaword Second Edition
  • LDC2009T14 Tagged Chinese Gigaword Version 2.0
  • LDC2009T07 Unified Linguistic Annotation Text Collection
  • LDC2009T25 Web 1T 5-gram, 10 European Languages Version 1

2008

2007

  • LDC2007T22 2001 Topic Annotated Enron Email Data Set
  • LDC2007S10 2003 NIST Rich Transcription Evaluation Data
  • LDC2007S12 2004 Spring NIST Rich Transcription (RT-04S) Evaluation Data
  • LDC2007S11 2004 Spring NIST Rich Transcription (RT-04S) Development Data
  • LDC2007T40 Arabic Gigaword Third Edition
  • LDC2007S03 ARL Urdu Speech Database, Training Data
  • LDC2007T38 Chinese Gigaword Third Edition
  • LDC2007T36 Chinese Treebank 6.0
  • LDC2007S08 CSLU: Foreign Accented English Release 1.2
  • LDC2007S18 CSLU: Kids' Speech Version 1.1
  • LDC2007S13 CSLU: Apple Words and Phrases
  • LDC2007S05 CSLU: Yes/No Version 1.2
  • LDC2007T02 English Chinese Translation Treebank v 1.0
  • LDC2007T07 English Gigaword Third Edition
  • LDC2007S02 Fisher Levantine Arabic Conversational Telephone Speech
  • LDC2007T04 Fisher Levantine Arabic Conversational Telephone Speech, Transcripts
  • LDC2007T24 GALE Phase 1 Arabic Broadcast News Parallel Text - Part 1
  • LDC2007T23 GALE Phase 1 Chinese Broadcast News Parallel Text - Part 1
  • LDC2007T20 GALE Phase 1 Distillation Training
  • LDC2007T08 ISI Arabic-English Automatically Extracted Parallel Text
  • LDC2007T09 ISI Chinese-English Automatically Extracted Parallel Text
  • LDC2007S01 Levantine Arabic Conversational Telephone Speech
  • LDC2007T01 Levantine Arabic Conversational Telephone Speech, Transcripts
  • LDC2007S09 Mandarin Affective Speech
  • LDC2007T19 MITRE 1997 Mandarin Broadcast News Speech Translations (HUB-4NE)
  • LDC2007S15 Nationwide Speech Project
  • LDC2007T21 OntoNotes Release 1.0
  • LDC2007T03 Tagged Chinese Gigaword
  • LDC2007V02 TRECVID 2003 Keyframes & Transcripts
  • LDC2007V01 TRECVID 2005 Keyframes & Transcripts

2006

2005

  • LDC2005T09 ACE 2004 Multilingual Training Corpus
  • LDC2005T07 ACE Time Normalization (TERN) 2004 English Training Data v 1.0
  • LDC2005T35 American National Corpus (ANC) Second Release
  • LDC2005S07 Arabic CTS Levantine Fisher Training Data Set 3, Speech
  • LDC2005T03 Arabic CTS Levantine Fisher Training Data Set 3, Transcripts
  • LDC2005T02 Arabic Treebank: Part 1 v 3.0 (POS with full vocalization + syntactic analysis)
  • LDC2005T20 Arabic Treebank: Part 3 (full corpus) v 2.0 (MPG + Syntactic Analysis)
  • LDC2005T30 Arabic Treebank: Part 4 v 1.0 (MPG Annotation)
  • LDC2005S22 Articulation Index
  • LDC2005T33 BBN Pronoun Coreference and Entity Type Corpus
  • LDC2005S08 BBN/AUB DARPA Babylon Levantine Arabic Speech and Transcripts
  • LDC2005T13 CCGbank
  • LDC2005T34 Chinese <-> English Named Entity Lists v 1.0
  • LDC2005T10 Chinese English News Magazine Parallel Text
  • LDC2005T14 Chinese Gigaword Second Edition
  • LDC2005T06 Chinese News Translation Text Part 1
  • LDC2005T23 Chinese Proposition Bank 1.0
  • LDC2005T01 Chinese Treebank 5.0
  • LDC2005S26 CSLU: 22 Languages Corpus
  • LDC2005T08 Discourse Graphbank
  • LDC2005T12 English Gigaword Second Edition
  • LDC2005S13 Fisher English Training Part 2, Speech
  • LDC2005T19 Fisher English Training Part 2, Transcripts
  • LDC2005T28 HARD 2004 Text
  • LDC2005T29 HARD 2004 Topics and Annotations
  • LDC2005S15 HKUST Mandarin Telephone Speech, Part 1
  • LDC2005T32 HKUST Mandarin Telephone Transcript Data, Part 1
  • LDC2005S14 Levantine Arabic QT Training Data Set 4 (Speech + Transcripts)
  • LDC2005L01 Mawukakan Lexicon
  • LDC2005T05 Multiple-Translation Arabic (MTA) Part 2
  • LDC2005S16 RT-04 MDE Training Data Speech
  • LDC2005T24 RT-04 MDE Training Data Text/Annotations
  • LDC2005S25 Santa Barbara Corpus of Spoken American English Part IV
  • LDC2005S11 TDT4 Multilingual Broadcast News Speech Corpus
  • LDC2005T16 TDT4 Multilingual Text and Annotations
  • LDC2005S30 West Point Company G3 American English Speech
  • LDC2005S28 West Point Croatian Speech

2004

2003

2002

2001

  • LDC2001S91 1997 HUB4 Broadcast News Evaluation Non-English Test Material
  • LDC2001S97 2000 NIST Speaker Recognition Evaluation
  • LDC2001T55 Arabic Newswire Part 1
  • LDC2001T61 CALLHOME Spanish Dialogue Act Annotation
  • LDC2001T62 CETEMpublico
  • LDC2001T11 Chinese Treebank 2.0
  • LDC2001S16 Grassfields Bantu Fieldwork: Ngomba Tone Paradigms
  • LDC2001T02 Message Understanding Conference (MUC) 7
  • LDC2001T10 Prague Dependency Treebank 1.0
  • LDC2001S04 Speech in Noisy Environments (SPINE2) Part 1 Audio
  • LDC2001T05 Speech in Noisy Environments (SPINE2) Part 1 Transcripts
  • LDC2001S06 Speech in Noisy Environments (SPINE2) Part 2 Audio
  • LDC2001T07 Speech in Noisy Environments (SPINE2) Part 2 Transcripts
  • LDC2001S08 Speech in Noisy Environments (SPINE2) Part 3 Audio
  • LDC2001T09 Speech in Noisy Environments (SPINE2) Part 3 Transcripts
  • LDC2001S99 Speech in Noisy Environments 1 (SPINE1 CODED) Coded Audio
  • LDC2001S13 Switchboard Cellular Part 1 Audio
  • LDC2001S15 Switchboard Cellular Part 1 Transcribed Audio
  • LDC2001T14 Switchboard Cellular Part 1 Transcription
  • LDC2001T60 Syllable-Final /s/ Lenition
  • LDC2001S93 TDT2 Mandarin Audio Corpus
  • LDC2001T57 TDT2 Multilanguage Text Version 4.0
  • LDC2001S94 TDT3 English Audio
  • LDC2001S95 TDT3 Mandarin Audio
  • LDC2001T58 TDT3 Multilanguage Text Version 2.0

2000

  • LDC2000S86 1998 HUB4 Broadcast News Evaluation English Test Material
  • LDC2000S88 1999 HUB4 Broadcast News Evaluation English Test Material
  • LDC2000T43 BLLIP 1987-89 WSJ Corpus Release 1
  • LDC2000T50 Hong Kong Hansards Parallel Text
  • LDC2000T47 Hong Kong Laws Parallel Text
  • LDC2000T46 Hong Kong News Parallel Text
  • LDC2000T45 Korean Newswire
  • LDC2000S85 Santa Barbara Corpus of Spoken American English Part I
  • LDC2000S96 Speech in Noisy Environments (SPINE) Evaluation Audio
  • LDC2000T54 Speech in Noisy Environments (SPINE) Evaluation Transcripts
  • LDC2000S87 Speech in Noisy Environments (SPINE) Training Audio
  • LDC2000T49 Speech in Noisy Environments (SPINE) Training Transcripts
  • LDC2000S92 TDT2 Careful Transcription Audio
  • LDC2000T44 TDT2 Careful Transcription Text
  • LDC2000T52 TREC Mandarin
  • LDC2000T51 TREC Spanish
  • LDC2000S89 Voice of America (VOA) Czech Broadcast News Audio
  • LDC2000T53 Voice of America (VOA) Czech Broadcast News Transcripts

1999

1998

  • LDC98T31 1996 CSR HUB4 Language Model
  • LDC97S66 1996 English Broadcast News Dev and Eval (HUB4)
  • LDC97S44 1996 English Broadcast News Speech (HUB4)
  • LDC97T22 1996 English Broadcast News Transcripts (HUB4)
  • LDC98S71 1997 English Broadcast News Speech (HUB4)
  • LDC98T28 1997 English Broadcast News Transcripts (HUB4)
  • LDC98S73 1997 Mandarin Broadcast News Speech (HUB4-NE)
  • LDC98T24 1997 Mandarin Broadcast News Transcripts (HUB4-NE)
  • LDC98S74 1997 Spanish Broadcast News Speech (HUB4-NE)
  • LDC98T29 1997 Spanish Broadcast News Transcripts (HUB4-NE)
  • LDC98S76 1998 Speaker Recognition Benchmark
  • LDC98L21 COMLEX English Syntax Lexicon
  • LDC96T11 COMLEX Syntax Text Corpus Version 2.0
  • LDC95S23 CSR-III Speech
  • LDC95T6 CSR-III Text
  • LDC98S67 HTIMIT
  • LDC98S69 HUB5 Mandarin Telephone Speech Corpus
  • LDC98T26 HUB5 Mandarin Transcripts
  • LDC98S70 HUB5 Spanish Telephone Speech Corpus
  • LDC98T27 HUB5 Spanish Transcripts
  • LDC98T32 JURIS
  • LDC95S22 KING Speaker Verification
  • LDC98S68 LLHDB
  • LDC98T30 North American News Text Supplement
  • LDC98S75 Switchboard-2 Phase I
  • LDC98S72 Taiwanese Putonghua Speech and Transcripts
  • LDC98T25 TDT Pilot Study Corpus
  • LDC98S77 Voicemail Corpus Part I
  • LDC94S16 YOHO Speaker Verification

1997

1996

1995

1994

1993