CALLHOME German Second Edition

Item Name: CALLHOME German Second Edition
Author(s): Alexandra Canavan, David Graff, George Zipperlen, Krisjanis Karins, Robert MacIntyre, Monika Brandmair, Susanne Lauscher, Cynthia McLemore, Neville Ryant, Danni Ma
LDC Catalog No.: LDC2026S06
ISLRN: 870-604-419-329-9
DOI: https://doi.org/10.35111/zt3v-yr24
Release Date: May 15, 2026
Member Year(s): 2026
DCMI Type(s): Sound, Text
Sample Type: 16-bit FLAC
Sample Rate: 8000 Hz
Data Source(s): telephone conversations
Project(s): Hub5-LVCSR
Application(s): speaker identification, speech recognition
Language(s): German
Language ID(s): deu
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2026S06 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Canavan, Alexandra, et al. CALLHOME German Second Edition LDC2026S06. Web Download. Philadelphia: Linguistic Data Consortium, 2026.

Introduction

CALLHOME German Second Edition was developed by the Linguistic Data Consortium (LDC) and contains approximately 48 hours of speech from 100 unscripted telephone conversations between native German speakers. This publication is a re-release of the original CALLHOME German collection, combining CALLHOME German Speech (LDC97S43) and CALLHOME German Transcripts (LDC97T15), with additional transcription and updated directory structure, file formats, and documentation.

The CALLHOME series consists of telephone conversations and transcripts developed by LDC and Rutgers, The State University of New Jersey, in support of research in speaker identification, language identification, and related technologies. Languages in the series include American English, Egyptian Arabic, German, Japanese, Mandarin Chinese, and Spanish.

Data

This release contains the 100 telephone conversations published in CALLHOME German Speech (LDC97S43), which were originally divided into training data (80 calls) and development data (20 calls). Calls originated in North America and were placed to locations overseas. Most participants called family members or close friends. Participants spoke on topics of their choice in a single telephone call lasting up to 30 minutes. Calls were manually audited for language, recording quality, channel characteristics, dialect, and accent.

The audio was originally recorded as 8 kHz u-law SPHERE files compressed with SHORTEN. For this Second Edition, all audio was converted to 8 kHz, 16-bit, two-channel FLAC files. The original training/development partitioning was removed, and all files now appear in a single unified directory.
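
As a quick sanity check after download, the stated audio parameters can be read from each file's FLAC header. The following is a minimal sketch, not part of the release: it uses only the Python standard library, assumes each file starts directly with the "fLaC" marker followed by a STREAMINFO block (as the FLAC format specifies), and the function names are illustrative.

```python
def read_flac_streaminfo(header: bytes) -> dict:
    """Parse STREAMINFO from the first 42 bytes of a FLAC file."""
    if header[:4] != b"fLaC":
        raise ValueError("not a FLAC stream (missing 'fLaC' marker)")
    # Metadata block header: 1 bit last-block flag, 7 bits type, 24 bits length.
    block_type = header[4] & 0x7F
    block_len = int.from_bytes(header[5:8], "big")
    if block_type != 0 or block_len < 34:
        raise ValueError("first metadata block is not STREAMINFO")
    info = header[8:8 + 34]
    if len(info) < 34:
        raise ValueError("header too short; read at least 42 bytes")
    # Bytes 10..17 of STREAMINFO pack: sample rate (20 bits),
    # channels - 1 (3 bits), bits per sample - 1 (5 bits),
    # total samples per channel (36 bits).
    packed = int.from_bytes(info[10:18], "big")
    return {
        "sample_rate": packed >> 44,
        "channels": ((packed >> 41) & 0x7) + 1,
        "bits_per_sample": ((packed >> 36) & 0x1F) + 1,
        "total_samples": packed & ((1 << 36) - 1),
    }


def matches_release_format(info: dict) -> bool:
    """Check the parameters described above: 8 kHz, 16-bit, two-channel."""
    return (
        info["sample_rate"] == 8000
        and info["channels"] == 2
        and info["bits_per_sample"] == 16
    )
```

To apply it to a file, open the file in binary mode, pass the first 42 bytes to read_flac_streaminfo, and divide total_samples by sample_rate to obtain the call duration in seconds.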

Transcripts are provided as UTF-8 encoded TSV files in the format used by WebTrans, LDC's standard transcription tool. Two versions are included: (1) the transcripts as published in CALLHOME German Transcripts (LDC97T15); and (2) revised transcripts conforming to updated LDC transcription guidelines. The revisions include normalization of annotation formats, standardization of the markup for speaker-produced and background noises, application of foreign-language marking, whitespace cleanup, and other corrections and consistency fixes.
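
Because the transcripts are plain UTF-8 TSV, they can be loaded with standard tools. The sketch below is illustrative only: it assumes each file begins with a header row, and the column name "speaker" used in the usage note is hypothetical, since the actual WebTrans column layout is described in the release documentation rather than here.

```python
import csv
from collections import Counter
from typing import Iterable


def read_tsv_rows(lines: Iterable[str]) -> list[dict]:
    """Read tab-separated lines with a header row into a list of dicts.

    QUOTE_NONE keeps quotation marks in the transcript text literal
    instead of treating them as CSV field delimiters.
    """
    reader = csv.DictReader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    return list(reader)


def count_by_column(rows: list[dict], column: str) -> Counter:
    """Count rows per distinct value of the given column."""
    return Counter(row[column] for row in rows)
```

For a file on disk, pass an object opened with open(path, encoding="utf-8", newline="") to read_tsv_rows. If the files carry a speaker column, count_by_column(rows, "speaker") gives the number of segments per speaker.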

Provided metadata includes call-level information (background noise, distortion, crosstalk), speaker metadata (accent, age, sex, comments), and demographic information for call initiators (age, education level).

Updates

No updates at this time.