The text component of the CALLHOME German corpus package includes transcripts and documentation files. The transcripts cover contiguous five- or ten-minute segments taken from 100 unscripted telephone conversations between native speakers of German. The transcripts are timestamped by speaker turn for alignment with the speech signal and are provided in standard orthography.
In addition to transcript files, this corpus contains full documentation on the transcription conventions and format. Complete auditing information on the speakers represented in the transcripts (including gender, channel quality and so on) is also included.
This corpus is distributed through the LDC's FTP server.
The corpus of telephone speech (LDC97S43) is available separately, as well as an associated lexicon (LDC97L18).
For a list of updates, user reports, and other addenda, please go to LDC1997T15.
Updates
There are no updates at this time.