
HTIMIT

Item Name:	HTIMIT
Author(s):	Douglas Reynolds
LDC Catalog No.:	LDC98S67
ISBN:	1-58563-130-2
ISLRN:	866-042-083-505-7
DOI:	https://doi.org/10.35111/xk0c-xj95
Member Year(s):	1998
DCMI Type(s):	Sound
Sample Type:	1-channel pcm
Sample Rate:	8000
Data Source(s):	telephone speech
Application(s):	speech recognition, speaker identification
Language(s):	English
Language ID(s):	eng
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC98S67 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Reynolds, Douglas. HTIMIT LDC98S67. Web Download. Philadelphia: Linguistic Data Consortium, 1998.
Related Works:	isOutcomeOf LDC93S1 TIMIT Acoustic-Phonetic Continuous Speech Corpus; isSimilarWith LDC93S2 NTIMIT, LDC96S30 CTIMIT, LDC96S32 FFMTIMIT, LDC98S68 LLHDB, LDC2008S03 STC-TIMIT 1.0, LDC2010S02 WTIMIT 1.0, LDC2017S04 Noisy TIMIT Speech

Introduction

The HTIMIT corpus is a re-recording of a subset of the TIMIT corpus through different telephone handsets. The aim was to create a corpus for studying the effects of telephone transducers on speech while minimizing confounding factors such as variable telephone channels and background noise. HTIMIT was created by playing ten TIMIT sentences from each of 192 male and 192 female speakers through a stereo loudspeaker into different transducers positioned directly in front of the loudspeaker, and digitizing the output from the transducers. Ten (10) transducers (telephone handsets) were used. Most of these were not new, and handsets with obvious damage were excluded. To obtain some diversity from a limited number of handsets, the handsets were selected for variable sound characteristics, different transducer designs or, in the case of electrets, different grill designs. Further information about the handsets is provided in the corpus documentation.

Data

The collection procedure was not ideal with respect to the realism of sound transduction, but it does allow speech from a large number of speakers to be collected with identical utterances on every handset. Furthermore, combined with the phonetic markings from the original TIMIT corpus, HTIMIT makes it possible to study the effects of handset transducers on speech recognition systems.
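Using the TIMIT phonetic markings with HTIMIT audio involves a sample-rate mismatch: TIMIT is sampled at 16 kHz, while HTIMIT is listed above at 8000 Hz. The sketch below parses TIMIT-style .PHN content (one "begin_sample end_sample label" line per segment) and rescales the boundaries to 8 kHz. It assumes the HTIMIT recordings are time-aligned with the originals apart from resampling; any playback or recording delay is not documented here, so the `offset` parameter is a hypothetical placeholder (default 0) that would need to be estimated. The function names are illustrative, not part of any LDC tooling.

```python
from typing import List, Tuple

TIMIT_RATE = 16000   # TIMIT sample rate in Hz
HTIMIT_RATE = 8000   # HTIMIT sample rate in Hz, per the catalog entry

Segment = Tuple[int, int, str]

def read_phn(text: str) -> List[Segment]:
    """Parse TIMIT .PHN content: one 'begin end label' triple per line."""
    segments: List[Segment] = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        begin, end, label = parts
        segments.append((int(begin), int(end), label))
    return segments

def rescale_segments(segments: List[Segment],
                     src_rate: int = TIMIT_RATE,
                     dst_rate: int = HTIMIT_RATE,
                     offset: int = 0) -> List[Segment]:
    """Map sample indices from src_rate to dst_rate.

    offset is a hypothetical alignment shift in destination samples;
    the default of 0 assumes no delay between TIMIT and HTIMIT audio.
    """
    out: List[Segment] = []
    for begin, end, label in segments:
        b = begin * dst_rate // src_rate + offset
        e = end * dst_rate // src_rate + offset
        out.append((max(b, 0), max(e, 0), label))  # clamp negative indices
    return out

if __name__ == "__main__":
    sample = "0 3050 h#\n3050 4559 sh\n4559 5723 ix\n"
    print(rescale_segments(read_phn(sample)))
    # [(0, 1525, 'h#'), (1525, 2279, 'sh'), (2279, 2861, 'ix')]
```

Integer division floors boundaries that fall between 8 kHz samples; for acoustic analysis this half-sample rounding is usually negligible compared with any unmodelled recording delay.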

To address the realism of the sound transduction in HTIMIT, a second corpus using the same handsets but with live people speaking into the handsets is also available. This corpus is called the Lincoln Laboratory Handset Database (LLHDB) LDC98S68.

Updates

There are no updates at this time.
