
Middle East Technical University Turkish Microphone Speech v 1.0

Item Name:	Middle East Technical University Turkish Microphone Speech v 1.0
Author(s):	Ozgul Salor, Tolga Ciloglu, Bryan Pellom, Mubeccel Demirekler
LDC Catalog No.:	LDC2006S33
ISBN:	1-58563-384-4
ISLRN:	461-254-833-604-1
DOI:	https://doi.org/10.35111/sk8b-ss58
Release Date:	May 18, 2006
Member Year(s):	2006
DCMI Type(s):	Sound
Sample Type:	pcm
Sample Rate:	16000
Data Source(s):	microphone speech
Application(s):	speech recognition
Language(s):	Turkish
Language ID(s):	tur
License(s):	LDC User Agreement for Non-Members
Online Documentation:	LDC2006S33 Documents
Licensing Instructions:	Subscription & Standard Members, and Non-Members
Citation:	Salor, Ozgul, et al. Middle East Technical University Turkish Microphone Speech v 1.0 LDC2006S33. Web Download. Philadelphia: Linguistic Data Consortium, 2006.

Introduction

Middle East Technical University Turkish Microphone Speech v 1.0 was developed at Middle East Technical University (METU) and contains text, speech, and alignment files for approximately 5.6 hours of recorded Turkish. The corpus was produced as part of a collaboration between METU's Department of Electrical and Electronics Engineering and the Center for Spoken Language Research (CSLR) at the University of Colorado at Boulder. The collaboration was supported by TUBITAK, the Scientific and Technical Research Council of Turkey, through a combined doctoral scholarship program. The corpus was used to port CSLR's speech recognition system, SONIC, to Turkish.

Data

The corpus contains text, speech, and alignment files and is approximately 600 MB in size. Each of the 120 speakers (60 male and 60 female) speaks 40 sentences (approximately 300 words per speaker). The 40 sentences were selected randomly for each speaker from a triphone-balanced set of 2,462 Turkish sentences. The speakers were drawn from students, faculty, and staff at METU, and all are native speakers of Turkish. Speaker ages range from 19 to 50 years, with an average of 23.9 years.
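As a rough sanity check, the totals implied by these figures can be worked out directly. The sketch below uses only the numbers quoted in this entry (120 speakers, 40 sentences and about 300 words per speaker, about 5.6 hours of speech). The derived values are estimates, not official corpus statistics, and the function name `corpus_estimates` is illustrative only.

```python
# Figures quoted in this catalog entry; the last two are approximate.
SPEAKERS = 120
SENTENCES_PER_SPEAKER = 40
WORDS_PER_SPEAKER = 300
TOTAL_HOURS = 5.6


def corpus_estimates():
    """Derive rough corpus-level totals from the published figures."""
    utterances = SPEAKERS * SENTENCES_PER_SPEAKER
    words = SPEAKERS * WORDS_PER_SPEAKER
    total_seconds = TOTAL_HOURS * 3600
    return {
        "utterances": utterances,
        "words": words,
        "seconds_per_utterance": total_seconds / utterances,
        "words_per_sentence": WORDS_PER_SPEAKER / SENTENCES_PER_SPEAKER,
        "minutes_per_speaker": TOTAL_HOURS * 60 / SPEAKERS,
    }


if __name__ == "__main__":
    for name, value in corpus_estimates().items():
        print(f"{name}: {value:.1f}")
```

Under these assumptions the corpus holds 4,800 utterances and roughly 36,000 words, which works out to about 4.2 seconds and 7.5 words per sentence on average.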

The data were recorded digitally with a Sound Blaster sound card on a PC at a 16 kHz sampling rate.
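A user who wants to confirm the sampling rate of a downloaded audio file could inspect its header. The following is a minimal sketch using Python's standard `wave` module. It assumes the file is a RIFF/WAV file containing PCM audio, which this entry states only for the audio sample; the files in the corpus itself may use a different container. The function name `describe_wav` and any path passed to it are hypothetical.

```python
import wave

EXPECTED_RATE = 16000  # Hz, the sampling rate stated in this catalog entry


def describe_wav(path):
    """Return basic header fields for a PCM WAV file (assumed format)."""
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        frames = wav.getnframes()
        return {
            "sample_rate": rate,
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "duration_seconds": frames / rate,
            "matches_catalog_rate": rate == EXPECTED_RATE,
        }
```

Calling `describe_wav` on a local copy of an audio file returns a dictionary whose `matches_catalog_rate` field is `True` when the header reports 16000 Hz.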

Samples

For an example of the data in this corpus, please listen to this audio sample (WAV) and view its companion transcript (TXT).

Updates

None at this time.
