Russian through Switched Telephone Network (RuSTeN)

Item Name: Russian through Switched Telephone Network (RuSTeN)
Author(s): Anrey Raev, Serguei Koval, Natalia Smirnova, Daria Khitrova, Vitaly Stepanov
LDC Catalog No.: LDC2006S34
ISBN: 1-58563-388-7
ISLRN: 301-264-944-856-8
DOI: https://doi.org/10.35111/bw5g-8741
Release Date: July 21, 2006
Member Year(s): 2006
DCMI Type(s): Sound
Sample Type: 1-channel pcm
Sample Rate: 11025
Data Source(s): telephone conversations
Application(s): speaker identification, speaker verification, speech recognition
Language(s): Russian
Language ID(s): rus
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2006S34 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Raev, Anrey, et al. Russian through Switched Telephone Network (RuSTeN) LDC2006S34. Web Download. Philadelphia: Linguistic Data Consortium, 2006.

Introduction

Russian through Switched Telephone Network (RuSTeN) was developed by the Speech Technology Center (STC) and consists of approximately 56 hours of Russian telephone speech.

This corpus was developed as part of the Automatic Voice Identification System in Telephone Channel project. The purpose of the project was to develop software for automatic identification of speakers based on voice samples acquired through telephone channels. System training was performed with the RuSTeN corpus.

Data

The RuSTeN database was recorded between March 2001 and February 2003 by STC using "forget-me-not", its own professional telephone recording and archiving software package.

Each speaker made at least five calls from different locations and/or telephone sets. Most calls were made from a home or office environment with uncontrolled noise levels. Additionally, one call per speaker was made from a public telephone, with either street or metro station noise in the background. The recordings are spontaneous conversations (sometimes guided by the near-end speaker) between the caller and the speech database collector on various subjects, such as the weather, the caller's biography, and hobbies. Each recording includes approximately 150 seconds of speech from the far-end speaker (the caller) and at least five seconds from the near-end speaker. In addition, in every call the caller was asked to say the digits 0-9 and the words "yes" and "no."

The time interval between two successive sessions is at least two days. The database contains 125 speakers (far-end), 58 male and 67 female. Further demographic information can be found in the associated documentation.

Each far-end speaker is represented by at least five speech files. The sound files are in WAV format: single-channel, 16-bit linear PCM, sampled at 11,025 Hz. The speech filenames contain the following information: FFF (far-end speaker number) and SS (session number).
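
The stated audio format can be checked against a file's header before further processing. The following is a minimal sketch using Python's standard `wave` module; the function name `describe_wav` and the constants are illustrative assumptions, not part of the corpus documentation or tooling.

```python
import wave

# Format stated in the corpus description:
# 11,025 Hz sample rate, one channel, 16-bit linear PCM.
EXPECTED_RATE_HZ = 11025
EXPECTED_CHANNELS = 1
EXPECTED_SAMPLE_WIDTH_BYTES = 2  # 16 bits per sample


def describe_wav(path):
    """Return (matches_expected_format, duration_in_seconds) for a WAV file.

    Hypothetical helper for illustration; it only reads the WAV header.
    """
    with wave.open(path, "rb") as wav:
        matches = (
            wav.getframerate() == EXPECTED_RATE_HZ
            and wav.getnchannels() == EXPECTED_CHANNELS
            and wav.getsampwidth() == EXPECTED_SAMPLE_WIDTH_BYTES
        )
        # Duration is the frame count divided by the frame rate.
        duration = wav.getnframes() / wav.getframerate()
    return matches, duration
```

Calling `describe_wav` on each speech file would flag any file whose header differs from the documented format and give its length in seconds, which can be compared with the roughly 150 seconds of far-end speech described above.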

Samples

For an example of the data in this corpus, please listen to this sample (WAV).

Updates

None at this time.
