1.      Title:            Russian through Switched Telephone Network (RuSTeN)

 

 

2.      Authors:          Anrey Raev, Serguei Koval, Natalia Smirnova, Daria     

                           Khitrova, Vitaly Stepanov.

                              tel:+7 812 3258848 (STC)

                              raev/koval/nsmirnova/khitrova@speechpro.com

 

3.      Data Type:    speech

 

4.      Data Source: telephone. Collected between 2001 and 2003 at the Speech Technology Center (STC).

 

5.      Project:           “Trawl” (Automatic Voice Identification System in Telephone Channel). The purpose of the project was to develop software for automatic identification of speakers based on voice samples acquired through telephone channels. The system was trained on the telephone speech corpus RuSTeN. The project has since been completed. The “Trawl” software identifies far-end speakers with 91% reliability, provided that the sample recording is at least 16 sec long and the questioned recording is at least 96 sec long.

 

6.      Application:   speaker identification

 

7.      Language:     Russian (RUS)

 

8.      Special license:

9.      Copyright:     Portions © 2001 Speech Technology Center Limited

10. Description:       

·        Speech in WAV format

·        627 speech files, 2 doc-files. Uncompressed.

·        Recorded at a sample rate of 11025 Hz, 16-bit, linear, 1 channel. The corpus size is 4.5 GB (about 60 hours). Audio files can be played in any sound editor.

·        The corpus is recorded on 1 DVD.

·        Directory contents:

 

                        |_DOC ____

                        |            | readme.doc

                        |            | speakers.doc

                        |            |__________

                        |

                        |_DATA____

                                     | 00101.wav

                                     | 00102.wav

                                     | 00103.wav

                                     | 00104.wav

                                     | 00105.wav

                                     | 00201.wav

                                     | 00202.wav

                                     |   ......

                                     | 16705.wav

                                     |__________

                        

 

The Doc directory on CD1 contains general information about the corpus (readme.doc), the procedure specification (procedure.doc) and information about the speakers (speakers.doc: sex, age, education, place where born and raised, current residence).

The Data directories contain speech files in WAV format. The filenames are read as FFFSS, where FFF stands for the far-end speaker code and SS for the session number. The codes were given to the recruited speakers before or during the first call. However, some speakers failed to record the required minimum of 5 sessions, and the quality of the recorded material was not always acceptable. In all such cases the recordings were not included in the corpus. This is why certain speakers and/or sessions are missing.
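The FFFSS naming scheme described above can be decoded mechanically. The following is a minimal sketch, assuming every file name consists of exactly three speaker digits and two session digits before the extension; the function name parse_rusten_name is hypothetical and not part of the corpus.

```python
import re
from pathlib import PurePath

# Assumed layout: three digits of speaker code (FFF), two digits of session (SS).
_NAME_RE = re.compile(r"(\d{3})(\d{2})")


def parse_rusten_name(filename):
    """Split a name such as '00102.wav' into (speaker_code, session_number)."""
    stem = PurePath(filename).stem  # drop the directory and the '.wav' extension
    match = _NAME_RE.fullmatch(stem)
    if match is None:
        raise ValueError(f"not an FFFSS file name: {filename!r}")
    # Keep the speaker code as a zero-padded string; the session becomes an int.
    return match.group(1), int(match.group(2))


if __name__ == "__main__":
    print(parse_rusten_name("00102.wav"))
    print(parse_rusten_name("16705.wav"))
```

Keeping the speaker code as a string preserves the leading zeros used in the file names and in the speaker lists.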

The Data directories on each CD contain: 

CD1 – 17 speakers (001-017)

CD2 – 18 speakers (018-035)

CD3 – 18 speakers (036-039,041-048, 050-053, 055, 057)

CD4 – 20 speakers (058-061, 064-066, 069-075, 077-079, 082, 086, 088)

CD5 – 21 speakers (092-094, 099-101, 103-107, 109-111, 113-119)

CD6 – 22 speakers (120-125, 127, 129-133, 138-140, 143-148, 151)

CD7 – 9 speakers (152, 154, 155, 157, 158, 161, 164, 166, 167)
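To find the CD that holds a given speaker, the ranges listed above can be expanded into a lookup table. This is only an illustrative sketch: the range strings are transcribed from the list above, and the names CD_SPEAKERS, expand_ranges and SPEAKER_TO_CD are hypothetical.

```python
# Speaker ranges per CD, transcribed from the list above.
CD_SPEAKERS = {
    1: "001-017",
    2: "018-035",
    3: "036-039,041-048,050-053,055,057",
    4: "058-061,064-066,069-075,077-079,082,086,088",
    5: "092-094,099-101,103-107,109-111,113-119",
    6: "120-125,127,129-133,138-140,143-148,151",
    7: "152,154,155,157,158,161,164,166,167",
}


def expand_ranges(spec):
    """Expand '058-061,082' into ['058', '059', '060', '061', '082']."""
    codes = []
    for part in spec.split(","):
        low, _, high = part.strip().partition("-")
        high = high or low  # a single code is a range of length one
        codes.extend(f"{n:03d}" for n in range(int(low), int(high) + 1))
    return codes


# Map every speaker code to the CD that holds its recordings.
SPEAKER_TO_CD = {
    code: cd for cd, spec in CD_SPEAKERS.items() for code in expand_ranges(spec)
}

if __name__ == "__main__":
    for cd, spec in CD_SPEAKERS.items():
        print(cd, len(expand_ranges(spec)))
```

Printing the per-CD counts is a quick way to check the transcription against the speaker totals stated for each CD.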

 

Quality control:   

                              Some of the files were erroneously recorded at a sample rate of 8 kHz instead of 11025 Hz, in particular: 07904, 08604, 08605, 08801, 08802, 09201, 09301, 10005.
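Files with a deviating sample rate can be detected before processing by reading the WAV headers. The sketch below uses Python's standard wave module and assumes that the header of each affected file states its actual rate (8000 Hz), which the corpus documentation does not confirm; the function name find_off_rate_files is hypothetical.

```python
import wave

EXPECTED_RATE_HZ = 11025  # nominal sample rate stated in the corpus description


def find_off_rate_files(paths, expected_rate=EXPECTED_RATE_HZ):
    """Return {path: header_rate} for WAV files whose header rate differs."""
    off_rate = {}
    for path in paths:
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()  # rate as declared in the file header
        if rate != expected_rate:
            off_rate[str(path)] = rate
    return off_rate
```

A typical call would pass the list of files in a Data directory, for example sorted(pathlib.Path("DATA").glob("*.wav")), and resample or exclude whatever is reported.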

 

Suggested price: 1,000 USD