United Nations Proceedings Speech

Item Name: United Nations Proceedings Speech
Author(s): Kevin Chay, Cecilia Elizalde, Michal Ziemski
LDC Catalog No.: LDC2014S08
ISBN: 1-58563-693-2
ISLRN: 527-011-778-815-0
Release Date: October 15, 2014
Member Year(s): 2014
DCMI Type(s): Sound
Sample Type: flac
Sample Rate: 22050
Data Source(s): microphone speech
Application(s): speech recognition, language identification
Language(s): English, Mandarin Chinese, Standard Arabic, French, Russian, Spanish
Language ID(s): eng, cmn, arb, fra, rus, spa
License(s): United Nations Proceedings Speech
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Chay, Kevin, Cecilia Elizalde, and Michal Ziemski. United Nations Proceedings Speech LDC2014S08. Hard Drive. Philadelphia: Linguistic Data Consortium, 2014.

Introduction

United Nations Proceedings Speech was developed by the United Nations (UN) and contains approximately 8,500 hours of recorded proceedings in the six official UN languages: Arabic, Chinese, English, French, Russian and Spanish. The data was recorded from 2009 to 2012 during sessions 64-66 of the General Assembly (GA) and First Committee (FC) (Disarmament and International Security), and meetings 6434-6763 of the Security Council.

Recordings were made using a customized system, following a daily instruction circulated internally by the Meetings Management Section. Most of the subjects and information related to a particular meeting or session are published in the UN Journal, which can be found at the following link: http://www.un.org/en/documents/journal.asp

Data

Data is presented as either mp3 or flac-compressed wav. All files are 16-bit, single-channel audio sampled at either 22,050 Hz or 8,000 Hz, and they are organized by committee and session number, then by language. The folder labeled "Floor" contains the audio from the microphone used by the speaker. Those files may include other languages, for instance when the speaker's language was not among the six official UN languages.

File names for GA and FC data take the form LYY_ZZ_format.format, and file names for Security Council data take the form LYYYY_ZZ_format.format, where L is a one-letter language designation, YY (or YYYY) is the meeting number, ZZ is the audio segment number and format.format is the wav or mp3 designation. Note that not all files are present for every language.
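The naming convention above can be sketched as a small Python parser. This is an illustrative sketch only, not a tool shipped with the corpus. It assumes the digit widths in the templates are literal (two digits for GA and FC, four for Security Council) and that each half of "format.format" is a plain alphanumeric token. The function name parse_name, the ParsedName type and the example file names are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class ParsedName(NamedTuple):
    language: str    # one-letter language designation (L)
    meeting: int     # meeting number (YY or YYYY)
    segment: int     # audio segment number (ZZ)
    format_tag: str  # first half of "format.format"
    extension: str   # second half of "format.format"
    source: str      # guessed from the width of the meeting number


# Assumption: one letter, digits, "_", digits, "_", then "token.token".
NAME_RE = re.compile(
    r"^(?P<lang>[A-Za-z])"
    r"(?P<meeting>\d+)_"
    r"(?P<segment>\d+)_"
    r"(?P<tag>[A-Za-z0-9]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def parse_name(file_name: str) -> Optional[ParsedName]:
    """Split a file name into its parts, or return None if it does not fit."""
    match = NAME_RE.match(file_name)
    if match is None:
        return None
    meeting_digits = match.group("meeting")
    # Assumption: the templates' digit counts distinguish the two sources.
    if len(meeting_digits) == 4:
        source = "Security Council"
    elif len(meeting_digits) == 2:
        source = "GA or FC"
    else:
        source = "unknown"
    return ParsedName(
        language=match.group("lang"),
        meeting=int(meeting_digits),
        segment=int(match.group("segment")),
        format_tag=match.group("tag"),
        extension=match.group("ext"),
        source=source,
    )


if __name__ == "__main__":
    # Hypothetical file names, made up to match the templates.
    for name in ("E6434_01_mp3.mp3", "F64_02_wav.flac", "readme.txt"):
        print(name, "->", parse_name(name))
```

Because not all files are present for every language, a parser like this is probably most useful for building an inventory of which language, meeting and segment combinations actually exist before pairing recordings across languages.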

Updates

None at this time.

 
