Polish Speech Database

Item Name: Polish Speech Database
Author(s): Tomasz Szwelnik, Jacek Kawalec, Dorota Gutowska
LDC Catalog No.: LDC2019S19
ISBN: 1-58563-903-6
ISLRN: 803-554-461-385-1
DOI: https://doi.org/10.35111/twqh-f096
Release Date: October 15, 2019
Member Year(s): 2019
DCMI Type(s): Sound, Text
Sample Type: pcm
Sample Rate: 16000
Data Source(s): microphone speech
Application(s): speech recognition
Language(s): Polish
Language ID(s): pol
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2019S19 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Szwelnik, Tomasz, Jacek Kawalec, and Dorota Gutowska. Polish Speech Database LDC2019S19. Web Download. Philadelphia: Linguistic Data Consortium, 2019.


Polish Speech Database was developed by VoiceLab. It consists of 263,424 utterances of Polish speech data from 200 speakers, totaling approximately 280 hours, and corresponding transcripts.
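The headline figures above imply some rough per-utterance and per-speaker averages, which can be useful when planning data loading or train/test splits. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and because the total of 280 hours is itself approximate, the results are estimates only.

```python
# Figures quoted in the catalog description (total duration is approximate).
TOTAL_UTTERANCES = 263_424
TOTAL_SPEAKERS = 200
TOTAL_HOURS = 280  # "approximately 280 hours"

# Average utterance length in seconds.
avg_utterance_seconds = TOTAL_HOURS * 3600 / TOTAL_UTTERANCES

# Average amount of material per speaker.
avg_hours_per_speaker = TOTAL_HOURS / TOTAL_SPEAKERS
avg_utterances_per_speaker = TOTAL_UTTERANCES / TOTAL_SPEAKERS

print(f"Average utterance length: {avg_utterance_seconds:.2f} s")
print(f"Average speech per speaker: {avg_hours_per_speaker * 60:.0f} min")
print(f"Average utterances per speaker: {avg_utterances_per_speaker:.0f}")
```

Under these assumptions the average utterance is a little under four seconds, and the per-speaker average of roughly 84 minutes is consistent with the stated minimum of 60 minutes of recording per speaker.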

Data collection was performed in Poland. Speakers were asked to record themselves for at least 60 minutes on their home computers, using a headset, while reading text displayed on a website. The text consisted of sentences covering most speech sounds in Polish.

The database includes speaker metadata. There were 103 male speakers and 97 female speakers. Their ages ranged from 15 to 60 years, and most were between 15 and 30.


Speech data is presented as 16,000 Hz, 16-bit, single-channel, FLAC-compressed WAV files. Transcripts are UTF-8 encoded plain text.
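The audio format stated above fixes the uncompressed data rate, which gives a rough upper bound on the storage needed once the audio is decoded to raw PCM. The sketch below works through that calculation; the function name is hypothetical, and the result assumes the approximate total of 280 hours given in the description. The FLAC-compressed files as distributed will be smaller than this figure.

```python
# Audio format as stated in the catalog entry.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
CHANNELS = 1
TOTAL_HOURS = 280  # approximate total duration from the description


def pcm_bytes(seconds: float) -> int:
    """Return the size in bytes of uncompressed PCM audio of the given length."""
    bytes_per_second = SAMPLE_RATE_HZ * (BITS_PER_SAMPLE // 8) * CHANNELS
    return int(bytes_per_second * seconds)


bytes_per_second = pcm_bytes(1)
total_bytes = pcm_bytes(TOTAL_HOURS * 3600)

print(f"Uncompressed data rate: {bytes_per_second} bytes/s")
print(f"Estimated decoded size: {total_bytes / 1e9:.1f} GB")
```

This counts sample data only and ignores file headers, so it is an estimate rather than an exact size.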

