Avatar Education Portuguese

Item Name: Avatar Education Portuguese
Author(s): Alexandre M. A. Maciel, Rodrigo L. Rodrigues, Danilo S. Barbosa
LDC Catalog No.: LDC2018S15
ISBN: 1-58563-864-1
ISLRN: 942-249-733-666-7
DOI: https://doi.org/10.35111/kpzv-2f80
Release Date: November 15, 2018
Member Year(s): 2018
DCMI Type(s): Sound, Text
Sample Type: flac
Sample Rate: 16000 Hz
Data Source(s): microphone speech
Application(s): speech recognition, speech synthesis
Language(s): Portuguese
Language ID(s): por
License(s): LDC User Agreement for Non-Members
Online Documentation: LDC2018S15 Documents
Licensing Instructions: Subscription & Standard Members, and Non-Members
Citation: Maciel, Alexandre M. A., Rodrigo Rodrigues, and Danilo Barbosa. Avatar Education Portuguese LDC2018S15. Web Download. Philadelphia: Linguistic Data Consortium, 2018.


Avatar Education Portuguese was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant designed to enhance communication and interaction in educational contexts, such as online learning.


The corpus contains 1,400 utterances of read and spontaneous speech from two professional speakers, one male and one female, with 700 utterances each. Utterances were transcribed orthographically at the word level (without time alignments) and phonetically at the phoneme level (with time-alignment labels).

The audio data was recorded at 16 kHz (mono, 16-bit) using Pro Tools recording software and stored in FLAC-compressed WAV format. The acoustic environment was controlled with respect to the background conditions found in typical application environments.
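As a rough check on the figures above, the following Python sketch estimates the uncompressed PCM data rate, the total raw audio size, and the mean utterance length. It treats "approximately 80 minutes" as exactly 80 minutes, so all results are estimates. It also ignores file headers, and the FLAC files on disk will be smaller because FLAC applies lossless compression. The constant and function names are illustrative only.

```python
# Back-of-the-envelope figures derived from the catalog description.
# "Approximately 80 minutes" is rounded, so every result is an estimate.

SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
TOTAL_MINUTES = 80        # approximate total duration (assumed exact here)
NUM_UTTERANCES = 1_400    # 700 per speaker, two speakers


def pcm_bytes_per_second(rate_hz: int, bytes_per_sample: int, channels: int) -> int:
    """Data rate of uncompressed linear PCM audio, in bytes per second."""
    return rate_hz * bytes_per_sample * channels


def estimate_corpus_stats() -> dict:
    """Estimate raw audio size and mean utterance length (headers ignored)."""
    total_seconds = TOTAL_MINUTES * 60
    rate = pcm_bytes_per_second(SAMPLE_RATE_HZ, BYTES_PER_SAMPLE, CHANNELS)
    raw_bytes = total_seconds * rate
    return {
        "bytes_per_second": rate,
        "uncompressed_megabytes": raw_bytes / 1_000_000,  # decimal MB
        "mean_utterance_seconds": total_seconds / NUM_UTTERANCES,
    }


if __name__ == "__main__":
    stats = estimate_corpus_stats()
    print(f"PCM data rate: {stats['bytes_per_second']} bytes/s")                # 32000
    print(f"Uncompressed audio: ~{stats['uncompressed_megabytes']:.1f} MB")     # 153.6
    print(f"Mean utterance length: ~{stats['mean_utterance_seconds']:.2f} s")   # 3.43
```

At 32,000 bytes per second, roughly 80 minutes of mono 16-bit audio amounts to about 153.6 MB of raw samples, and the 1,400 utterances average about 3.4 seconds each.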


