Avatar Education Portuguese, Linguistic Data Consortium (LDC) Catalog Number LDC2018S15 and ISBN 1-58563-864-1, was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant designed to enhance communication and interaction in educational contexts, such as online learning.
The corpus contains 1,400 utterances of read and spontaneous speech produced by two professional speakers, one male and one female (700 utterances each). Utterances were transcribed at the word level (without time alignment) and at the phoneme level (with time-aligned labels).
The audio data was recorded at 16 kHz (mono, 16-bit) using Pro Tools recording software and stored as FLAC-compressed WAV files. The acoustic environment was controlled for the background conditions that occur in application environments.
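Because FLAC compression is lossless, the decoded audio is plain 16-bit mono PCM, so its raw size follows directly from the parameters above. The Python sketch below works through that arithmetic using the approximate 80-minute total; the function and constant names are illustrative only. Actual file sizes on disk will be smaller because of FLAC compression and will vary with the content.

```python
# Parameters stated in the corpus description.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BITS_PER_SAMPLE = 16      # 16-bit samples
CHANNELS = 1              # mono
DURATION_MIN = 80         # approximate total duration of the corpus

def pcm_bytes(duration_s: float,
              rate: int = SAMPLE_RATE_HZ,
              bits: int = BITS_PER_SAMPLE,
              channels: int = CHANNELS) -> int:
    """Size of raw (uncompressed) PCM audio in bytes, excluding file headers."""
    return int(duration_s * rate * (bits // 8) * channels)

bytes_per_second = pcm_bytes(1)           # 16,000 samples/s * 2 bytes * 1 channel
total = pcm_bytes(DURATION_MIN * 60)      # about 80 minutes of audio
print(f"{bytes_per_second} B/s, {total / 1e6:.1f} MB uncompressed")
```

This prints `32000 B/s, 153.6 MB uncompressed`, an upper bound on the decoded audio payload rather than the size of the distributed files.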
Please see file.tbl for a complete file list as well as checksums for this publication.
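The checksums in file.tbl can be used to confirm that a local copy is intact. The exact layout of file.tbl is not described here, so the Python sketch below rests on a stated assumption: each relevant line begins with a relative file path and contains an MD5 digest (32 hexadecimal digits) somewhere on the same line. The helper names `md5_of` and `verify` are hypothetical. Check the real file before relying on the sketch, and adjust the parsing if the digest type or column order differs.

```python
import hashlib
import re
from pathlib import Path

# ASSUMPTION: each line of file.tbl has a relative path as its first field and
# an MD5 digest (32 hex digits) somewhere on the same line. This layout is a
# guess, not a documented LDC format; verify it against the actual file.
MD5_RE = re.compile(r"\b[0-9a-fA-F]{32}\b")

def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify(table: Path, root: Path) -> list[str]:
    """Return listed paths that are missing or whose MD5 does not match."""
    bad = []
    for line in table.read_text(encoding="utf-8").splitlines():
        fields = line.split()
        match = MD5_RE.search(line)
        if not fields or match is None:
            continue  # skip blank lines and lines without a digest
        rel_path, expected = fields[0], match.group(0).lower()
        target = root / rel_path
        if not target.is_file() or md5_of(target) != expected:
            bad.append(rel_path)
    return bad
```

An empty list from `verify` means every listed file was found and matched its digest under the assumed layout.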
Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus, LDC2018S15.
Portions © 2018 Alexandre Magno Andrade Maciel, © 2018 Trustees of the University of Pennsylvania