Avatar Education Portuguese

LDC2018S15

Introduction

Avatar Education Portuguese, Linguistic Data Consortium (LDC) Catalog Number LDC2018S15 and ISBN 1-58563-864-1, was developed by the University of Pernambuco and consists of approximately 80 minutes of Brazilian Portuguese microphone speech with phonetic and orthographic transcriptions. The data was developed for Avatar Education, an animated virtual assistant designed to enhance communication and interaction in educational contexts, such as online learning.

Data

The corpus contains 1,400 utterances (700 male and 700 female) of read and spontaneous speech spoken by two professional speakers. Utterances were transcribed at the word level (without time alignments) and at the phoneme level (with time-aligned labels).

The audio data was recorded at 16 kHz (mono, 16-bit) using Pro Tools recording software and stored in FLAC-compressed WAV format. The acoustic environment was controlled for background conditions that occur in application environments.
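
As a rough sanity check on the figures above, the following sketch estimates the uncompressed audio size and the mean utterance length from the stated sampling parameters. The function names are illustrative only, and the 80-minute total is the approximate figure given in the introduction, so the results are estimates rather than properties of the actual files.

```python
# Back-of-the-envelope figures derived from the corpus description.
# All constants come from the text; the total duration is approximate.
SAMPLE_RATE_HZ = 16_000   # 16 kHz
BYTES_PER_SAMPLE = 2      # 16-bit samples
CHANNELS = 1              # mono
TOTAL_MINUTES = 80        # "approximately 80 minutes"
UTTERANCES = 1_400


def uncompressed_bytes(minutes: float) -> int:
    """Size of the audio as raw PCM, before FLAC compression."""
    seconds = minutes * 60
    return int(seconds * SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS)


def mean_utterance_seconds(minutes: float, utterances: int) -> float:
    """Average utterance duration implied by the totals."""
    return minutes * 60 / utterances


if __name__ == "__main__":
    size_mb = uncompressed_bytes(TOTAL_MINUTES) / 1_000_000
    mean_s = mean_utterance_seconds(TOTAL_MINUTES, UTTERANCES)
    print(f"Raw PCM size: about {size_mb:.1f} MB")
    print(f"Mean utterance length: about {mean_s:.2f} s")
```

The FLAC files on disk will be smaller than the raw PCM estimate, since FLAC is a lossless compression format.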

Directory Structure

Please see file.tbl for a complete file list as well as checksums for this publication.
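
The checksums in file.tbl can be used to confirm that a local copy is intact. The sketch below is a minimal example under stated assumptions: it assumes each non-empty line of file.tbl holds an MD5 checksum followed by whitespace and a path relative to the corpus root. This layout is not confirmed by this document, so check the actual file and adjust the parsing or hash algorithm as needed. The names `md5_of` and `verify_table` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def md5_of(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_table(table_path, root):
    """Return the relative paths that are missing or whose checksum differs.

    ASSUMPTION: each line is "<md5 hex digest> <relative path>".
    """
    failures = []
    for line in Path(table_path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # skip blank lines
        expected, rel_path = line.split(maxsplit=1)
        rel_path = rel_path.strip()
        target = Path(root) / rel_path
        if not target.is_file() or md5_of(target) != expected.lower():
            failures.append(rel_path)
    return failures
```

Calling `verify_table("file.tbl", ".")` from the corpus root would return an empty list if every listed file is present and matches its checksum.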

Updates

Additional information, updates, and bug fixes may be available in the LDC catalog entry for this corpus at LDC2018S15.

Content Copyright

Portions © 2018 Alexandre Magno Andrade Maciel, © 2018 Trustees of the University of Pennsylvania