TORGO Database of Dysarthric Articulation

LDC2012S02

Introduction

TORGO Database of Dysarthric Articulation, Linguistic Data Consortium (LDC) catalog number LDC2012S02 and ISBN 1-58563-603-7, was developed by the University of Toronto's departments of Computer Science and Speech Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, accompanying transcripts and documentation from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) from a non-dysarthric control group.

CP and ALS are examples of dysarthria which is caused by disruptions in the neuro-motor interface that distort motor commands to the vocal articulators, resulting in atypical and relatively unintelligible speech in most cases. The TORGO database is primarily a resource for developing advanced automatic speaker recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to effectively understand dysarthric speech is a problem since the more general physical disabilities often associated with the condition can make other forms of computer input, such as computer keyboards or touch screens, difficult to use.

Data

The data consists of aligned acoustics and measured 3D articulatory features from the speakers carried out using the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity.

The data was collected between 2008 and 2010 in Toronto, Canada. All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of Assessment of intelligibility of dysarthric speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. The unrestricted sentences were elicited by asking participants to spontaneously describe 30 images in interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), designed to prompt students to tell or write a story.

Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with 'F', and the code for male speakers begins with 'M'. If the speaker was a member of the control group, the letter 'C' follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker 'FC02' was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, and thus, there are gaps in the numbering.

Each speaker's directory contains 'Session' directories which encapsulate data recorded in the respective visit and occasionally, a 'Notes' directory which can include Frenchay assessments (test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes.

Each 'Session' directory can, but does not necessarily, contain the following content:

Additionally, sessions recorded with the AG500 articulograph are marked with the file 'EMA', and those recorded with the video-based system are marked with the file 'VIDEO'. Files with a date form as the filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The *.log and *.calset files contain descriptions of the calibration process, but not the final result of calibration.

See the readme file and the AG500 Wiki for more complete descriptions of the possible subfolders and of the AG500 specific files. Also, see session_contents.tsv for a tab separated table of each session's subfolders and metadata files.

Directory Structure

Please see file.tbl for a complete file list as well as checksums for this publication.

Updates

Additional information, updates, bug fixes may be available in the LDC catalog entry for this corpus at LDC2012S02.

Content Copyright

Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania.