The TORGO Database of Dysarthric Articulation was developed by the University of Toronto's departments of Computer Science and Speech-Language Pathology in collaboration with the Holland-Bloorview Kids Rehabilitation Hospital in Toronto, Canada. It contains approximately 23 hours of English speech data, with accompanying transcripts and documentation, from 8 speakers (5 males, 3 females) with cerebral palsy (CP) or amyotrophic lateral sclerosis (ALS) and from 7 speakers (4 males, 3 females) in a non-dysarthric control group.
CP and ALS can both cause dysarthria, a speech disorder that arises from disruptions in the neuro-motor interface. These disruptions distort motor commands to the vocal articulators, resulting in atypical and, in most cases, relatively unintelligible speech. The TORGO database is primarily a resource for developing advanced automatic speech recognition (ASR) models suited to the needs of people with dysarthria, but it is also applicable to non-dysarthric speech. The inability of modern ASR to recognize dysarthric speech reliably is a problem because the more general physical disabilities often associated with the condition can make other forms of computer input, such as keyboards or touch screens, difficult to use.
The data consists of aligned acoustics and measured 3D articulatory features from the speakers. The articulatory measurements were made with the 3D AG500 electro-magnetic articulograph (EMA) system (Carstens Medizinelektronik GmbH, Lenglern, Germany) with fully-automated calibration. This system allows for 3D recordings of articulatory movements inside and outside the vocal tract, thus providing a detailed window on the nature and direction of speech-related activity.
The data was collected between 2008 and 2010 in Toronto, Canada. All subjects read text consisting of non-words, short words and restricted sentences from a 19-inch LCD screen. The restricted sentences included 162 sentences from the sentence intelligibility section of the Assessment of Intelligibility of Dysarthric Speech (Yorkston & Beukelman, 1981) and 460 sentences derived from the TIMIT database. In addition, unrestricted sentences were elicited by asking participants to spontaneously describe 30 images of interesting situations taken randomly from Webber Photo Cards - Story Starters (Webber, 2005), a set designed to prompt students to tell or write a story.
Data is organized by speaker and by the session in which each speaker recorded data. Each speaker was assigned a code and given their own file directory. The code for female speakers begins with F, and the code for male speakers begins with M. If the speaker was a member of the control group, the letter C follows the gender code. The last two digits of the code indicate the order in which that subject was recruited. For example, speaker FC02 was the second female speaker without dysarthria recruited. Note that some speakers were intentionally left out of the data, and thus, there are gaps in the numbering.
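The naming scheme above can be decoded mechanically. The following is a minimal sketch, assuming every code has exactly the form described (F or M, an optional C, then two digits); the names `SpeakerCode` and `parse_speaker_code` are illustrative and not part of the corpus tooling.

```python
import re
from typing import NamedTuple


class SpeakerCode(NamedTuple):
    sex: str          # "F" or "M"
    is_control: bool  # True when the letter C follows the sex letter
    number: int       # recruitment order given by the last two digits


# Assumed pattern: sex letter, optional C for the control group, two digits.
_CODE_PATTERN = re.compile(r"^([FM])(C?)(\d{2})$")


def parse_speaker_code(code: str) -> SpeakerCode:
    """Split a speaker code such as 'FC02' into its documented parts."""
    match = _CODE_PATTERN.match(code)
    if match is None:
        raise ValueError(f"not a recognized speaker code: {code!r}")
    sex, control, digits = match.groups()
    return SpeakerCode(sex=sex, is_control=(control == "C"), number=int(digits))


print(parse_speaker_code("FC02"))
```

Because of the gaps in the numbering, the parsed number should be treated as an identifier rather than as a count of speakers.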
Each speaker's directory contains Session directories, which hold the data recorded in the respective visit, and occasionally a Notes directory, which can include Frenchay assessments (a test for the measurement, description and diagnosis of dysarthria), notes about sessions (e.g., sensor errors), and other relevant notes.
Each Session directory can, but does not necessarily, contain the following content:
- alignment.txt: This is a text file containing the sample offsets between audio files recorded simultaneously by the array microphone and the head-worn microphone.
- amps: These directories contain raw *.amp and *.ini files produced by the AG500 articulograph.
- phn_*: These directories contain phonemic transcriptions of audio data. Each file is plain text with a *.PHN file extension and a filename referring to the utterance number. These files were generated using the free Wavesurfer tool.
- pos: These directories contain the head-corrected positions, velocities, and orientations of sensor coils for each utterance, as generated by the AG500 articulograph.
- prompts: These directories contain orthographic transcriptions.
- rawpos: These directories are equivalent to the pos directories except that their articulographic content is not head-normalized to a constant upright position.
- wav_*: These directories contain the acoustics. Each file is a RIFF (little-endian) WAVE audio file (Microsoft PCM, 16-bit, mono, 16000 Hz).
- wavall: These directories contain a stereo recording in which one channel contains the recorded acoustics and the other channel contains the analog peaks associated with the sweep signal, which is used by the AG500 hardware for synchronization.
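The audio format stated for the wav_* directories (16-bit PCM, mono, 16000 Hz) can be checked with Python's standard `wave` module. This is a sketch only: the helper names are illustrative, and because the corpus may not be at hand, the example writes a silent stand-in file with the hypothetical name `0001.wav` instead of opening a real utterance.

```python
import tempfile
import wave
from pathlib import Path

# Format documented for files in the wav_* directories.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "frame_rate_hz": 16000}


def describe_wav(path):
    """Return channel count, sample width, frame rate and duration of a WAVE file."""
    with wave.open(str(path), "rb") as wf:
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "frame_rate_hz": rate,
            "duration_s": wf.getnframes() / rate,
        }


def matches_documented_format(info):
    """True when the file is 16-bit, mono, 16000 Hz."""
    return all(info[key] == value for key, value in EXPECTED.items())


def write_silence(path, seconds=1.0):
    """Write a silent file in the documented format (stand-in for corpus audio)."""
    with wave.open(str(path), "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(16000)
        wf.writeframes(b"\x00\x00" * int(16000 * seconds))


with tempfile.TemporaryDirectory() as tmp:
    demo = Path(tmp) / "0001.wav"  # hypothetical file name
    write_silence(demo)
    info = describe_wav(demo)
    print(info, matches_documented_format(info))
```

The stereo recordings in wavall would fail this check by design, since they carry two channels.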
Additionally, sessions recorded with the AG500 articulograph are marked with the file EMA, and those recorded with the video-based system are marked with the file VIDEO. Files with a date form as the filename and a txt extension (e.g. april232008cal2.txt, jan28cal3.txt) are the measured responses from calibration. The *.log and *.calset files contain descriptions of the calibration process, but not the final result of calibration.
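Since a Session directory does not necessarily contain every entry, it can help to take an inventory before processing. The sketch below checks for the documented entry names and the EMA and VIDEO marker files. The function names, the directory name `Session1` and the `_headMic` suffixes in the demo are assumptions made for illustration, and the demo builds a throwaway directory, not real corpus data.

```python
import tempfile
from pathlib import Path

# Entry names taken from the description of a Session directory.
FIXED_ENTRIES = ("alignment.txt", "amps", "pos", "prompts", "rawpos", "wavall")
PATTERN_ENTRIES = ("phn_*", "wav_*")
MARKER_FILES = ("EMA", "VIDEO")


def inventory_session(session_dir):
    """Report which documented entries exist in one Session directory."""
    root = Path(session_dir)
    return {
        "present": [n for n in FIXED_ENTRIES if (root / n).exists()],
        "missing": [n for n in FIXED_ENTRIES if not (root / n).exists()],
        "matched": {
            pattern: sorted(p.name for p in root.glob(pattern))
            for pattern in PATTERN_ENTRIES
        },
        "systems": [m for m in MARKER_FILES if (root / m).is_file()],
    }


def build_demo_session(root):
    """Create a hypothetical, partly populated session for demonstration."""
    session = Path(root) / "Session1"  # hypothetical directory name
    # The suffixes after wav_ and phn_ are assumptions for this demo.
    for sub in ("pos", "prompts", "wav_headMic", "phn_headMic"):
        (session / sub).mkdir(parents=True)
    (session / "EMA").touch()  # marker file for an articulograph session
    return session


with tempfile.TemporaryDirectory() as tmp:
    print(inventory_session(build_demo_session(tmp)))
```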
See the readme file and the AG500 Wiki for more complete descriptions of the possible subfolders and of the AG500-specific files. Also, see session_contents.tsv for a tab-separated table of each session's subfolders and metadata files.
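A tab-separated table such as session_contents.tsv can be read with Python's standard `csv` module. The sketch below assumes the file has a header row, which is not confirmed here, and it parses an invented in-memory sample; the column names `speaker`, `session` and `has_pos` are placeholders, not the real columns.

```python
import csv
import io


def read_session_table(handle):
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Invented stand-in for the real file; column names and rows are NOT from the corpus.
sample = io.StringIO(
    "speaker\tsession\thas_pos\n"
    "FC02\tSession1\tyes\n"
    "M01\tSession2\tno\n"
)
rows = read_session_table(sample)
print(rows[0]["speaker"], len(rows))
```

To read the actual file, pass a handle opened with `open("session_contents.tsv", newline="", encoding="utf-8")` and inspect the keys of the first row to learn the real column names.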
For an example of the data contained in this corpus, review these two audio samples: Dysarthric & Control.
Portions © 2008-2011 Frank Rudzicz, © 2012 Trustees of the University of Pennsylvania