Title
Second Language University Speech Intelligibility Corpus (L2-USI)

Author(s)
Okim Kang, Kevin Hirschi, Stephen D. Looney, John H. L. Hansen

Language(s)
English

The Second Language University Speech Intelligibility corpus was created by Northern Arizona University, The Pennsylvania State University, and the University of Texas at Dallas. It consists of 10.5 hours of speech from 66 international faculty members and university students (34 female, 32 male) representing 15 language backgrounds at 10 universities in North America. The corpus includes WAV audio files, orthographic transcriptions for all recordings, and intelligibility scores for 73 percent of the files. Aligned Praat TextGrids with word-level segmentation and pauses longer than 0.4 seconds are also included. Each recording is annotated with four background variables: gender, L1, type (IEP, ITA, or Highly Intelligible), and two intelligibility scores.

Recommended/Expected use of corpus
This corpus can be used to investigate L2 speech patterns in the academic context. The per-recording intelligibility scores may also be used to uncover phonological/perceptual relationships or to help identify mispronunciations that interfere with intelligibility.

Collection Procedure - format, method, and timespan
Data were collected in 2021 and 2022. During that period, 127 speech events were recorded by speakers at home and in classrooms. All recordings are monologic and contain speakers’ presentations, descriptions, reflections, and microteaching tasks. Some tasks include a practice recording, a final in-class recording, or both. Speakers were recruited from Intensive English Program (IEP) courses and from oral skills courses for international graduate students seeking to become International Teaching Assistants (ITAs).

Directory Structure & File Format Specific Details
data/ - directory includes FLAC audio files, orthographic transcriptions for all recordings, and TextGrids with word-level segmentation.
The tag /filtered/ marks speech that was redacted (replaced by silence or a 1 kHz tone) and words that could not be transcribed. Filled pauses are transcribed as “um,” “uh,” “eh,” and “mm.”
docs/ - directory includes an Excel file listing, for each file, the speaker's gender, L1 background, recording description, duration (in milliseconds), and intelligibility score (when available).
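The sketch below shows one way to pull two of the features described above out of the files: filled pauses and /filtered/ tags in a transcript, and pauses longer than 0.4 seconds in a TextGrid. It is a minimal sketch built on several assumptions. Transcripts are treated as plain text. TextGrids are assumed to be in Praat's long ("text") format. The word tier is assumed to be named "words" and pauses are assumed to be empty-labeled intervals; the corpus documentation does not specify either label. The function names `transcript_counts`, `parse_textgrid`, and `long_pauses` are hypothetical helpers, not part of the corpus release.

```python
import re
from dataclasses import dataclass

# Filled-pause spellings listed in the corpus documentation.
FILLED_PAUSES = {"um", "uh", "eh", "mm"}
REDACTION_TAG = "/filtered/"


def transcript_counts(text: str) -> dict:
    """Count words, filled pauses, and /filtered/ tags in a plain-text transcript."""
    # Match the redaction tag as one token before falling back to ordinary words.
    tokens = re.findall(r"/filtered/|[a-z']+", text.lower())
    filled = sum(1 for t in tokens if t in FILLED_PAUSES)
    redacted = tokens.count(REDACTION_TAG)
    return {
        "words": len(tokens) - filled - redacted,
        "filled_pauses": filled,
        "filtered": redacted,
    }


@dataclass
class Interval:
    tier: str
    xmin: float
    xmax: float
    text: str


def parse_textgrid(content: str) -> list[Interval]:
    """Collect interval-tier entries from a Praat TextGrid in long text format (assumed)."""
    intervals: list[Interval] = []
    tier, xmin, xmax, in_interval = "", 0.0, 0.0, False
    for raw in content.splitlines():
        line = raw.strip()
        if m := re.fullmatch(r'name = "(.*)"', line):
            tier = m.group(1)  # a new tier begins
        elif re.fullmatch(r"intervals \[\d+\]:", line):
            in_interval = True
        elif in_interval and (m := re.fullmatch(r"(xmin|xmax) = (\S+)", line)):
            if m.group(1) == "xmin":
                xmin = float(m.group(2))
            else:
                xmax = float(m.group(2))
        elif in_interval and (m := re.fullmatch(r'text = "(.*)"', line)):
            # Praat escapes a literal quote inside a label as "".
            intervals.append(Interval(tier, xmin, xmax, m.group(1).replace('""', '"')))
            in_interval = False
    return intervals


def long_pauses(intervals, tier="words", min_dur=0.4, pause_labels=("",)):
    """Return (start, end) of pause intervals on `tier` longer than `min_dur` seconds.

    `tier` and `pause_labels` are assumptions about how the corpus labels pauses.
    """
    return [
        (iv.xmin, iv.xmax)
        for iv in intervals
        if iv.tier == tier
        and iv.text.strip() in pause_labels
        and iv.xmax - iv.xmin > min_dur
    ]
```

If the word tier has a different name, or pauses carry a label such as `sp` instead of an empty string, pass `tier=` and `pause_labels=` accordingly. Files saved in Praat's short text format would need a different parser.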