README FILE FOR: Call My Net 1
LDC Catalog ID: LDC2024S05
Authors: Karen Jones, Kevin Walker, Dave Graff, Jonathan Wright, Stephanie Strassel

1.0 Introduction

This release comprises the Call My Net 1 Corpus, containing conversational telephone speech recordings in four languages: Tagalog, Cebuano, Cantonese and Mandarin. Native speakers located in China and the Philippines were recruited to make 10 calls to people in their social networks, calling from a variety of noise conditions and handsets. Speaker demographic information and call metadata were collected, and all recordings were manually audited to confirm language and speaker requirements.

The corpus contains a total of 2472 two-channel calls, with 364.2 hours of recorded audio. Data from Call My Net 1 has been used in support of the 2016 NIST Speaker Recognition Evaluation (SRE16).

2.0 Release Contents

The directory structure is as follows:

  README.txt -- this file
  data/
    ceb
    cmn
    tgl
    yue
  docs/
    call_lang_group.tab
    call_sides.tab
    CMN_Quality_Auditing_Guidelines.pdf
    CMN_Speaker_Auditing_Guidelines.pdf
    languages.tab
    quality_audits.tab
    speaker_audits.tab
    subjects.tab

All audio files are presented as two-channel 16-bit PCM FLAC with a sample rate of 8000 Hz. See Section 4 for a detailed description of the speaker, call and audit metadata provided in the six .tab files.

3.0 Collection

3.1 Protocol

Native speakers of each language were recruited and enrolled as human subjects, providing informed consent and receiving compensation for their effort under a protocol approved by the University of Pennsylvania's Institutional Review Board (IRB). These recruited subjects (known as callers) were required to make a minimum of 10 calls to friends and family members (known as callees), with each call lasting between 8 and 10 minutes. Both callers and callees provided consent prior to each recording. Both callers and callees were assigned a unique, persistent PIN.

For Tagalog and Cebuano, both caller and callee were located in the Philippines, with the recording platform located in Sydney, Australia. For Cantonese and Mandarin, speakers were located in China, with the recording platform in London. Tagalog and Cantonese were the primary collection languages, while Cebuano and Mandarin were secondary, with fewer recruited speakers. The table below shows the number of callers per language appearing in the corpus.

  Language    Code   Callers
  Tagalog     tgl        101
  Cantonese   yue        100
  Cebuano     ceb         10
  Mandarin    cmn         10

3.2 Recording

Call collection was implemented on a robot-operator platform located in Sydney or London, connected to a digital E-1 trunk line with multiple channels for accessing the public telephone network. Recruited callers dialed into the platform and entered their unique PIN for verification, then used the telephone keypad to enter information about their handset and noise condition, as well as the callee's phone number. The platform then dialed out to the callee, and both speakers provided consent to be recorded. The call was then bridged and recording began.

For each channel, the digital stream from the E-1 ISDN line (8-bit A-law encoded audio at 8000 samples/second) was stored to a disk file on the platform, in addition to being passed through the circuit to the other call participant. Recording automatically terminated after 10 minutes. The two single-channel files for each call were uploaded to a file server at LDC and subsequently combined into a single 2-channel audio file. Recorded audio is released as 2-channel 16-bit PCM FLAC files.

All call sides associated with the recruited subject (i.e., the caller, or "A", call side) were manually audited to confirm language and overall recording quality, and to verify that all calls associated with a given caller PIN were made by the same speaker. Callee call sides (the "B" call side) were audited only for overall recording quality.
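
The conversion described above, from two 8-bit A-law call-side files to one 2-channel 16-bit PCM stream, can be sketched as follows. This is a minimal Python illustration of standard ITU-T G.711 A-law expansion followed by sample interleaving, not the tool LDC actually used. The function names are hypothetical, and the sketch assumes both call sides contain the same number of samples; FLAC encoding of the resulting PCM is left to a separate encoder.

```python
import struct
from typing import List, Sequence


def alaw_to_linear(code: int) -> int:
    """Expand one 8-bit G.711 A-law code to a signed 16-bit linear sample."""
    code ^= 0x55                  # undo the even-bit inversion applied by A-law
    t = (code & 0x0F) << 4        # 4-bit quantization step within the segment
    seg = (code & 0x70) >> 4      # 3-bit segment number (exponent)
    if seg == 0:
        t += 8
    elif seg == 1:
        t += 0x108
    else:
        t = (t + 0x108) << (seg - 1)
    return t if code & 0x80 else -t  # sign bit set means a positive sample


# Decode all 256 codes once, so each channel decodes by table lookup.
ALAW_TABLE = [alaw_to_linear(c) for c in range(256)]
# One stereo frame: A-side sample then B-side sample, little-endian int16.
FRAME = struct.Struct("<hh")


def decode_side(alaw_bytes: bytes) -> List[int]:
    """Decode one single-channel A-law call-side recording (hypothetical helper)."""
    return [ALAW_TABLE[b] for b in alaw_bytes]


def interleave_sides(side_a: Sequence[int], side_b: Sequence[int]) -> bytes:
    """Interleave two equal-length call sides into 2-channel 16-bit PCM
    (A, B, A, B, ...). Assumes the sides were already padded or trimmed."""
    if len(side_a) != len(side_b):
        raise ValueError("call sides differ in length; pad or trim first")
    out = bytearray()
    for a, b in zip(side_a, side_b):
        out += FRAME.pack(a, b)
    return bytes(out)


side_a = decode_side(bytes([0xD5, 0xAA]))  # [8, 32256]
side_b = decode_side(bytes([0x55, 0x2A]))  # [-8, -32256]
pcm = interleave_sides(side_a, side_b)
print(len(pcm))  # 8 (2 frames x 2 channels x 2 bytes)
```

For scale, a full 10-minute call side holds 600 s x 8000 samples/s = 4,800,000 one-byte A-law samples. The interleaved bytes could, for example, be written to a WAV file with Python's standard wave module (2 channels, sample width 2, frame rate 8000) before FLAC compression.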

3.3 Results

The number of recordings and the audio duration for each language are given below:

  Language    Code   Recordings   Hours
  Tagalog     tgl          1236   166
  Cantonese   yue          1036   165.1
  Cebuano     ceb           100   16.6
  Mandarin    cmn           100   16.7
  Total                    2472   364.4

4.0 Metadata

Call, speaker and audit metadata are provided in six tab-delimited files. The first line in each file provides the column headings for the subsequent rows of data in the table. The columns of each table are summarized below.

4.1 call_lang_group.tab

  1 file_id
  2 language_id
  3 call_group_label

This table has one row for each call, giving its file_id and language_id. The call_group_label indicates when a set of consecutive short calls made by one caller to the same callee should be considered a single conversation: speakers were occasionally disconnected from the platform mid-call and needed to re-dial, and calls with the same call_group_label are considered part of the same multi-part conversation.

4.2 call_sides.tab

  1 file_id
  2 call_side
  3 subj_id
  4 phone_id
  5 phone_category
  6 phone_type
  7 phone_mic
  8 noise_condition
  9 quality_comment

This table has one row for every call side in the collection. It provides the subject ID (subj_id); the caller and callee telephone numbers used (with the last 4 digits encrypted into a 3-letter string for anonymity); and, for the caller side only, self-reported information about the phone handset type, microphone type, and environmental noise condition. The phone_id is a unique, anonymized identifier for a given phone number.

4.3 languages.tab

  1 language_id
  2 language

The language_id column provides the three-letter ISO 639-3 code, while the language column provides the language name.

4.4 quality_audits.tab

  1 file_id
  2 call_side
  3 mostly_speech
  4 speech_clarity
  5 noisy
  6 single_speaker
  7 speaker_sex
  8 comment
  9 auditor_id

This table presents the results of auditing conducted by the LDC to check the overall quality of the recordings.
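
Because every .tab file shares the same layout (a heading line followed by tab-delimited rows), the tables can be loaded with a generic reader. The following minimal Python sketch, not an LDC-supplied tool, reads a table with the standard csv module and reassembles multi-part conversations by grouping call_lang_group.tab rows on call_group_label. It assumes UTF-8 text with no quoting; the helper names and the sample rows (file IDs call_a to call_c, labels g1 and g2) are hypothetical.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List, TextIO


def read_tab(stream: TextIO) -> List[Dict[str, str]]:
    """Load one .tab table: the first line supplies the column headings and
    every later line is one tab-delimited row (assumes no quoting)."""
    return list(csv.DictReader(stream, delimiter="\t", quoting=csv.QUOTE_NONE))


def multi_part_conversations(rows: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Map each call_group_label shared by two or more calls to its file_ids.

    Labels that occur only once are not treated as multi-part. Empty labels
    are skipped; how standalone calls are actually marked is an assumption.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for row in rows:
        label = (row.get("call_group_label") or "").strip()
        if label:
            groups[label].append(row["file_id"])
    # sorted() orders date-prefixed IDs (e.g. 20141202_093141_...) by time,
    # assuming that naming pattern holds for all file_ids.
    return {label: sorted(ids) for label, ids in groups.items() if len(ids) > 1}


# Hypothetical rows in the documented column order. For the released file,
# something like:
#   with open("docs/call_lang_group.tab", newline="", encoding="utf-8") as f:
#       rows = read_tab(f)
sample = io.StringIO(
    "file_id\tlanguage_id\tcall_group_label\n"
    "call_a\ttgl\tg1\n"
    "call_b\ttgl\tg1\n"
    "call_c\tyue\tg2\n"
)
rows = read_tab(sample)
print(multi_part_conversations(rows))  # {'g1': ['call_a', 'call_b']}
```

The same read_tab helper applies unchanged to the other tables, since each one is keyed by its own heading line rather than by column position.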

4.5 speaker_audits.tab

  1 file_id
  2 call_side          -- always "A"
  3 ref_call           -- "true" or "false"
  4 expected_language  -- "yes", "no" or "unsure"
  5 expected_speaker   -- "yes", "no", "unsure" or "NO_RESPONSE"
  6 auditor_id

This table presents the results of auditing conducted by the LDC to check speaker identity. For each speaker, one call side was presented to an auditor as the reference call, against which the other call sides were compared; this call side has ref_call set to "true" and expected_speaker set to "NO_RESPONSE", because no speaker ID decision was made for the reference call side. All non-reference call sides have ref_call set to "false" and expected_speaker set to "yes", "no" or "unsure", reflecting the decision made during manual auditing.

4.6 subjects.tab

  1 subject_id
  2 sex
  3 year_of_birth
  4 native_language

This table has one row for each speaker, with self-reported demographic information provided by the speaker. The subject_id field is a continuation of the numeric sequence that has been applied to all previous conversational telephone speech (CTS) collections at the LDC. We are reasonably confident that none of the speakers in this collection were involved in previous LDC speech collections.

5.0 Known Issues

5.1 Anomalous Signal Properties

A. There was a relatively high incidence of call connection problems in the Philippines. If a call was cut off due to connection problems, speakers were permitted to reconnect to the recording platform to complete their conversation; this resulted in several "multi-part" calls. Calls that share the same call_group_label in the call_lang_group.tab file are instances of these multi-part calls.

B. Sporadic cases of unusually abrupt voice onset and offset were observed in some calls, with mid-syllable dropouts lasting approximately 15 to 30 milliseconds. This property appeared to affect both Tagalog and Cantonese calls, on both the A and B channels.

5.2 Auditing Anomalies

A. For one call with file_id 20141202_093141_10970, the quality audit indicated that this call side contains a different male speaker from the one found in all other call sides labeled with the same subject_id, 131044.

B. In 77 cases, auditing resulted in a response of "unsure" as to whether the call side contained the expected speaker, due to extreme distortion or noise in the signal.

6.0 References

Jones, Karen, Stephanie M. Strassel, Kevin Walker, David Graff, and Jonathan Wright. "Call My Net Corpus: A Multilingual Corpus for Evaluation of Speaker Recognition Technology." In INTERSPEECH, pp. 2621-2624. 2017.

--------------------------------------------------------------------------
README 21 June 2023
  created by Karen Jones and Dave Graff
  updated by Stephanie Strassel