The subdirectories of this publication are as follows:

data/           directory containing a subdirectory for each caller (each PIN)

data/pppp/      directory for the caller with the four-digit PIN pppp.
                This directory contains a subdirectory for each session.

data/pppp/ss    directory containing files associated with session ss
                (ss is a two-digit zero-padded session number) for the
                caller with the four-digit PIN pppp

data/pppp/ss/pppp_ss_aa_bb_yyyymmdd_<desc>.<ext>, where

    aa        is the scenario number associated with the call (see Notes below)
    bb        is the site id
    yyyymmdd  is the date of the call
    <desc>    can be
                "human"       for human transcript
                "raw"         for raw log file
                "annotated"   for annotated log file
                "survey"      for survey corresponding to the call (see Notes
                              below)
                "summary"     for log summary of the call
                "sys"         for the system side of the call
                "usr"         for the caller side of the call
                "sysXXX-YYY"  for system utterance XXX-YYY
                "usrZZZ"      for caller utterance ZZZ
    <ext>     can be
                "xml"  for xml file
                "sph"  for SPHERE format file (missing in the initial release)
                "txt"  for text file
                "hyp"  for the hypothesis files intended to be used with sclite
                "ref"  for the reference files intended to be used with sclite

    (A file-name parsing sketch follows the scenario notes below.)

doc/            directory containing the documentation related to this project

dtd/            directory containing any dtd used in this project

for_sclite/     directory containing scripts to clean up the transcriptions,
                along with various .txt files and data files related to
                sclite. See for_sclite/readme.txt for further details.
                Final cleaned transcriptions reside in the for_sclite/snor
                directory. Output from sclite is in the for_sclite/snor/by_*
                subdirectories -- also see the for_sclite/snor/sclite_scripts

html/           directory containing html for web pages accessed by subjects
                (For security reasons, NIST does not plan to distribute the
                unmodified PERL cgi scripts that these pages invoke)

html/scenarios  the travel-task scenarios

tables/         directory containing tables. Subject demographics and most
                summarized data are in the tables here -- see
                tables_readme.txt

misc/           directory containing keys that map site directories to NIST
                directories

==============================================================================

Notes on scenario identifier numbers.

1. Each user was to make nine calls. The first seven calls had an assigned
   travel task scenario, which the user got via the web. The actual html
   files for these scenarios are in this distribution. The last two calls
   asked the user to make simulated travel reservations for a trip that
   they might wish to take -- a vacation or pleasure trip on the eighth
   call, and a business trip paid for by an employer on the ninth call.

2. There was a framework or template for each of the nine scenarios. The
   template included dates, times, one-way vs. round-trip, the presence or
   absence of an airline preference, and so forth. The template also
   included a profile of the cities/airports that occurred in the scenario.
   The city/airport profile specified roughly how busy the airport was, as
   well as foreign vs. domestic, and also specified whether nonstop flights
   were possible or whether connecting flights would be necessary.

3. Into the seven fixed scenario templates, NIST substituted new [allegedly
   similar] cities/airports each day of the data collection. There were
   eight such sets (the same set was used on June 22 and 23).

4. Thus, for purposes of data analysis, we consider the template identity
   to be of interest, and which day's set of cities/airports may also be of
   interest. The scenario identifier number is two digits. The "tens" digit
   specifies the template number, and was always equal to the session
   number. The "ones" digit specifies which set of cities/airports was used
   for that template. For the seven fixed scenarios, this ones digit takes
   values of 1 through 8; the ones digit was a 2 for calls occurring on
   June 22 and June 23. For the two free scenarios (the eighth and ninth
   calls), the ones digit is always a 1.
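Purely as an illustration (this code is ours and is not part of the
distribution), the file-name pattern and the scenario identifier documented
above can be decomposed along the lines of the following Python sketch; the
function and field names are our own:

    import re

    # pppp_ss_aa_bb_yyyymmdd_<desc>.<ext>, e.g. 1087_01_18_01_20000707_raw.xml;
    # the channel-level NIST recordings omit <desc>
    # (e.g. 1087_01_18_01_20000707.sph)
    NAME = re.compile(
        r"(?P<pin>\d{4})_(?P<session>\d{2})_(?P<scenario>\d{2})_"
        r"(?P<site>\d{2})_(?P<date>\d{8})(?:_(?P<desc>[^.]+))?\.(?P<ext>\w+)$"
    )

    def parse_name(filename):
        """Split a Communicator data file name into its documented fields."""
        m = NAME.match(filename)
        if m is None:
            raise ValueError("unexpected file name: " + filename)
        f = m.groupdict()
        # Tens digit of the scenario id is the template (= session) number;
        # the ones digit selects that day's set of cities/airports.
        f["template"] = int(f["scenario"][0])
        f["city_set"] = int(f["scenario"][1])
        return f

    print(parse_name("1087_01_18_01_20000707_usr001.sph"))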
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the survey data.

The fields are all on one line, separated by multiple spaces (a parsing
sketch follows these notes). The user comment follows on one or more lines;
if the user made no comments, then that fact is noted.

The fields of the survey data are as follows.

    The site name

    The scenario identifier number

    The approximate time the recording ended (this is actually the time of
    an event in the NIST software, rather than a timestamp from the
    recording)

    The date the recording was made (the user may have gotten the scenario
    on one day and made the call on a later day, in which case the scenario
    identifier number will not correctly correspond to the day the call was
    made)

    Alive or Dead, as stated subjectively by the user on the survey. (If
    the user indicated the system was completely silent/dead, then the rest
    of the survey items, everything below, will be omitted, which is the
    whole purpose of this survey item -- i.e., omitting the other survey
    items is preferred to users assigning random or neutral values if a
    system was completely silent/dead.) This determination actually needs
    to also be made independently by us, on the basis of the audio
    recordings.

    Task completion as stated by the user on the survey (Yes or No). (The
    actual item that the user responded to was, "Were you able to complete
    your entire task?")

    The responses to the five Likert-type items, with 1 being the most
    favorable response (Agree Completely) and 5 being the most negative
    (Disagree Completely). The Likert-type items are as follows (in this
    order):

        In this conversation, it was easy to get the information that I
        wanted.

        In this conversation, I found it easy to understand what the system
        said.

        In this conversation, I knew what I could say or do at each point
        in the dialogue.

        The system worked the way I expected it to in this conversation.

        Based on my experience in this conversation using this system to
        get travel information, I would like to use this system regularly.
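As an illustration only, a survey file of the form described above might be
split along these lines in Python; the field names, and splitting on runs
of two or more spaces, are our assumptions:

    import re

    # Assumed field order, per the list above; the names are ours.
    FIELDS = ["site", "scenario", "end_time", "date",
              "alive_dead", "task_done", "q1", "q2", "q3", "q4", "q5"]

    def parse_survey(text):
        """Split a survey file into its fields and the trailing comment."""
        first_line, _, comment = text.partition("\n")
        values = re.split(r"\s{2,}", first_line.strip())
        # A "Dead" survey omits everything after the Alive/Dead field,
        # so 'values' may legitimately be shorter than FIELDS.
        record = dict(zip(FIELDS, values))
        record["comment"] = comment.strip()
        return record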
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the data, in general.

The user with PIN 3118 made an extra call to Colorado. On his first call he
failed to fill out the registration form. We rescheduled him with Colorado,
and he repeated the Colorado call (session 4 for him) after completing all
nine sessions. All data for both of these calls is included here -- note
that their dates and scenario numbers differ.

The user with PIN 4330 was said to have a foreign accent. The user with PIN
3815 is a replacement for this user.

The user with PIN 8471 was said to have a foreign accent. The user with PIN
1592 is a replacement for this user.

The user with PIN 7562 had impaired articulation, although this was not
evident at first. The user with PIN 3714 is a replacement for this user.

The user with PIN 5845 clearly made a session 1 call to Colorado. Colorado
has no data for that call. Thus, some of the usual files are missing for
the call.

In a few cases, we have modified a user's survey response saying that a
system was dead/silent to say that the system was Alive, if an actual
conversation clearly did take place.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the .hyp and .ref files in release_1 (released October 2000).

Several bugs were corrected in the extraction of the transcripts from the
log summaries. The original extracted .hyp and .ref files are in the data/
directory. The final .hyp and .ref files, cleaned up to conform with the
Communicator transcription spec, reside in the for_sclite/snor directory.
The notes below on hand repairs of two calls still apply.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the .hyp and .ref files in release_0, also known as comm0
(released August 2, 2000).

In general, if there was no human transcription of user turns, then there
are no .hyp or .ref files here.

All .hyp and .ref files in the initial data release (August 2, 2000) are
unchecked and very preliminary. NIST has not yet attempted to run sclite on
them. We know some are missing and expect that some are incorrect. Thus,
these files WILL change, and errors found in the .hyp and .ref files in the
initial release need not be reported. We're including them because we think
useful analysis can be done on even this preliminary data. An updated
release of them will follow.

The automated log tools failed on two of the calls, and NIST resorted to
hand repairs.

* The .hyp and .ref files for call/session 01 for the user with PIN 7158
  required some hand-modifications. It is not clear to NIST that the
  raw.xml log for this call is entirely correct/self-consistent (for
  example, user turn 9 seems to be tagged as the end of the task at line
  5969 of the file). NIST believes the .hyp and .ref files in this release
  for this call to be correct now.

* The .hyp and .ref files for call/session 08 for the user with PIN 3108
  represent hand-reconstruction, and the time stamps on the user turns in
  these files are self-consistent but are probably incorrect. The .hyp and
  .ref files in this release for this call will be used with sclite.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the audio files.

All audio files in this distribution are in SPHERE format. The files
consist of the sites' recordings and the NIST recordings. The sites'
recordings are utterance level, while the NIST recordings are channel
level. For example, in 1087/01/:

    1087_01_18_01_20000707_usr001.sph  is the site's recording of user
                                       utterance #1
    1087_01_18_01_20000707_sys001.sph  is the site's recording of system
                                       utterance #1
    1087_01_18_01_20000707.sph         is the summation of the NIST
                                       recordings of the system and user
                                       sides

No echo cancellation was performed on the sites' or the NIST recordings.
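For readers unfamiliar with SPHERE: each file starts with a plain-ASCII
header. A minimal Python sketch (ours, not NIST's) for reading that header
follows; if the sample data is compressed, NIST's SPHERE tools (e.g.
sph2pipe) are still needed to decode the samples:

    def read_sphere_header(path):
        """Read the plain-ASCII header of a NIST SPHERE file into a dict."""
        with open(path, "rb") as f:
            assert f.read(8) == b"NIST_1A\n"        # SPHERE magic line
            header_size = int(f.read(8).strip())    # header length in bytes
            f.seek(0)
            text = f.read(header_size).decode("ascii", "replace")
        fields = {}
        for line in text.splitlines()[2:]:          # skip the two lines above
            line = line.strip()
            if line == "end_head":
                break
            if not line:
                continue
            name, ftype, value = line.split(None, 2)
            fields[name] = int(value) if ftype == "-i" else value
        return fields

    # e.g. read_sphere_header("data/1087/01/1087_01_18_01_20000707.sph")
    # typically yields sample_rate, channel_count, sample_count, etc.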
We had designed the CDCS handshaking synchronization process to generate
signals we could use to synchronize the NIST and site audio recordings. As
it turned out, only four of the nine Communicator-2000 sites chose to
implement the process. To further complicate matters, the log standard was
never updated to incorporate the timestamping of these tones, so the sites
that did generate the sync tones did not generate timestamps in their logs
that would tell us when the tones occurred.

Given the incompleteness of the data, we chose to scrap the use of the sync
tones entirely and instead perform an indirect synchronization of the logs
and audio files that we could apply to all of the sites' data, via the
following protocol. For each call:

1) A random user utterance containing a significant length of speech was
   chosen from the audio recordings submitted by the site for that call,
   and the start and end timestamps for that utterance were obtained from
   the log. Note that we used the user utterances because not all sites
   recorded their system utterances.

2) The selected utterance was cross-correlated with the NIST user recording
   (sketched below). The result provided the offset into the NIST user
   recording where the selected site audio recording occurred. The
   timestamp for the site user recording was then used to infer the start
   timestamp for the NIST user recording relative to the logfile.

3) The NIST user and system recordings were then combined into a
   two-channel audio file. However, because of an apparent delay between
   thread communications in the NIST data collection system, we found that
   the NIST user and system recording processes were not started at exactly
   the same time. We quantified the delay by correlating the system side
   speech with its echo (caused by crosstalk) in the user side recordings.
   The channel that was found to start the latest was then prepended with
   silence of the length of the delay, to effectively cause the two
   channels to begin at the same absolute time. Note that this process did
   not account for any possible delays caused by telephone transmission
   distances; however, we assumed these to be negligible.

4) The start time for the NIST recordings relative to the site log was then
   calculated by subtracting the offset obtained in step 2 from the start
   timestamp of the utterance chosen in step 1, taking into account any
   additional silence added in step 3. The start time is recorded in
   log_audio_sync_xx.txt (where xx is the number of the CD-ROM the
   recording is located on).

The product of the process was, for each call, a two-channel audio file
with synchronized call-length system and user side audio, and a start time
for synchronizing this file with the corresponding site log.
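The heart of step 2 is ordinary cross-correlation. A simplified Python
sketch follows (ours, not the actual NIST tooling); for call-length audio
an FFT-based method, e.g. scipy.signal.correlate with method="fft", would
be much faster:

    import numpy as np

    def find_offset(utterance, recording):
        """Sample offset where 'utterance' best matches 'recording'.

        Both are float numpy arrays at the same sample rate, with
        len(recording) >= len(utterance).
        """
        # Slide the utterance across the recording; the peak of the
        # cross-correlation marks the best alignment.
        scores = np.correlate(recording, utterance, mode="valid")
        return int(np.argmax(scores))

    # Step 2: offset of the chosen site utterance in the NIST recording.
    # offset_sec = find_offset(site_utt, nist_usr) / sample_rate
    # Step 4: start of the NIST recording on the site-log timeline.
    # nist_start = utt_log_stime - offset_sec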
The procedure described above should have completely aligned the NIST audio
with the logfile timestamps (and site audio). The offset for each utterance
in the NIST recording would be obtained by subtracting the NIST audio start
time from the value of the logfile GC_OPERATION or GC_MESSAGE tag's stime
attribute when type_new_utt="user" or type_new_turn="user" (we should make
this attribute naming consistent). However, in listening, these offsets are
obviously incorrect for some sites (sites 01, 05, 06, and to a lesser
degree sites 04 and 07). (Note that we could not perform ANY processing on
site 08's data, since they did not provide us with a mapping between their
log files and their own audio files.)

After some investigation, we found that the logfile timestamps don't
actually map to the site audio files. For example, the timestamps for one
logfile indicated that the duration of a particular utterance was 1.5
seconds, but the duration of the site's audio file was actually 22 seconds.
Closer examination revealed that the actual speech in the file measured 1.5
seconds in length, but the file also contained 20.5 seconds of silence.

This finding led us to suspect that the timestamps indicated where the
speech started and ended for each utterance relative to the entire session,
and NOT where the site audio recordings fit in the session timeline. To
compensate for this in our alignment, we manually cut out the initial
silence from a selected utterance in EVERY call and then re-ran the
automatic alignment procedure described above (a trimming sketch follows
these notes). (Note that this, unfortunately, means that there is no
foolproof automatic way to index into the site audio recordings using the
logfile timestamps. This should definitely be corrected in future data
collection efforts.)

The above procedure seemed to correct the misalignment problem for all of
the sites' user recordings (except for site 06, whose timestamps appeared
to have an additional problem that we were unable to diagnose).

For sites 01, 03, 04, 05, 06, 07, and 09 (all sites except 02), we found,
however, that the sites' own user-side and system-side timestamps were not
synchronized with each other. This is also a serious problem that should be
corrected in future data collections. In the end, we synchronized on the
user side audio, so sites will note (sometimes severe) timing problems in
the system audio. This problem will be quite evident if you play back the
NIST system and user audio at the same time, or if you try to "recreate"
the call using the site audio and logfile timestamps.

Another problem we found is that, for site 05 data, the duration indicated
by the timestamps is "shorter" than the actual speech. Although we tried to
model the problem, it seemed to vary among utterances and calls. We
therefore did not attempt to modify the timestamps to reflect the true
duration of the speech.
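The silence removal described above was done by hand, but for readers who
want to approximate it programmatically, a simple energy-based trim might
look like the following Python sketch; the frame size and RMS threshold are
our assumptions and would need tuning per site:

    import numpy as np

    def trim_leading_silence(samples, rate, frame_ms=10, rms_threshold=0.01):
        """Drop leading silence, judged frame-by-frame by RMS energy.

        'samples' is a float array scaled to [-1, 1].
        """
        frame = max(1, int(rate * frame_ms / 1000))
        for start in range(0, len(samples) - frame + 1, frame):
            rms = np.sqrt(np.mean(samples[start:start + frame] ** 2))
            if rms > rms_threshold:
                return samples[start:]          # first non-silent frame
        return samples[:0]                      # the whole signal is silent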