The subdirectories of this publication are as follows:

data/           directory containing a subdirectory for each caller (each PIN)

data/pppp/      directory for the caller with the four-digit PIN pppp.
                This directory contains a subdirectory for each session.

data/pppp/ss    directory containing files associated with session ss
                (ss is a two-digit zero-padded session number) for the
                caller with the four-digit PIN pppp

data/pppp/ss/pppp_ss_aa_bb_yyyymmdd_<desc>.<ext>, where

    aa        is the scenario number associated with the call (see Notes below)
    bb        is the site id
    yyyymmdd  is the date of the call
    <desc>    can be
                "human"       for human transcript
                "raw"         for raw log file
                "annotated"   for annotated log file
                "survey"      for survey corresponding to the call (see Notes
                              below)
                "summary"     for log summary of the call
                "sys"         for the system side of the call
                "usr"         for the caller side of the call
                "sysXXX-YYY"  for system utterance XXX-YYY
                "usrZZZ"      for caller utterance ZZZ
    <ext>     can be
                "xml"  for xml file
                "sph"  for SPHERE format file (missing in the initial release)
                "txt"  for text file
                "hyp"  for the hypothesis files intended to be used with sclite
                "ref"  for the reference files intended to be used with sclite

    (A file-name parsing sketch follows the scenario notes below.)

doc/            directory containing the documentation related to this project

dtd/            directory containing any dtd used in this project

for_sclite/     directory containing scripts to clean up the transcriptions,
                along with various .txt files and data files related to
                sclite. See for_sclite/readme.txt for further details.
                Final cleaned transcriptions reside in the for_sclite/snor
                directory. Output from sclite is in the for_sclite/snor/by_*
                subdirectories -- also see the for_sclite/snor/sclite_scripts

html/           directory containing html for web pages accessed by subjects
                (For security reasons, NIST does not plan to distribute the
                unmodified PERL cgi scripts that these pages invoke)

html/scenarios  the travel-task scenarios

tables/         directory containing tables. Subject demographics and most
                summarized data are in the tables here -- see
                tables_readme.txt

misc/           directory containing keys that map site directories to NIST
                directories

==============================================================================

Notes on scenario identifier numbers.

1. Each user was to make nine calls. The first seven calls had an assigned
   travel task scenario, which the user got via the web. The actual html
   files for these scenarios are in this distribution. The last two calls
   asked the user to make simulated travel reservations for a trip that
   they might wish to take -- a vacation or pleasure trip on the eighth
   call, and a business trip paid for by an employer on the ninth call.

2. There was a framework or template for each of the nine scenarios. The
   template included dates, times, one-way vs. round-trip, the presence or
   absence of an airline preference, and so forth. The template also
   included a profile of the cities/airports that occurred in the scenario.
   The city/airport profile specified roughly how busy the airport was, as
   well as foreign vs. domestic, and also specified whether nonstop flights
   were possible or whether connecting flights would be necessary.

3. Into the seven fixed scenario templates, NIST substituted new [allegedly
   similar] cities/airports each day of the data collection. There were
   eight such sets (the same set was used on June 22 and 23).

4. Thus, for purposes of data analysis, we consider the template identity
   to be of interest, and which day's set of cities/airports may also be of
   interest. The scenario identifier number is two digits. The "tens" digit
   specifies the template number, and was always equal to the session
   number. The "ones" digit specifies which set of cities/airports was used
   for that template. For the seven fixed scenarios, this ones digit takes
   values of 1 through 8; the ones digit was a 2 for calls occurring on
   June 22 and June 23. For the two free scenarios (the eighth and ninth
   calls), the ones digit is always a 1.
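Purely as an illustration (this code is ours and is not part of the
distribution), the file-name pattern and the scenario identifier documented
above can be decomposed along the lines of the following Python sketch; the
function and field names are our own:

    import re

    # pppp_ss_aa_bb_yyyymmdd_<desc>.<ext>, e.g. 1087_01_18_01_20000707_raw.xml;
    # the channel-level NIST recordings omit <desc>
    # (e.g. 1087_01_18_01_20000707.sph)
    NAME = re.compile(
        r"(?P<pin>\d{4})_(?P<session>\d{2})_(?P<scenario>\d{2})_"
        r"(?P<site>\d{2})_(?P<date>\d{8})(?:_(?P<desc>[^.]+))?\.(?P<ext>\w+)$"
    )

    def parse_name(filename):
        """Split a Communicator data file name into its documented fields."""
        m = NAME.match(filename)
        if m is None:
            raise ValueError("unexpected file name: " + filename)
        f = m.groupdict()
        # Tens digit of the scenario id is the template (= session) number;
        # the ones digit selects that day's set of cities/airports.
        f["template"] = int(f["scenario"][0])
        f["city_set"] = int(f["scenario"][1])
        return f

    print(parse_name("1087_01_18_01_20000707_usr001.sph"))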
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the survey data.

The fields are all on one line, separated by multiple spaces (a parsing
sketch follows these notes). The user comment follows on one or more lines;
if the user made no comments, then that fact is noted.

The fields of the survey data are as follows.

    The site name

    The scenario identifier number

    The approximate time the recording ended (this is actually the time of
    an event in the NIST software, rather than a timestamp from the
    recording)

    The date the recording was made (the user may have gotten the scenario
    on one day and made the call on a later day, in which case the scenario
    identifier number will not correctly correspond to the day the call was
    made)

    Alive or Dead, as stated subjectively by the user on the survey. (If
    the user indicated the system was completely silent/dead, then the rest
    of the survey items, everything below, will be omitted, which is the
    whole purpose of this survey item -- i.e., omitting the other survey
    items is preferred to users assigning random or neutral values if a
    system was completely silent/dead.) This determination actually needs
    to also be made independently by us, on the basis of the audio
    recordings.

    Task completion as stated by the user on the survey (Yes or No). (The
    actual item that the user responded to was, "Were you able to complete
    your entire task?")

    The responses to the five Likert-type items, with 1 being the most
    favorable response (Agree Completely) and 5 being the most negative
    (Disagree Completely). The Likert-type items are as follows (in this
    order):

        In this conversation, it was easy to get the information that I
        wanted.

        In this conversation, I found it easy to understand what the system
        said.

        In this conversation, I knew what I could say or do at each point
        in the dialogue.

        The system worked the way I expected it to in this conversation.

        Based on my experience in this conversation using this system to
        get travel information, I would like to use this system regularly.
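As an illustration only, a survey file of the form described above might be
split along these lines in Python; the field names, and splitting on runs
of two or more spaces, are our assumptions:

    import re

    # Assumed field order, per the list above; the names are ours.
    FIELDS = ["site", "scenario", "end_time", "date",
              "alive_dead", "task_done", "q1", "q2", "q3", "q4", "q5"]

    def parse_survey(text):
        """Split a survey file into its fields and the trailing comment."""
        first_line, _, comment = text.partition("\n")
        values = re.split(r"\s{2,}", first_line.strip())
        # A "Dead" survey omits everything after the Alive/Dead field,
        # so 'values' may legitimately be shorter than FIELDS.
        record = dict(zip(FIELDS, values))
        record["comment"] = comment.strip()
        return record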
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the data, in general.

The user with PIN 3118 made an extra call to Colorado. On his first call he
failed to fill out the registration form. We rescheduled him with Colorado,
and he repeated the Colorado call (session 4 for him) after completing all
nine sessions. All data for both of these calls is included here -- note
that their dates and scenario numbers differ.

The user with PIN 4330 was said to have a foreign accent. The user with PIN
3815 is a replacement for this user.

The user with PIN 8471 was said to have a foreign accent. The user with PIN
1592 is a replacement for this user.

The user with PIN 7562 had impaired articulation, although this was not
evident at first. The user with PIN 3714 is a replacement for this user.

The user with PIN 5845 clearly made a session 1 call to Colorado. Colorado
has no data for that call. Thus, some of the usual files are missing for
the call.

In a few cases, we have modified a user's survey response saying that a
system was dead/silent to say that the system was Alive, if an actual
conversation clearly did take place.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the .hyp and .ref files in release_1 (released October 2000).

Several bugs were corrected in the extraction of the transcripts from the
log summaries. The original extracted .hyp and .ref files are in the data/
directory. The final .hyp and .ref files, cleaned up to conform with the
Communicator transcription spec, reside in the for_sclite/snor directory.
The notes below on hand repairs of two calls still apply.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the .hyp and .ref files in release_0, also known as comm0
(released August 2, 2000).

In general, if there was no human transcription of user turns, then there
are no .hyp or .ref files here.

All .hyp and .ref files in the initial data release (August 2, 2000) are
unchecked and very preliminary. NIST has not yet attempted to run sclite on
them. We know some are missing and expect that some are incorrect. Thus,
these files WILL change, and errors found in the .hyp and .ref files in the
initial release need not be reported. We're including them because we think
useful analysis can be done on even this preliminary data. An updated
release of them will follow.

The automated log tools failed on two of the calls, and NIST resorted to
hand repairs.

* The .hyp and .ref files for call/session 01 for the user with PIN 7158
  required some hand-modifications. It is not clear to NIST that the
  raw.xml log for this call is entirely correct/self-consistent (for
  example, user turn 9 seems to be tagged as the end of the task at line
  5969 of the file). NIST believes the .hyp and .ref files in this release
  for this call to be correct now.

* The .hyp and .ref files for call/session 08 for the user with PIN 3108
  represent hand-reconstruction, and the time stamps on the user turns in
  these files are self-consistent but are probably incorrect. The .hyp and
  .ref files in this release for this call will be used with sclite.

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

Notes on the audio files.

All audio files in this distribution are in SPHERE format. The files
consist of the sites' recordings and the NIST recordings. The sites'
recordings are utterance level, while the NIST recordings are channel
level. For example, in 1087/01/:

    1087_01_18_01_20000707_usr001.sph  is the site's recording of user
                                       utterance #1
    1087_01_18_01_20000707_sys001.sph  is the site's recording of system
                                       utterance #1
    1087_01_18_01_20000707.sph         is the summation of the NIST
                                       recordings of the system and user
                                       sides

No echo cancellation was performed on the sites' or the NIST recordings.
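For readers unfamiliar with SPHERE: each file starts with a plain-ASCII
header. A minimal Python sketch (ours, not NIST's) for reading that header
follows; if the sample data is compressed, NIST's SPHERE tools (e.g.
sph2pipe) are still needed to decode the samples:

    def read_sphere_header(path):
        """Read the plain-ASCII header of a NIST SPHERE file into a dict."""
        with open(path, "rb") as f:
            assert f.read(8) == b"NIST_1A\n"        # SPHERE magic line
            header_size = int(f.read(8).strip())    # header length in bytes
            f.seek(0)
            text = f.read(header_size).decode("ascii", "replace")
        fields = {}
        for line in text.splitlines()[2:]:          # skip the two lines above
            line = line.strip()
            if line == "end_head":
                break
            if not line:
                continue
            name, ftype, value = line.split(None, 2)
            fields[name] = int(value) if ftype == "-i" else value
        return fields

    # e.g. read_sphere_header("data/1087/01/1087_01_18_01_20000707.sph")
    # typically yields sample_rate, channel_count, sample_count, etc.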
We had designed the CDCS handshaking synchronization process to generate
signals we could use to synchronize the NIST and site audio recordings. As
it turned out, only four of the nine Communicator-2000 sites chose to
implement the process. To further complicate matters, the log standard was
never updated to incorporate the timestamping of these tones, so the sites
that did generate the sync tones did not generate timestamps in their logs
that would tell us when the tones occurred.

Given the incompleteness of the data, we chose to scrap the use of the sync
tones entirely and instead perform an indirect synchronization of the logs
and audio files that we could apply to all of the sites' data, via the
following protocol. For each call:

1) A random user utterance containing a significant length of speech was
   chosen from the audio recordings submitted by the site for that call,
   and the start and end timestamps for that utterance were obtained from
   the log. Note that we used the user utterances because not all sites
   recorded their system utterances.

2) The selected utterance was cross-correlated with the NIST user recording
   (sketched below). The result provided the offset into the NIST user
   recording where the selected site audio recording occurred. The
   timestamp for the site user recording was then used to infer the start
   timestamp for the NIST user recording relative to the logfile.

3) The NIST user and system recordings were then combined into a
   two-channel audio file. However, because of an apparent delay between
   thread communications in the NIST data collection system, we found that
   the NIST user and system recording processes were not started at exactly
   the same time. We quantified the delay by correlating the system side
   speech with its echo (caused by crosstalk) in the user side recordings.
   The channel that was found to start the latest was then prepended with
   silence of the length of the delay, to effectively cause the two
   channels to begin at the same absolute time. Note that this process did
   not account for any possible delays caused by telephone transmission
   distances; however, we assumed these to be negligible.

4) The start time for the NIST recordings relative to the site log was then
   calculated by subtracting the offset obtained in step 2 from the start
   timestamp of the utterance chosen in step 1, taking into account any
   additional silence added in step 3. The start time is recorded in
   log_audio_sync_xx.txt (where xx is the number of the CD-ROM the
   recording is located on).

The product of the process was, for each call, a two-channel audio file
with synchronized call-length system and user side audio, and a start time
for synchronizing this file with the corresponding site log.
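The heart of step 2 is ordinary cross-correlation. A simplified Python
sketch follows (ours, not the actual NIST tooling); for call-length audio
an FFT-based method, e.g. scipy.signal.correlate with method="fft", would
be much faster:

    import numpy as np

    def find_offset(utterance, recording):
        """Sample offset where 'utterance' best matches 'recording'.

        Both are float numpy arrays at the same sample rate, with
        len(recording) >= len(utterance).
        """
        # Slide the utterance across the recording; the peak of the
        # cross-correlation marks the best alignment.
        scores = np.correlate(recording, utterance, mode="valid")
        return int(np.argmax(scores))

    # Step 2: offset of the chosen site utterance in the NIST recording.
    # offset_sec = find_offset(site_utt, nist_usr) / sample_rate
    # Step 4: start of the NIST recording on the site-log timeline.
    # nist_start = utt_log_stime - offset_sec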
The procedure described above should have completely aligned the NIST audio
with the logfile timestamps (and site audio). The offset for each utterance
in the NIST recording would be obtained by subtracting the NIST audio start
time from the value of the logfile GC_OPERATION or GC_MESSAGE tag's stime
attribute when type_new_utt="user" or type_new_turn="user" (we should make
this attribute naming consistent). However, in listening, these offsets are
obviously incorrect for some sites (sites 01, 05, 06, and to a lesser
degree sites 04 and 07). (Note that we could not perform ANY processing on
site 08's data, since they did not provide us with a mapping between their
log files and their own audio files.)

After some investigation, we found that the logfile timestamps don't
actually map to the site audio files. For example, the timestamps for one
logfile indicated that the duration of a particular utterance was 1.5
seconds, but the duration of the site's audio file was actually 22 seconds.
Closer examination revealed that the actual speech in the file measured 1.5
seconds in length, but the file also contained 20.5 seconds of silence.

This finding led us to suspect that the timestamps indicated where the
speech started and ended for each utterance relative to the entire session,
and NOT where the site audio recordings fit in the session timeline. To
compensate for this in our alignment, we manually cut out the initial
silence from a selected utterance in EVERY call and then re-ran the
automatic alignment procedure described above (a trimming sketch follows
these notes). (Note that this, unfortunately, means that there is no
foolproof automatic way to index into the site audio recordings using the
logfile timestamps. This should definitely be corrected in future data
collection efforts.)

The above procedure seemed to correct the misalignment problem for all of
the sites' user recordings (except for site 06, whose timestamps appeared
to have an additional problem that we were unable to diagnose).

For sites 01, 03, 04, 05, 06, 07, and 09 (all sites except 02), we found,
however, that the sites' own user-side and system-side timestamps were not
synchronized with each other. This is also a serious problem that should be
corrected in future data collections. In the end, we synchronized on the
user side audio, so sites will note (sometimes severe) timing problems in
the system audio. This problem will be quite evident if you play back the
NIST system and user audio at the same time, or if you try to "recreate"
the call using the site audio and logfile timestamps.

Another problem we found is that, for site 05 data, the duration indicated
by the timestamps is "shorter" than the actual speech. Although we tried to
model the problem, it seemed to vary among utterances and calls. We
therefore did not attempt to modify the timestamps to reflect the true
duration of the speech.
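The silence removal described above was done by hand, but for readers who
want to approximate it programmatically, a simple energy-based trim might
look like the following Python sketch; the frame size and RMS threshold are
our assumptions and would need tuning per site:

    import numpy as np

    def trim_leading_silence(samples, rate, frame_ms=10, rms_threshold=0.01):
        """Drop leading silence, judged frame-by-frame by RMS energy.

        'samples' is a float array scaled to [-1, 1].
        """
        frame = max(1, int(rate * frame_ms / 1000))
        for start in range(0, len(samples) - frame + 1, frame):
            rms = np.sqrt(np.mean(samples[start:start + frame] ** 2))
            if rms > rms_threshold:
                return samples[start:]          # first non-silent frame
        return samples[:0]                      # the whole signal is silent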