The following reserved metric values have these meanings:

  -1  mathematically undefined (for example, division by zero)
  -2  not applicable. Examples:
        * Likert responses when the system is dead
        * std. dev. of user utt durations if there are no utts
        * all metrics that depend on utterance durations when a
          type_end_utt is missing
  -3  missing from the log file, or the data on which it is based was
      missing from the log file
  -4  value is/was suspect or useless for some logical reason (you can
      think of -4 as meaning the metrics program chose to omit the value
      for some logical reason)

Note that these reserved values are all negative. None of the metrics
listed below ever has a negative normal value.

Standard deviations are not reported if calculated on fewer than 3 data
values (that is, they are not reported unless the n-1 divisor is at
least 2).

According to our notes, the evaluation committee requested that we
calculate and report the following twelve metrics for each call:

   1. Task completion (yes/no), as reported by sites in the log files
   2. Number of on-task turns (sys+user) to completion
   3. Number of on-task user turns
   4. Number of on-task system turns
   5. Number of on-task system utterances
   6. Time to completion
   7. Number of words in the on-task user utterances
   8. Mean words per on-task user turn
   9. Mean duration of on-task user turn
  10. Mean duration of on-task system turn
  11. Mean duration of on-task system utterance
  12. Traditional NIST word error rate

We also report metrics for user utterances, along with the standard
deviations that correspond to the means. At the New Orleans PI meeting,
we were asked to add some analysis of overlaps/parallelism, and we have
done so.

NOTE on some "extra" metrics that may indicate degree of user initiative:

Systems often wait a very long time if a user remains silent, so
MaxDurUsrUtt is then very long and consists of the user remaining
silent. This seems to happen quite often, perhaps in the majority of
calls.
In analyzing the June 2000 data, some effort was made to use the mean
duration of user utterances as a measure of the degree of user
initiative. Although metrics based on this analysis are not core
metrics, they are interesting. But long waits for silent users can
swamp the useful information about initiative that such data can
provide. We've tried to remedy this to some degree: we report several
extra values describing the mean duration and standard deviation of
user utterances and turns when the utterance or turn with the maximum
number of words is omitted. If MaxDurUsrUtt and DurOfMaxWrdsUsrUtt (see
the metric heading descriptions below) differ from each other by more
than 20%, the extra metrics that depend on DurOfMaxWrdsUsrUtt and
DurOfMaxWrdsUsrTurn are reported as -4. The 20% threshold was chosen
heuristically as the largest difference that still seems to give
useful information: many threshold values were tried over the past few
months, and the results of each were examined.

We did not initially intend to report any of these extra metrics, but
decided to report the 20% set because it may give useful information.
Feel free to ignore them (columns BF through BH and columns AR through
AT in the spreadsheet). Parallel values are given for system
utterances and turns, which do not seem to suffer from any similar
"swamping of information"; these too can be ignored if you wish. All of
these are "extra" metrics. We report them so that people can assess
their value as automatically-generated metrics that might help indicate
the degree of mixed initiative.
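The omit-the-max-words computation and the 20% rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the metrics program itself: all function names are hypothetical, "differ by more than 20%" is assumed to be measured relative to the larger of the two durations, ties for the maximum word count go to the first item, all three dependent values are assumed to be replaced by -4 when the rule fires, and because this document does not say which reserved value marks a standard deviation over fewer than 3 values, -4 is used for that case purely for illustration.

```python
import statistics
from typing import List, Tuple

# Reserved values, as listed at the top of this document.
UNDEFINED = -1       # mathematically undefined, e.g. division by zero
NOT_APPLICABLE = -2  # e.g. std. dev. of user utt durations if no utts
SUSPECT = -4         # the metrics program chose to omit the value


def mean_or_reserved(values: List[float]) -> float:
    # ASSUMPTION: a mean over zero items is treated as a division by zero.
    if not values:
        return UNDEFINED
    return statistics.mean(values)


def stdev_or_reserved(values: List[float]) -> float:
    if not values:
        return NOT_APPLICABLE
    # Not reported unless the n-1 divisor is at least 2, i.e. n >= 3.
    # ASSUMPTION: -4 marks this unreported case; the document does not say.
    if len(values) < 3:
        return SUSPECT
    return statistics.stdev(values)  # sample std. dev. (n-1 divisor)


def differ_too_much(max_dur_usr_utt: float, dur_of_max_wrds_usr_utt: float,
                    tolerance: float = 0.20) -> bool:
    # ASSUMPTION: the difference is measured relative to the larger duration.
    larger = max(max_dur_usr_utt, dur_of_max_wrds_usr_utt)
    if larger <= 0:
        return False
    return abs(max_dur_usr_utt - dur_of_max_wrds_usr_utt) / larger > tolerance


def omit_max_words(items: List[Tuple[int, float]],
                   suppress: bool) -> Tuple[float, float, float]:
    """items: (word_count, duration_seconds) per on-task user utt or turn.

    Returns (DurOfMaxWrds..., Mean...Dur2, StdDev...Dur2).
    """
    if not items:
        return NOT_APPLICABLE, NOT_APPLICABLE, NOT_APPLICABLE
    # ASSUMPTION: when the 20% rule fires, all three values become -4.
    if suppress:
        return SUSPECT, SUSPECT, SUSPECT
    # Index of the item with the most words (max() keeps the first on ties).
    i_max = max(range(len(items)), key=lambda i: items[i][0])
    rest = [dur for i, (_, dur) in enumerate(items) if i != i_max]
    return items[i_max][1], mean_or_reserved(rest), stdev_or_reserved(rest)
```

For example, with user utts [(3, 2.0), (10, 5.0), (4, 3.0), (2, 1.0)] and MaxDurUsrUtt = 5.0, `differ_too_much(5.0, 5.0)` is False and `omit_max_words(items, suppress=False)` gives a duration of 5.0, a mean of 2.0, and a standard deviation of 1.0. If MaxDurUsrUtt were instead a 60-second stretch of silence, `differ_too_much(60.0, 5.0)` would be True and the extra duration values would be reported as -4.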
-----------------------------------------------------------------------------
The metric headings (columns AD-CL in the spreadsheet) are:

Abbreviation            Unit     Meaning
-----------------------------------------------------------------------------
ATC                     0/1      Task completion, annotated by site:
                                 0 = not complete, 1 = complete
                                 (CMU always marked this as 0)
TimeOnTask              seconds  Time on-task, from the beginning of the
                                 first on-task utterance to the end of the
                                 last on-task utterance
                                 (was Time To Completion)
TurnsOnTask             count    Number of on-task turns (system+user)
                                 (was Turns To Completion)
NumOverlaps             count    Number of overlaps, i.e. pieces of time
                                 attributed to both a user utt and a
                                 sys utt
SumOverlaps             seconds  Sum of overlapped time
-----------------------------------------------------------------------------
NumUsrTurns             count    Number of on-task user turns
                                 (was User Turns to Completion)
MeanUsrTurnDur          seconds  Mean on-task user turn duration
NumWrdsUsrTurns         words    Number of words in on-task user turns
                                 (was User Words to Completion)
StdDevUsrTurnDur        seconds  Std. dev. of on-task user turn durations
MeanWrdsPerUsrTurn      words    Mean words per on-task user turn
                                 (was Mean User Words Per Turn)
StdDevWrdsPerUsrTurn    words    Std. dev. of words per on-task user turn
MaxWrdsInAUsrTurn       words    Maximum words in an on-task user turn
MeanWrdsPerUsrTurn2     words    Mean words per on-task user turn,
                                 omitting the user turn whose words were
                                 counted as MaxWrdsInAUsrTurn
StdDevWrdsPerUsrTurn2   words    Std. dev. of words per on-task user turn,
                                 omitting the MaxWrdsInAUsrTurn turn
DurOfMaxWrdsUsrTurn     seconds  Duration of the MaxWrdsInAUsrTurn turn
                                 (see note, above, on extra metrics)
MeanUsrTurnDur2         seconds  Mean duration of on-task user turns,
                                 omitting the MaxWrdsInAUsrTurn turn
                                 (see note, above, on extra metrics)
StdDevUsrTurnDur2       seconds  Std. dev. of
                                 on-task user turn durations, omitting the
                                 MaxWrdsInAUsrTurn turn (see note, above,
                                 on extra metrics)
-----------------------------------------------------------------------------
NumUsrUtts              count    Number of on-task user utt texts
NumUsrUttStartEndPairs  count    Number of on-task user utt
                                 start-time/end-time pairs
MeanUsrUttDur           seconds  Mean on-task user utt duration
StdDevUsrUttDur         seconds  Std. dev. of on-task user utt durations
NumWrdsUsrUtts          count    Number of words in on-task user utts
MeanWrdsPerUsrUtt       words    Mean words per on-task user utt
StdDevWrdsPerUsrUtt     words    Std. dev. of words per on-task user utt
MaxWrdsInAUsrUtt        words    Maximum words in any on-task user utt
MeanWrdsPerUsrUtt2      words    Mean words per on-task user utt,
                                 omitting the MaxWrdsInAUsrUtt utterance
StdDevWrdsPerUsrUtt2    words    Std. dev. of words per on-task user utt,
                                 omitting the MaxWrdsInAUsrUtt utterance
MaxDurUsrUtt            seconds  Longest duration of an on-task user utt
DurOfMaxWrdsUsrUtt      seconds  Duration of the MaxWrdsInAUsrUtt
                                 utterance (see note, above, on extra
                                 metrics; MaxDurUsrUtt and
                                 DurOfMaxWrdsUsrUtt are the two metrics
                                 compared in that note)
MeanUsrUttDur2          seconds  Mean duration of on-task user utts,
                                 omitting the MaxWrdsInAUsrUtt utterance
                                 (see note, above, on extra metrics)
StdDevUsrUttDur2        seconds  Std. dev. of on-task user utt durations,
                                 omitting the MaxWrdsInAUsrUtt utterance
                                 (see note, above, on extra metrics)
NumUsrUttsInSysUtts     count    Number of on-task user utts completely
                                 embedded within on-task sys utts. An
                                 example is a user trying to barge in in
                                 the middle of a system utterance when the
                                 system does not yield: the user started
                                 speaking after the system started
                                 speaking, and stopped speaking before the
                                 system stopped speaking.
                                 (No user turn is counted for these.)
-----------------------------------------------------------------------------
NumSysTurns             count    Number of on-task system turns
                                 (was System Turns to Completion)
MeanSysTurnDur          seconds  Mean on-task system turn duration
StdDevSysTurnDur        seconds  Std. dev. of on-task sys turn durations
NumWrdsSysTurns         words    Number of words in on-task system turns
                                 (was System Words to Completion)
MeanWrdsPerSysTurn      words    Mean words per on-task system turn
                                 (was Mean System Words Per Turn)
StdDevWrdsPerSysTurn    words    Std. dev. of words per on-task sys turn
MaxWrdsInASysTurn       words    Maximum number of words in an on-task
                                 system turn
MeanWrdsPerSysTurn2     words    Mean words per on-task system turn,
                                 omitting the system turn with max words
StdDevWrdsPerSysTurn2   words    Std. dev. of words per on-task sys turn,
                                 omitting the system turn with max words
DurOfMaxWrdsSysTurn     seconds  Duration of the on-task system turn with
                                 max words
MeanSysTurnDur2         seconds  Mean duration of on-task system turns,
                                 omitting the system turn with max words
StdDevSysTurnDur2       seconds  Std. dev. of on-task sys turn durations,
                                 omitting the system turn with max words
-----------------------------------------------------------------------------
NumSysUtts              count    Number of on-task system utt texts
NumSysUttStartEndPairs  count    Number of on-task system utt
                                 start-time/end-time pairs
MeanSysUttDur           seconds  Mean on-task system utt duration
StdDevSysUttDur         seconds  Std. dev. of on-task system utt durations
NumWrdsSysUtts          count    Number of words in on-task system utts
MeanWrdsPerSysUtt       words    Mean words per on-task system utt
StdDevWrdsPerSysUtt     words    Std. dev. of words per on-task system utt
MaxWrdsInASysUtt        words    Maximum words in an on-task system utt
MeanWrdsPerSysUtt2      words    Mean words per on-task system utt,
                                 omitting the system utt with max words
StdDevWrdsPerSysUtt2    words    Std. dev. of
                                 words per on-task sys utt, omitting the
                                 sys utt with max words
DurOfMaxWrdsSysUtt      seconds  Duration of the on-task system utt with
                                 max words
MeanSysUttDur2          seconds  Mean duration of on-task system utts,
                                 omitting the sys utt with max words
StdDevSysUttDur2        seconds  Std. dev. of on-task sys utt durations,
                                 omitting the sys utt with max words
NumSysUttsInUsrUtts     count    Number of on-task system utts completely
                                 embedded within on-task user utts; the
                                 opposite situation from
                                 NumUsrUttsInSysUtts
                                 (no system turn is counted for these)
-----------------------------------------------------------------------------
WER                     percent  Word error rate over the on-task user
                                 utterances
SER                     percent  Utterance error rate over the on-task
                                 user utterances (sentence error rate
                                 from SCLite)
-----------------------------------------------------------------------------
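The overlap columns (NumOverlaps, SumOverlaps) and the embedding counts (NumUsrUttsInSysUtts, NumSysUttsInUsrUtts) can be illustrated with a small interval computation over utterance start-time/end-time pairs. This is a hedged sketch, not the metrics program: the function name `overlap_metrics` is hypothetical, it assumes user utts do not overlap one another and system utts do not overlap one another (so each overlapping user/system pair is one distinct piece of shared time), and it reads "started after ... stopped before" as strict inequalities.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_sec, end_sec) of one on-task utt


def overlap_metrics(usr_utts: List[Interval], sys_utts: List[Interval]):
    """Returns (NumOverlaps, SumOverlaps, NumUsrUttsInSysUtts, NumSysUttsInUsrUtts)."""
    num_overlaps = 0
    sum_overlaps = 0.0
    for u_start, u_end in usr_utts:
        for s_start, s_end in sys_utts:
            # Time attributed to both this user utt and this sys utt.
            start, end = max(u_start, s_start), min(u_end, s_end)
            if end > start:
                num_overlaps += 1
                sum_overlaps += end - start

    def embedded(inner: Interval, outers: List[Interval]) -> bool:
        # Inner starts after some outer starts and stops before it stops.
        return any(o_start < inner[0] and inner[1] < o_end
                   for o_start, o_end in outers)

    usr_in_sys = sum(1 for u in usr_utts if embedded(u, sys_utts))
    sys_in_usr = sum(1 for s in sys_utts if embedded(s, usr_utts))
    return num_overlaps, sum_overlaps, usr_in_sys, sys_in_usr
```

For a user barge-in at 2.0-3.0 s during a system prompt at 0.0-5.0 s, plus a user utt at 10.0-12.0 s that overlaps a system utt starting at 11.0 s, `overlap_metrics([(2.0, 3.0), (10.0, 12.0)], [(0.0, 5.0), (11.0, 15.0)])` returns `(2, 2.0, 1, 0)`: two overlaps totalling 2 seconds, one of which is a fully embedded user utt.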
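The arithmetic behind the WER and SER columns can be sketched as follows. The reported values come from SCLite, and this sketch is not SCLite: it assumes an unweighted word-level Levenshtein alignment over whitespace-split text with no normalization, whereas SCLite's own alignment and text handling may differ, so the numbers need not match exactly. The function names are hypothetical, and returning -1 when the divisor would be zero follows the reserved-value convention described at the top of this document.

```python
from typing import List, Sequence, Tuple


def word_errors(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))  # distances for an empty ref prefix
    for i, r in enumerate(ref, 1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, 1):
            cur[j] = min(prev[j] + 1,              # deletion of r
                         cur[j - 1] + 1,           # insertion of h
                         prev[j - 1] + (r != h))   # substitution or match
        prev = cur
    return prev[-1]


def wer_ser(pairs: List[Tuple[str, str]]) -> Tuple[float, float]:
    """pairs: (reference transcript, recognizer hypothesis) per on-task user utt.

    Returns (WER percent, SER percent); -1 where the divisor would be zero.
    """
    refs = [r.split() for r, _ in pairs]
    hyps = [h.split() for _, h in pairs]
    total_ref_words = sum(len(r) for r in refs)
    total_errors = sum(word_errors(r, h) for r, h in zip(refs, hyps))
    wer = 100.0 * total_errors / total_ref_words if total_ref_words else -1
    # An utterance is in error if its alignment contains at least one error.
    bad_utts = sum(1 for r, h in zip(refs, hyps) if r != h)
    ser = 100.0 * bad_utts / len(pairs) if pairs else -1
    return wer, ser
```

For example, one utterance with a single substitution out of five reference words plus a correctly recognized "yes" gives a WER of 1/6, about 16.7%, and an SER of 50%. Note that WER pools errors over all reference words rather than averaging per-utterance rates.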