Mark (.mrk) file specifications, Updated 03/16/92:

The mark files contain time-aligned word transcriptions.

1. In .mrk files, each record must have 4 fields; the first is the
talker, A or B; the second is the estimated start time of the current
word, the third is the duration of the word, and the fourth is the word.

2. The first field may contain the dummy symbol "*" when the event in
field 4 is not attributed to either speaker. Examples: "[beep]" at the
beginning indicating the dtmf tone, or the "..." at the end indicating
that the conversation was cut off (timed out).

     *       *       *  [Beep]
    @A    1.36    0.28  Okay,
    @A    1.64    0.08  I

3. Fields 2 and 3 may contain the dummy symbol "*" when the event in
field 4 is not a word, but a nonspeech event, a comment, or a
stand-alone string of punctuation (i.e., something which does not
receive a duration from the time alignment algorithm), as well as the
cases in (2) above.

     A       *       *  [lipsmack]
     A       *       *  {pause}
     A  311.02    0.62  economic

4. Fields 2 and 3 may contain the dummy symbol "*" when there is
simultaneous speech by A and B. Here the time alignment algorithm is
allowed to recognize the speech of EITHER talker--usually the louder
one. The other talker's speech is not time aligned during this period,
but the words should be attributed to A and B correctly.

     A  113.96    0.24  thing
     A  114.20    0.10  is
     A  114.30    0.44  still,
     A       *       *  {pause}
     A       *       *  #you
     A       *       *  know#
     A       *       *  --
     B  116.40    0.20  #Your
     B  116.60    0.56  education.#
     A       *       *  --
     A  117.16    0.22  getting
     A  117.38    0.10  your
     A  117.48    0.60  education.

5. In field 1, "@" or "@@" may occur before A or B to mark suggested
trim points for topicality purposes. However, we have REMOVED ALL
DOUBLE ASTERISKS "**" from this location in both .mrk and .txt files.
These originally signified that the two talkers' speech overlapped
during this turn, without having to indicate exactly which words were
involved.
For short and simple episodes of overlap, transcribers were supposed to
be more exact, placing pound signs "#" around the two simultaneously
spoken phrases. See the previous example. The asterisk notation was not
specific enough, either for some transcribers or for the time alignment
program, so we began enforcing the stricter standard for all cases of
simultaneous speech. Some of the released files still had the old
asterisk notation when sent out in December.

6. In some of the more recently processed files, the start times of
keywords found not to be properly marked are preceded by "&&". This was
done during a check of keywords after automatic marking had been
completed. The ampersands were added if playing the keyword as marked
produced less than a full keyword or parts of other words. The presence
of silence or background noise before or after the keyword did not
cause them to be added.

7. It should be borne in mind that these markings were created by an
automatic procedure which did not model non-speech events, and were
only intended to be approximate. Some marks, including some of
keywords, are quite incorrect and may not overlap in time with the
actual word. NIST will provide official markings of the keywords in
June 1992.
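
The record layout described in items 1 through 6 can be read with a few
lines of code. What follows is only a minimal sketch, not an official
tool. The names MrkRecord and parse_mrk_line are hypothetical. It
assumes that fields are separated by whitespace, that field 4 is
everything after the third field, and that the "&&" flag of item 6 is
attached directly to the start time with no space in between. None of
these assumptions is guaranteed by the specification above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MrkRecord:
    talker: Optional[str]      # "A" or "B"; None for the "*" dummy
    trim: str                  # "", "@" or "@@" (suggested trim point)
    start: Optional[float]     # start time; None for the "*" dummy
    duration: Optional[float]  # duration; None for the "*" dummy
    word: str                  # field 4, kept verbatim
    suspect: bool              # True if the start time carried "&&"


def parse_mrk_line(line: str) -> MrkRecord:
    """Parse one .mrk record (hypothetical helper, see assumptions above)."""
    fields = line.strip().split(None, 3)
    if len(fields) != 4:
        raise ValueError(f"expected 4 fields, got {len(fields)}: {line!r}")
    who, start, dur, word = fields

    # Field 1: optional "@" or "@@" prefix, then A, B, or the "*" dummy.
    talker = who.lstrip("@")
    trim = who[: len(who) - len(talker)]
    if talker not in ("A", "B", "*") or len(trim) > 2:
        raise ValueError(f"bad talker field: {who!r}")

    # Field 2: assumed form "&&<time>" when the keyword mark is suspect.
    suspect = start.startswith("&&")
    if suspect:
        start = start[2:]

    return MrkRecord(
        talker=None if talker == "*" else talker,
        trim=trim,
        start=None if start == "*" else float(start),
        duration=None if dur == "*" else float(dur),
        word=word,
        suspect=suspect,
    )
```

A caller would typically skip blank lines, pass each remaining line to
parse_mrk_line, and treat records whose start is None as untimed
(nonspeech events, comments, punctuation, or overlapped speech). In view
of item 7, even the timed records are best treated as approximate.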