File sro-specs.doc.

- originally drawn from a memo by C. Hemphill of TI (4/18/90).
- amended 07/91.
- revised by L. Shriberg (11/10/91).
- revised by Patti Price (12/09/91).
- revised 01/21/92.
- minor revisions made by J. Garofolo on 02/21/92.

Please note: in the transcription data provided in this publication, no effort has been made to enforce consistency in the prosodic and non-speech markings as described below. Such markings are known to be somewhat subjective, and there may have been some evolutionary changes in usage over the period of data collection for this corpus. These issues, however, do not affect the lexical or referential content of the transcriptions.

ATIS SR Output (".sro") Transcription Conventions

The transcription is intended to be an orthographic, lexical transcription with a few details included that represent audible acoustic events (speech and nonspeech) present in the corresponding waveform files. The SRO transcriptions will be automatically mapped to lexical SNOR conventions for scoring of recognition systems. The extra marks contained in the SRO transcription aid in interpreting the text form of the utterance. The SRO transcription will be stored in the query's auxiliary file of type ".sro".

The transcriptions are intended to be a quick and broad transcription; transcribers should not have to agonize over decisions, but rather realize that their transcription is intended to be a rough guide that others may examine further for details.

Transcriptions should be made in two passes: one pass in which words are transcribed, and a second in which the additional details (extraneous noises, and prosodic marks) are added. Many phenomena (silences, noises, "uh"s) are easy to miss unless specifically attended to.

It is recommended that transcribers have some background in phonetics and in linguistics, or that their training and preparation for the transcription task cover some basics in acoustic phonetics and dialect and style variations.

1.
Markings Required for Scoring.

1.1 Case

Transcriptions are case insensitive and all case information will be lost in the translation to the all uppercase SNOR conventions. Using all lower case for SRO conventions is recommended so that SRO files are immediately recognizable from SNOR and lexical SNOR files.

1.2 Spelling

Normal lexical items will be represented by their spellings in the normal way. NIST maintains a common lexicon of spellings of words used in the ATIS corpus. It is available via remote FTP to ssi.ncsl.nist.gov and should be consulted when in doubt on spellings of words. The file is located in the directory, "madcow/logs" and is named, "lexicon.doc.DATE", where DATE represents the latest date of update of this file.

Spellings which cannot be predicted from .sro conventions:

- "all right" will always be used in lieu of "alright"
- "traveling" will always be used in lieu of "travelling"
- "trans world" will always be transcribed as separate words when referring to the airline, TWA.
- "pan am" will always be transcribed as separate words when referring to the airline, Pan American Airlines.
- "okay" is spelled "okay" rather than any other spellings, and should not be in angled or square brackets, unless part of a sequence that is verbally deleted.
- hyphenation is addressed in a separate section below.

1.2.1 Number sequences

Number sequences (flight numbers, times, dates, aircraft types, dollar amounts, etc.) will be spelled out to reflect what was said ("flight six one three"; "seven thirty"; "august twenty first"; "seven forty seven"; "four hundred and ten dollars".)

Reminder: No hyphens will be used ("seven forty seven", not "seven forty-seven".)

Note: care should be taken to transcribe the digit "0" as "zero" or "oh", depending on what the speaker said.

1.2.2 Letter sequences

Letter sequences occur in acronyms and abbreviations ("d f w"; "a p slash eighty"; "p m"; "c o"; etc.) Letters should be in lower case, separated by a space.
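The spelling conventions above lend themselves to a simple automatic check. The following is a minimal sketch only, not part of the official conventions or of any NIST tool; the function name check_spelling and the PREFERRED table are hypothetical, and the table covers just the two single-word substitutions listed in section 1.2. It flags digits that should have been spelled out, word-internal hyphens, and the disallowed spellings.

```python
import re

# Hypothetical table: disallowed spelling -> spelling required by section 1.2.
PREFERRED = {"alright": "all right", "travelling": "traveling"}


def check_spelling(line):
    """Return a list of human-readable issues found in one transcription line."""
    issues = []
    for token in line.split():
        lowered = token.lower()
        if lowered in PREFERRED:
            issues.append(f'use "{PREFERRED[lowered]}" instead of "{token}"')
        # Section 1.2.1: numbers are spelled out as spoken, never written as digits.
        if re.search(r"\d", token):
            issues.append(f'spell out digits in "{token}"')
        # A hyphen between two letters is word-internal; a trailing or leading
        # hyphen is left alone because it marks a word fragment.
        if re.search(r"[A-Za-z]-[A-Za-z]", token):
            issues.append(f'no hyphen inside "{token}"')
    return issues
```

For instance, check_spelling("flight six one three") returns an empty list, while a line containing "forty-seven" or "613" produces one issue per offending token.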
Note that the determiner "a" and the letter "a" in "t w a" are not distinguished in these conventions.

Previous conventions indicated an exception to the above rule for "washington dc" in which there was no space between the "d" and "c". This exception never made sense and has not been used consistently in practice. In all future transcriptions it should NOT be treated as an exception and should always be transcribed as "washington d c". [NIST has changed all occurrences of "dc" to "d c" in the MADCOW data they have distributed, so the "dc" form has never been used in official MADCOW data. It may, however, exist in the ATIS0 data.]

The AM and PM of times (e.g., "five thirty p m") will be treated as examples of letter sequences, i.e., lower case and separated by a space, with no periods.

If a speaker pronounces an acronym or abbreviation as a word, for example "den" or "bos", then these should be spelled out as words, rather than as "d e n" and "b o s".

1.3 Hyphenation

Hyphens will not generally be used; if the items on either side of a potential hyphen are both words, a space will be used instead of a hyphen. If one or both of the items is NOT a lexical item, neither a space nor a hyphen will be used. Examples:

- "nonstop" should be used, NOT "non-stop" or "non stop"
- "round trip" should be used and NOT "round-trip"
- "one way" should be used and NOT "one-way" or "oneway"
- "nonsmoking" should be used and NOT "non-smoking"

1.4 Punctuation

This transcription will not contain normal English punctuation and will consist of lowercase characters except for proper nouns and individual letters. Conventional punctuation, including commas, periods, and question marks, will not be used. Periods will be used to indicate silent pauses (see 2.2) within an utterance, and should only occur following a space. Commas are used to indicate intonational separation; exclamation points are used to indicate emphatic stress.
Periods, question marks or exclamation points should NOT be used to indicate the end of a sentence.

1.5 Mispronunciations

Obviously mispronounced words that are nevertheless intelligible will be marked with stars (e.g., *transportation* for "transportetation"). These include mispronunciations such as words with extra or omitted syllables, but asterisks should not be used to indicate pronunciations of words that represent normal dialectal variation (e.g., "warshed" for "washed" or "cah" for "car") or stylistic variation (e.g., "bout" for "about" or "wanna" for "want a" or for "want to"). If the speaker would not consider the pronunciation an error, the asterisk notation should not be used. Obviously, there may be some clear and some unclear cases; transcribers should use their best judgment. A background in phonetics is helpful for transcribers. Similarly, glottalization at the onset or offset of a vowel is not transcribed.

1.6 Verbal Deletions

Words verbally deleted by the subject will be enclosed in angle brackets. Verbal deletion means words spoken by the user but which, in the opinion of the transcriber, are superseded by subsequent speech explicitly (e.g., "show fares") or implicitly (e.g., "show me the flights to Boston"). Verbal deletions occur any time there is a repetition or restart. In repetitions, one or more words are repeated, and there may or may not be extra material inserted into the repetition, for example:

    show me the flights to boston
    show me the nonstop flights to boston

In restarts, words are not repeated, but the speaker changes direction, as in:

    how many flights go to boston

Note that EACH word in a verbal deletion should be enclosed in angle brackets.

1.7 Word Fragments

Word fragments, i.e., instances in which the speaker did not complete a word, will be marked with a hyphen.
As much of the word as is audible will be transcribed, followed immediately by the hyphen:

    please show fli- flights from dallas

Though these represent "verbal deletions" as described above, the hyphen occurring before (or after) a space is sufficient to cue this fact, and fragments should not be enclosed in angle brackets, as this just adds work for the transcribers. That is, the above example should NOT be "please show <fli-> flights from dallas".

Fragments include cases in which only an initial consonant or vowel is heard:

    please show f- flights from dallas

This may sometimes be a judgement call on the part of the transcriber.

Within-word hesitations may be transcribed as:

    dall:as          (indicating lengthening of the "l") (see section 2.4)
    dal- [um] -las   (indicating a within word interruption - rare)
    dal- . -as       (indicating a silence interrupting a word)

The transcription will specify the intended word if such is obvious to the transcribers and is NOT obvious from context (this is of course a judgement call on the part of the transcriber). The completion of the presumed intended word will be enclosed in parentheses, BEFORE the hyphen, as in:

    please show flights de(nver)- from dallas

1.8 Non-Speech Acoustic Events

Acoustic events enclosed in square brackets can come from the following set:

- Filled Pause ([uh], [um], [er], [ah], [mm])
- Speaker other ([laughter], [cough], [grunt], [throat_clear], [mumbling], [unintelligible])
- Nonspeaker other ([phone], [paper_rustle], [door_slam])

Note that while the exact specification of the type of acoustic event is subjective, these events MUST be marked in the correct location in a transcribed utterance. It is often difficult to localize these events; transcribing the utterance first, and listening for these events in a second pass is the correct procedure.

Note that any term can be used inside the brackets, but there should be no spaces inside brackets; use an underscore to connect words.
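As an illustration of the fragment and bracket conventions above, here is a minimal sketch of a token classifier. It is an assumption-laden example rather than any official tool: the names classify_token and classify_line and the label strings are hypothetical, and only the filled pauses listed in this section are recognized as such. A bracket containing a space is split by whitespace tokenization and is therefore reported as malformed, which matches the no-spaces rule.

```python
import re

# Filled pauses listed in section 1.8; any other bracketed term is a generic event.
FILLED_PAUSES = {"uh", "um", "er", "ah", "mm"}

# A well-formed event: one pair of square brackets with no whitespace inside.
EVENT_RE = re.compile(r"^\[[^\s\[\]]+\]$")


def classify_token(token):
    """Return a coarse label for one whitespace-delimited token."""
    if EVENT_RE.match(token):
        name = token[1:-1]
        return "filled_pause" if name in FILLED_PAUSES else "event"
    if "[" in token or "]" in token:
        # Unbalanced bracket, e.g. the halves of "[paper rustle]".
        return "malformed_event"
    if token == ".":
        return "pause"
    if token.endswith("-") or token.startswith("-"):
        return "fragment"
    return "word"


def classify_line(line):
    """Label every token of a transcription line, preserving order."""
    return [(tok, classify_token(tok)) for tok in line.split()]
```

A call such as classify_line("dal- . -as") labels the two halves as fragments and the period as a pause, which makes it easy to list every line that contains a malformed bracket before the transcription is passed on.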
Note that the filled pauses represent acoustic events similar acoustically and phonetically to speech. If possible, try to limit these to the set on the list, so that those interested in these events can find them easily. If others occur, contact the MADCOW committee via your MADCOW representative.

For noise events that occur over a span of one or more words, the transcriber should:

- indicate the beginning and ending of the noise, to the nearest word:

    "show the [paper_rustle/] flights to boston [/paper_rustle]"

or

- indicate that the sound overlaps one word, e.g. a door slam during the word "flights" could be transcribed either:

    "show the [door_slam>] flights to boston"

  or

    "show the flights [