Transcription Spec:

  • A: corresponds to the local channel (the lower waveform window).
  • B: corresponds to the remote channel (the upper waveform window).
  • A "turn" has a speaker channel identification, and has a beginning and end timestamp.

  • Inserting a "breakpoint" looks the same as starting a new speaker turn. Breakpoints can be inserted wherever they seem convenient to the transcriber, but they should occur at natural boundaries of speech, such as pauses or breaths. Do not insert a breakpoint (timestamp) in the middle of a word! Each timestamp has a start point and an end point, and neither point may overlap a previous timestamp of the same speaker.

    Punctuation

  • The following punctuation marks are to be used in the transcripts; they are primarily for ease of (human) reading. Use only the marks indicated below, and do not use others such as single quotation marks (' '), exclamation marks (!), or apostrophes (').

  • periods "." should be added at the end of declarative sentences
  • question marks "?" should be added at the end of interrogative sentences
  • commas "," should be added between clauses, as is customary in the standard orthography of the language
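    The turn and breakpoint rules above (each turn has a channel plus start and end timestamps, and no timestamp may overlap an earlier one of the same speaker) and the ban on extra punctuation can be checked mechanically. The following Python sketch is a hypothetical lint pass, not part of this spec: the `Turn` record, its field names, and `lint_turns` are assumptions made for illustration, and it treats a turn that starts exactly where the previous one on the same channel ended as non-overlapping.

```python
from dataclasses import dataclass


# Hypothetical turn record; the spec does not prescribe a file format or field names.
@dataclass
class Turn:
    channel: str  # "A" (local) or "B" (remote)
    start: float  # start time in seconds
    end: float    # end time in seconds
    text: str


# Marks the spec explicitly excludes (exclamation mark, single quote/apostrophe).
FORBIDDEN_MARKS = {"!", "'"}


def lint_turns(turns):
    """Return human-readable problems found in a list of Turn objects."""
    problems = []
    last_end = {}  # channel -> latest end time seen so far on that channel
    for t in sorted(turns, key=lambda turn: (turn.channel, turn.start)):
        where = f"{t.channel} {t.start:.2f}"
        if t.start >= t.end:
            problems.append(f"{where}: start is not before end")
        prev_end = last_end.get(t.channel)
        # Same-channel overlap is an error; touching boundaries are allowed (assumption).
        if prev_end is not None and t.start < prev_end:
            problems.append(f"{where}: overlaps previous turn ending at {prev_end:.2f}")
        last_end[t.channel] = t.end if prev_end is None else max(prev_end, t.end)
        for mark in sorted(FORBIDDEN_MARKS):
            if mark in t.text:
                problems.append(f"{where}: forbidden mark {mark!r}")
    return problems


turns = [
    Turn("A", 0.0, 2.5, "so how are you?"),
    Turn("B", 1.0, 3.0, "fine, thanks."),
    Turn("A", 2.0, 4.0, "great!"),
]
for problem in lint_turns(turns):
    print(problem)
# Prints:
# A 2.00: overlaps previous turn ending at 2.50
# A 2.00: forbidden mark '!'
```

    Turns on different channels may overlap freely; only overlap within the same channel (the same speaker) is reported.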

    Symbols

  • Acronyms I: acronyms that are pronounced as a single word should be written in all caps (no spaces) and preceded by an "@" symbol: @AIDS

  • Acronyms II: acronyms that are normally written as a single word but pronounced as a sequence of individual letters should be written in all caps (no spaces) and preceded by a "~" symbol: ~FBI

  • Individual letters: letters that are pronounced as such should be written in caps and preceded by a "~" symbol: ~A

  • Proper names: both proper names and place names should be marked with a "^" symbol. If you encounter a "proper name phrase", mark as proper names only those words that are proper names on their own.

  • Partial words: are indicated with a dash attached to the word fragment, with no space between the dash and the word: week-

  • If a word is mispronounced (such as a slip of the tongue), provide the correct spelling of the word, and place a "+" symbol in front of the word.

  • Idiosyncratic words

    If a speaker uses a "made-up" word which is not used by other speakers (although it may be understandable), place a "*" symbol before the word. Consult your language leader in cases where you are uncertain whether a word fits in this category. Onomatopoeia also fits into this category.
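    The prefix and suffix markers in this section can be recognized token by token. This Python sketch is a hypothetical helper, not a tool defined by the spec: the `classify_token` name and its label strings are invented for illustration. It assumes whitespace-separated tokens, strips trailing periods, commas, and question marks before classifying, and checks that "@" and "~" tokens use ASCII capitals only.

```python
import re

# Hypothetical labels for the prefix markers described in the Symbols section.
PREFIX_LABELS = {
    "@": "acronym (pronounced as a word)",
    "~": "spelled letter(s)",
    "^": "proper name",
    "+": "mispronounced (correct spelling given)",
    "*": "idiosyncratic word",
}

# Assumption: "written in caps" is checked against ASCII capitals only.
ALL_CAPS = re.compile(r"[A-Z]+")


def classify_token(token):
    """Return (label, bare_word) for one whitespace-separated token."""
    token = token.rstrip(".,?")  # drop sentence punctuation before classifying
    if len(token) > 1 and token.endswith("-"):
        return "partial word", token[:-1]
    marker, rest = token[:1], token[1:]
    if marker in PREFIX_LABELS and rest:
        # "@" and "~" tokens must be written in caps with no spaces.
        if marker in ("@", "~") and not ALL_CAPS.fullmatch(rest):
            return "error: expected all caps", rest
        return PREFIX_LABELS[marker], rest
    return "plain word", token


for tok in "the ~FBI and @AIDS week- ^Boston".split():
    print(tok, "->", classify_token(tok))
```

    Stripping the marker and keeping the bare word separately makes it easy to count words or look them up in a lexicon without losing the annotation.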

    Interjections

  • Use one of the standardized spellings for interjections. When it is hard to determine how to represent an interjection, ask your language leader.

  • Standard spellings for English interjections:


    mhm
    uh-huh
    uh-oh
    whoa
    whew
    yeah
    jeeze

    Non-lexemes

  • In addition to the interjections (which are considered to be words), there is also a set of standardized spellings for hesitation sounds that speakers make while talking. Every such "non-word" in the transcripts is marked with the "%" symbol.

  • English non-lexemes (to give you an idea of the criterion for distinguishing lexemes from non-lexemes):


    %ach
    %ah
    %eee
    %eh
    %ew
    %ha
    %hee
    %huh
    %hm
    %um
    %uh
    %oh
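    A transcript can be checked against the standardized non-lexeme spellings. In this Python sketch, the function name and the English-only set are assumptions for illustration; it flags any %-marked token that is not in the list above. Interjections carry no marker, so they are not checked here.

```python
# The English non-lexeme spellings listed above; other languages need their own sets.
ENGLISH_NON_LEXEMES = {
    "%ach", "%ah", "%eee", "%eh", "%ew", "%ha", "%hee",
    "%huh", "%hm", "%um", "%uh", "%oh",
}


def unknown_non_lexemes(text, allowed=ENGLISH_NON_LEXEMES):
    """Return %-marked tokens in text that are not standardized spellings."""
    tokens = (tok.rstrip(".,?") for tok in text.split())
    return [tok for tok in tokens if tok.startswith("%") and tok not in allowed]


print(unknown_non_lexemes("%um I think %erm that %uh works."))  # ['%erm']
```

    The comparison is case-sensitive, so a capitalized "%Um" is also reported.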

  • Noises

    In order to account for sound phenomena such as distortion, coughs, breaths, unintelligible speech, and foreign words and phrases, we use a set of distinct brackets.

    {text} Sound made by the talker. Use only the following sounds: {laugh} {cough} {sneeze} {breath} {lipsmack}

    [text] Sound not made by the talker (usually background or channel noise). Use this notation only in the rare cases where the background condition is overwhelming.

    Use only the following descriptions:

    [distortion], [static] -- used for channel noise such as buzzes, pops, etc.
    [background] -- used for other noises, such as children crying, pots being struck, etc.

    Brief channel noises, such as intermittent [static] or [background] noise, may occur many times; you can ignore these occurrences. The focus of these transcriptions is the speech, so there is no need to be overly concerned with small distortions. Similarly, if a speaker stutters, or starts to speak with a series of partial, hesitant words that have been individually timestamped, merge the partial speech into a single, larger speech section.

    [text/] [/text] Marks sound not made by the talker that is non-instantaneous. Place [text/] at the beginning and [/text] at the end of the noisy region. These tags are channel-specific, so a tag pair can cross turn changes if the sound is extended.
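    The bracket conventions for noises can be checked with a short script. This hypothetical Python sketch (the function and constant names are invented) takes the texts of one channel's turns in time order, because extended-noise tags are channel-specific and may cross turn changes. It flags unknown {text} and [text] descriptions, any [text/] region that is never closed, and any [/text] that closes nothing; as an assumption, it allows different noise regions to overlap.

```python
import re

TALKER_SOUNDS = {"laugh", "cough", "sneeze", "breath", "lipsmack"}
CHANNEL_NOISES = {"distortion", "static", "background"}

BRACES = re.compile(r"\{([^{}]*)\}")                # {text}
BRACKETS = re.compile(r"\[(/?)([^\[\]/]+)(/?)\]")  # [text], [text/], [/text]


def check_noise_tags(channel_texts):
    """Check noise tags across one channel's turn texts, given in time order."""
    problems = []
    open_regions = []  # names opened with [text/] and not yet closed
    for text in channel_texts:
        for name in BRACES.findall(text):
            if name not in TALKER_SOUNDS:
                problems.append(f"unknown talker sound {{{name}}}")
        for lead, name, trail in BRACKETS.findall(text):
            if name not in CHANNEL_NOISES:
                problems.append(f"unknown noise [{name}]")
            elif lead and trail:
                problems.append(f"malformed tag [/{name}/]")
            elif trail:  # [text/] opens an extended region
                open_regions.append(name)
            elif lead:  # [/text] closes one; regions may overlap, so any order is accepted
                if name in open_regions:
                    open_regions.remove(name)
                else:
                    problems.append(f"unmatched [/{name}]")
    problems.extend(f"unclosed [{name}/]" for name in open_regions)
    return problems


# A region opened in one turn and closed in the next on the same channel is fine.
print(check_noise_tags(["hello [static/] there {laugh}", "so anyway [/static] yes"]))  # []
```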

    Other Conventions

  • ((text)) Unintelligible speech, written as the transcriber's best guess.

  • (( )) Unintelligible speech (one or more words) that you cannot even make a guess at (with a single space between the parentheses).

  • <language text> This is used to indicate speech (one or more words) in another language. In place of "language", write the name of the language, if known (for example, <English text>). This notation can overlap with the (( )) notation above: if the language is recognized and can be transcribed, use the <language text> notation; if the language is recognized but cannot be transcribed, use <language (( ))>; if the language is not even recognized, use just the (( )) notation.

  • <as/> text </as> This is used to mark an aside made by the primary talker where the talker is addressing someone in the background.

  • <ov/> text </ov> Overlapping speech occurs when a speaker is interrupted by another speaker at roughly equal volume. When overlapping speech occurs, insert the breakpoint at the beginning of the word in which the interruption started, in other words, at the end of the last complete word.
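    The remaining conventions can be pulled out of a turn with regular expressions. In this hypothetical Python sketch, `unclear_spans` collects ((text)) guesses, counts (( )) markers, and lists <language text> spans, while `check_paired_tags` reports unbalanced <as/> and <ov/> tags. It assumes, although the spec does not say so, that asides and overlaps open and close within a single turn and that a language name is a single word.

```python
import re

GUESS = re.compile(r"\(\(([^()]*[^()\s][^()]*)\)\)")  # ((text)): best guess
NO_GUESS = re.compile(r"\(\( \)\)")                   # (( )): no guess possible
FOREIGN = re.compile(r"<(\w+) ([^<>]+)>")             # <language text>, e.g. <Spanish hola>
PAIRED = re.compile(r"<(/?)(as|ov)(/?)>")             # <as/> </as> <ov/> </ov>


def unclear_spans(text):
    """Collect best guesses, no-guess markers, and foreign-language spans in a turn."""
    return {
        "guesses": GUESS.findall(text),
        "no_guess": len(NO_GUESS.findall(text)),
        "foreign": FOREIGN.findall(text),
    }


def check_paired_tags(text):
    """Report unbalanced <as/>...</as> and <ov/>...</ov> tags within one turn."""
    problems, open_tags = [], []
    for lead, name, trail in PAIRED.findall(text):
        if trail and not lead:  # <as/> or <ov/> opens a span
            open_tags.append(name)
        elif lead and not trail:  # </as> or </ov> closes one
            if name in open_tags:
                open_tags.remove(name)
            else:
                problems.append(f"unmatched </{name}>")
        else:
            problems.append(f"malformed tag <{lead}{name}{trail}>")
    problems.extend(f"unclosed <{name}/>" for name in open_tags)
    return problems


turn = "I ((went to)) the <Spanish hola> and (( )) <as/> hold on </as>"
print(unclear_spans(turn))
print(check_paired_tags(turn))  # []
```

    Because <language (( ))> nests the (( )) notation inside triangle brackets, such a span is counted both as a foreign-language span and as a no-guess marker.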