Our idea is that the best current basis for speech recognition is to start with a simple and internally-consistent surface phonemic (allophonic) representation of citation forms in standard American dialect(s). Predictable variation due to dialect, reduction, or transcription uncertainty will be added in a second stage. In each such case, we have tried to define a standard transcription that will be suitable to support generation of the set of variant forms.
An illustrative example: some American dialects distinguish the vowels in "sawed" and "sod", while others do not; the ending "-ing" can be pronounced with a vowel more like "heed" or one more like "hid", and with a final consonant like that of "sing" or like that of "sin". This does not take account of considerable variation of actual quality in these sounds: thus some (New Yorkers) pronounce the vowel of "sawed" as a sequence of a vowel like that in "Sue" followed by one like that in "Bud", while in less stigmatized dialects it is a single vowel (that may or may not be like that in "sod").
Combining all these variants for the transcription of the word "dogging" we would get 12 pronunciations -- three versions of the first vowel, two versions of the second vowel, and two versions of the final consonant. Then someone else comes along to tell us that some Chicagoans not only merge the vowels in "sawed" and "sod" but also move both of them towards the front of the mouth, with a sound similar (in extreme cases) to the more standard pronunciation of "sad". Now we have 4 x 2 x 2 = 16 pronunciations for the simple word "dogging" -- with a comparable 16 available for "logging" and "hogging" and so forth, and plenty of variants yet to catalogue.
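The count above can be checked with a small sketch. The labels below are illustrative placeholders for the choice points described in the text, not symbols from the lexicon.

```python
from itertools import product

# Hypothetical labels for each choice point in "dogging" (assumed names,
# chosen only to mirror the prose description above).
first_vowel = ["sawed-like", "sod-like", "Sue+Bud sequence", "fronted (sad-like)"]
second_vowel = ["heed-like", "hid-like"]
final_consonant = ["sing-like", "sin-like"]

# Every combination of one choice per position is a candidate pronunciation.
variants = list(product(first_vowel, second_vowel, final_consonant))
print(len(variants))  # 16
```

Each further independent choice point multiplies the total, which is why listing every variant in the lexicon quickly becomes impractical.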
Our approach is to give just one pronunciation in such a case. Some speech recognition researchers will want to use our lexicon to generate a network of predictable alternative transcriptions, taking account of dialect variation and reduction phenomena. Others may prefer to let statistical modeling of acoustic correlates handle some or all of such variation.
We want to present a consistent transcription for each lexical set -- so that in our example, "dogging" is not transcribed in one of the 16 ways while a second, different choice is made for "logging," and a third one for "hogging." We also want to choose a transcription that will support generation of all variants, so that distinctions made in some dialects should be made in our transcription if possible. Finally, we do want the transcription to indicate those variants that are lexically specific. Thus many cases of the prefix "re-" have both reduced and full variants (e.g. "reduction"), but many others do not (e.g. "recapitalization"). The difference apparently depends on how separable the prefix is from the rest of the word, but our lexicon simply has to list explicitly the cases that permit reduction.
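As a rough illustration of what "support generation of all variants" can mean, the sketch below stores the form that keeps a distinction and derives the merged form by rewriting. It uses the H/w merger described later in this document (replace any 'H' with 'w'). The rule table layout and the function names are assumptions made for the example.

```python
# Merger rules as (symbol kept in the lexicon, symbol it merges into).
# Only the H -> w rule is stated in this document; the table layout and
# the rule name "H_w" are hypothetical.
MERGER_RULES = {"H_w": ("H", "w")}

def apply_merger(transcription, rule):
    """Rewrite one SHORT-form transcription according to a named merger rule."""
    old, new = MERGER_RULES[rule]
    # SHORT form uses one character per allophone, so a plain character
    # replacement cannot split a symbol.
    return transcription.replace(old, new)

def variants(transcription, rules=MERGER_RULES):
    """Return the listed form followed by any distinct merged forms."""
    out = [transcription]
    for name in rules:
        merged = apply_merger(transcription, name)
        if merged not in out:
            out.append(merged)
    return out

print(variants("b'AkH+it"))  # ["b'AkH+it", "b'Akw+it"]
print(variants("p'En"))      # ["p'En"]
```

Going the other way (recovering 'H' from a transcription that only has 'w') is not possible by rule, which is the reason for transcribing the distinction wherever some dialect makes it.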
In order to produce a consistent transcription, especially in a lexicon produced by several different people, we have had to develop a set of explicit principles for the many cases that are left unclear by a simple specification of an allophone set. What follows is the current draft, at the end of a brief but intensive effort to produce a Release 0 WSJ30 vocabulary. The principles are still under development, and comments are welcome.
Here is the symbol set we are using. The "LONG" form is a modified arpabet designed by Bill Fisher at NIST. The "SHORT" form is a single-character-per-allophone version that we developed to reduce wear and tear on our transcribers' fingers.
LONG   SHORT   EXAMPLES                  COMMENT
____________________________________________________________________
iy     i       heed, heat, he
ux     u       ?                         sometimes used by TI for /u/ -- ignore
ih     I       hid, hit
ey     e       aid, hate, hay
eh     E       head, bet
ae     @       had, hat
aa     a       hod, hot
aax    a       ?                         probably Brit: father, alms (vs. pot, botch)
ao     c       law, awe
ow     o       hoed, oats, owe
uh     U       could, hood
uw     u       who'd, hoot, who
ay     Y       hide, height, high
oy     O       Boyd, boy
aw     W       how'd, out, how
er     R       father(2); herd, hurt, her
ax     x       data (2);
ah     A       cud, bud
ix     X       credit(2)?                not used by us
wh     H       which
w      w       witch
y      y       yes
r      r       Ralph
l      l       lawn
m      m       me
em     M       ?                         syllabic m
n      n       no
en     N       button(2)
nx     G       hang
p      p       pot
b      b       bed
t      t       tone
d      d       done
dx     ?       Peter(2)                  flap -- not used by us
k      k       kid
g      g       gaff
q      q       ?                         Glottal stop -- not used by us
ch     C       check
jh     J       judge
f      f       fix
v      v       vex
th     T       thin
dh     D       this
s      s       six
z      z       zoo
sh     S       shin
zh     Z       pleasure(2)
hh     h       help
'1     '
'2     +
'3     +
'0     .

A note on stress and syllabification:
We distinguish main stress, non-main-stress, and lack-of-stress. For the convenience of the transcribers in entering and checking material, the stress marks may be put between the syllables. However, we have not tried to enforce a consistent set of principles for syllabification, and so the lexicon will be delivered with the stress marks preceding their vowels. Software is available from Bill Fisher that will syllabify arbitrary entries about as well as human annotators can do it, and more consistently.
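The two conventions above (stress marks before their vowels, and the SHORT/LONG symbol pairs) can be sketched in code. This is only an illustration under stated assumptions: 'u' is mapped to "uw" and 'a' to "aa" because the table marks "ux" and "aax" as ignorable or doubtful; '+' is mapped to '2; 'L' is treated as a syllabic consonant because it occurs in example entries, although it is not in the symbol table; and the space-separated LONG output format is hypothetical.

```python
STRESS_MARKS = set("'+.")
# Syllable nuclei in SHORT notation: the vowels from the symbol table plus
# syllabic M and N. 'L' is an assumption (used in entries, absent from the table).
NUCLEI = set("iIeE@acoUuYOWRxAX") | set("MNL")

# SHORT -> LONG map copied from the symbol table, with the choices for
# 'u', 'a' and '+' described in the lead-in.
SHORT_TO_LONG = {
    "i": "iy", "I": "ih", "e": "ey", "E": "eh", "@": "ae", "a": "aa",
    "c": "ao", "o": "ow", "U": "uh", "u": "uw", "Y": "ay", "O": "oy",
    "W": "aw", "R": "er", "x": "ax", "A": "ah", "X": "ix",
    "H": "wh", "w": "w", "y": "y", "r": "r", "l": "l", "m": "m", "M": "em",
    "n": "n", "N": "en", "G": "nx", "p": "p", "b": "b", "t": "t", "d": "d",
    "k": "k", "g": "g", "q": "q", "C": "ch", "J": "jh", "f": "f", "v": "v",
    "T": "th", "D": "dh", "s": "s", "z": "z", "S": "sh", "Z": "zh", "h": "hh",
    "'": "'1", "+": "'2", ".": "'0",
}

def marks_before_vowels(transcription):
    """Move each stress mark rightward so it directly precedes the next nucleus."""
    out = []
    pending = None  # a stress mark still waiting for its nucleus
    for ch in transcription:
        if ch in STRESS_MARKS:
            pending = ch
        elif ch in NUCLEI and pending is not None:
            out.append(pending)
            out.append(ch)
            pending = None
        else:
            out.append(ch)
    if pending is not None:
        out.append(pending)  # no nucleus followed; keep the mark rather than drop it
    return "".join(out)

def short_to_long(transcription):
    """Convert a SHORT transcription to space-separated LONG symbols."""
    return " ".join(SHORT_TO_LONG[ch] for ch in transcription)

print(marks_before_vowels("'dYn+korp"))  # d'Ynk+orp
print(short_to_long("d'Itk.x"))          # d '1 ih t k '0 ax
```

A symbol that is missing from the table (for example 'L') raises a KeyError in short_to_long, which is a cheap way to surface entries that use undocumented symbols.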
(1) Certain classes of words may contain exceptions to the rest of these principles. In the current release, we've tagged most instances; comprehensive tagging will be provided in the next release.
(a) function words, e.g.:

the             T'i                           #FUNC
am              '@m                           #FUNC
anyhow          'En.ih+W                      #FUNC
but             b'At                          #FUNC

(b) names, e.g.:

ditka           d'Itk.x                       #NAME
cadbury         k'@db.xr.i                    #NAME
equicor         'Ekw.Ik+or                    #NAME
tiananmen       t+i'an.xm'En  ty'En.xm.En     #NAME

(c) foreign words, e.g.:

calabasas       k+@l.xb'@s.xs                 #FOR
valenzano       v+@l.Enz'an.o                 #FOR
sumitomo        s+um.it'om.o                  #FOR

(d) abbreviations, e.g.:

calif.          k+@l.If'orny.x                #ABBREV
corp.           k+orp.xr'eS.In                #ABBREV
oct.            .akt'ob.R                     #ABBREV

(e) acronyms, e.g.:

cmos            s'im+cs  s'im+os              #ACRO
afscme          '@fskm+i                      #ACRO
sids            s'Idz                         #ACRO

(f) words with unclear status, possible typos:

allegis         .xl'EJ.Iz                     #?
attact          .xt'@kt                       #?
wal             w'cl                          #?

(2) DIALECTAL DIFFERENCES: Distinctions made by some dialects but not others are transcribed if possible; alternate pronunciations that reflect mergers can be derived by rule. Some examples:
pen             p'En
pin             p'In
accent          '@ks+Ent  .@ks'Ent
adventure       .@dv'EnC.R
expend          .Eksp'End

mary            m'er.i                        #NAME
marry           m'@r.i
merry           m'Er.i
fare            f'er
garrett         g'@r.It                       #NAME
guarantee       g+@r.xnt'i

smaller         sm'cl.R
smog            sm'cg
abroad          .xbr'cd
bylaws          b'Yl+cz
sausalito       s+cs.xl'it.o                  #NAME
quarrel         kw'or.xl
moral           m'or.xl
maureen         m.or'in                       #NAME
workhorse       w'Rkh+ors

H/w - Pronunciations that merge 'H' and 'w' can be derived by replacing any 'H' with 'w':

buckwheat       b'AkH+it
meanwhile       m'inH+Yl
wharton         H'ort.N                       #NAME
where           H'er                          #FUNC

(3) SCHWA and REDUCED VOWELS:
sausage         s'cs.IJ
wanted          w'cnt.Id
amazes          .xm'ez.Iz
argentines      'arJ.Int+inz  'arJ.Int+Ynz    #NAME
brokerages      br'ok.xr.IJ.Iz  br'okr.IJ.Iz

'l' is not included in the above environment, as it tends to have a lowering effect on I (to x).

bechtel         b'Ekt.xl

antacid         +@nt'@s.Id
aeronautical    +er.xn'ct.Ik.xl
yetnikoff       y'Etn.Ik+cf                   #NAME
anthropologist  +@nTr.xp'al.xJ.ist
arithmetic      .xr'ITm.xt.Ik  .@r.ITm'Et.Ik
(4) SYLLABIC CONSONANTS and engma (ng):
formerly        f'orm.Rl.i
overarching     'ov.R+arC.IG
undergo         +And.Rg'o
thunderbird     T'And.Rb+Rd
glamorous       gl'@m.xr.xs
literally       l'It.xr.xl.i
winery          w'Yn.xr.i

ardent          'ard.Nt
written         r'It.N
satin           s'@t.N

bartlesville    b'art.Lzv+Il
gittleman       g'It.Lm.xn
littlebrook     l'It.Lbr+Uk

throwing        Tr'o.IG
tongs           t'cGz
torrington      t'or.IGt.xn                   #NAME
bangkok         b+@Gk'ak                      #NAME
banks           b'@Gks
bilingual       b+Yl'IGgw.xl
conclusion      k.xnkl'uZ.xn
dyncorp         'dYn+korp                     #NAME
encompassing    .En'kAm.px.sIG
The treatment of stress and reduction is problematic. We've eliminated tertiary stress in order to reduce the uncertainty; still, the question of where to mark secondary stress remains unclear. Our principles are intended to reflect one traditional mode of description -- alternative proposals are welcome.
As a general rule of thumb, we've only notated secondary stress on a syllable adjacent to the syllable with primary stress when it's clear that reduction isn't possible. Many of the syllables/morphemes that wouldn't normally get stress may carry secondary stress when they fall in an alternating pattern with the primary stressed syllable; and syllables/morphemes that normally carry secondary stress may be unstressed and reduced when adjacent to another stressed syllable, e.g.:
disabled        d.Is'eb.xld
dismissing      d.Ism'Is.IG
disaffected     d+Is.xf'Ekt.Id
disappearance   d+Is.xp+ir.Ins

Tense full vowels (i, e, u, o, Y, W) get some degree of stress, except that the following principles take precedence:
radio           r'ed.i+o
ambiguous       .@mb'Igy.u.xs
delineated      d.xl'In.i+et.Id
media           m'id.i.x
factual         f'@kC.u.xl
arsenio         .ars'En.i+o                   #NAME

Note that compound words may contain exceptions; a final V in the first part of a compound may be tense but unstressed, e.g.,

petrochemical   p+Etr.ok'Em.Ik.xl
antitrust       +@nt.itr'Ast  +@nt.Ytr'Ast
shortly         S'ortl.i
happily         h'@p.xl.i
lasky           l'@sk.i                       #NAME
Comanche        k.xm'@nC.i                    #NAME
reentry         r.i'Entr.i
Cherokee        C'Er.xk+i                     #NAME
dutifully       d'ut.Ifl+i
uzi             'uz+i

yellow          y'El.o
amarillo        +@m.xr'Il.o                   #NAME
echoes          'Ek+oz
hero            h'ir+o
anglo           'eGgl+o
amoco           '@m.xk+o                      #NAME

konimoru        k+on.im'or.u                  #FOR NAME
mitsuzuka       m+Its.uz'uk.x                 #FOR NAME
fiero           f+i'Er+o                      #FOR NAME
peroni          p+er'on+i                     #FOR NAME

accept          .@ks'Ept
ambassadors     .@mb'@s.Id.Rz
alberto         .al'bR+to  .@lb'Rt+o          #NAME
admiring        .@dm'Yr.IG
adheres         .@dh'irz
abdulla         .abd'Al.x                     #FOR NAME
arsenio         .ars'En.i+o                   #NAME
absolutely      +@bs.xl'utl.i
We haven't transcribed flapping,

better          b'Et.R
shutter         S'At.R
shudder         S'Ad.R

and we're not transcribing intrusive /t/,

tents           t'Ents
tense           t'Ens
Regards,
Cynthia McLemore