File: ffmtimit.doc, updated 6/2/95

			       FFMTIMIT

	      Acoustic-Phonetic Continuous Speech Corpus
		   Far Field Microphone Recordings


                     Training and Test Data
                     NIST Speech Disc 21-1.1


The TIMIT corpus of read speech has been designed to provide speech
data for the acquisition of acoustic-phonetic knowledge and for the
development and evaluation of automatic speech recognition systems.
The FFMTIMIT corpus contains the previously unreleased secondary
microphone recordings of the TIMIT corpus.  The speech recordings for
TIMIT resulted from the joint efforts of several sites under
sponsorship from the Advanced Research Projects Agency (ARPA).  Text
corpus design was a joint effort among the Massachusetts Institute of
Technology (MIT), Stanford Research Institute (SRI), and Texas
Instruments (TI).  The speech was recorded at TI, transcribed at MIT,
and has been maintained, verified, and prepared for CD-ROM production
by the National Institute of Standards and Technology (NIST).  This
file contains a brief description of the FFMTIMIT Speech Corpus.
Additional information including the referenced material and some
relevant reprints of articles may be found in the TIMIT companion
booklet.


1. Corpus Speaker Distribution
-- ---------------------------

FFMTIMIT contains Bruel and Kjaer microphone recordings for 613 of the
630 TIMIT corpus speakers (the B&K data for the remaining 17 speakers
was unrecoverable).  FFMTIMIT contains a total of 6130 sentences, 10
sentences spoken by each of 613 speakers from 8 major dialect regions
of the United States.  Table 1 shows the number of speakers for the 8
dialect regions, broken down by sex.  Percentages are given in
parentheses: the male and female figures are shares of each region's
speakers, and the total figures are shares of all 613 speakers.  A
speaker's dialect region is the geographical area of the U.S. where
they lived during their childhood years.  The geographical areas
correspond with recognized dialect regions in the U.S. (Language
Files, Ohio State University Linguistics Dept., 1982), with the
exception of the Western region (dr7), in which dialect boundaries are
not known with any confidence, and dialect region 8, whose speakers
moved around a lot during their childhood.


   Table 1:  Dialect distribution of speakers

      Dialect
      Region(dr)    #Male    #Female    Total
      ----------  --------- ---------  ----------
         1         29 (63%)  17 (37%)   46 (8%)  
         2         70 (71%)  29 (29%)   99 (16%) 
         3         75 (77%)  23 (23%)   98 (16%) 
         4         69 (69%)  31 (31%)  100 (16%) 
         5         57 (62%)  35 (38%)   92 (15%) 
         6         30 (65%)  16 (35%)   46 (8%) 
         7         74 (75%)  25 (25%)   99 (16%) 
         8         22 (67%)  11 (33%)   33 (5%)
       ------     --------- ---------  ---------- 
       Total      426 (69%) 187 (31%)  613 (100%)

The dialect regions are:
     dr1:  New England
     dr2:  Northern
     dr3:  North Midland
     dr4:  South Midland
     dr5:  Southern
     dr6:  New York City
     dr7:  Western
     dr8:  Army Brat (moved around)


2. Corpus Text Material 
-- --------------------

The text material in the TIMIT prompts (found in the file
"prompts.txt") consists of 2 dialect "shibboleth" sentences designed
at SRI, 450 phonetically-compact sentences designed at MIT, and 1890
phonetically-diverse sentences selected at TI.  The dialect sentences
(the SA sentences) were meant to expose the dialectal variants of the
speakers and were read by all 613 speakers.  The phonetically-compact
sentences were designed to provide a good coverage of pairs of phones,
with extra occurrences of phonetic contexts thought to be either
difficult or of particular interest.  Each speaker read 5 of these
sentences (the SX sentences) and each text was spoken by 7 different
speakers (see Note #1).  The phonetically-diverse sentences (the SI
sentences) were selected from existing text sources - the Brown Corpus
(Kucera and Francis, 1967) and the Playwrights Dialog (Hultzen, et
al., 1964) - so as to add diversity in sentence types and phonetic
contexts.  The selection criteria maximized the variety of allophonic
contexts found in the texts.  Each speaker read 3 of these sentences,
with each sentence being read only by a single speaker.  Table 2
summarizes the speech material in FFMTIMIT.


    Table 2:  FFMTIMIT speech material

  Sentence Type   #Sentences   #Speakers   Total   #Sentences/Speaker
  -------------   ----------   ---------   -----   ------------------
  Dialect (SA)          2         613       1226           2

  Compact (SX)        366           7       2562           5
                       83           6        498           .
                        1           5          5           .

  Diverse (SI)       1839           1       1839           3
  -------------   ----------   ---------   -----    ----------------
  Total              2291                   6130          10
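
The totals in Table 2 can be cross-checked with a few lines of Python.
This is only an arithmetic sketch using the figures printed in the
table; the variable names are illustrative and are not part of the
corpus distribution.

```python
# Rows of Table 2 as (sentence type, number of texts, speakers per text).
rows = [
    ("SA", 2, 613),
    ("SX", 366, 7),
    ("SX", 83, 6),
    ("SX", 1, 5),
    ("SI", 1839, 1),
]

speakers = 613

# Number of distinct texts and number of recorded utterances.
texts = sum(n for _, n, _ in rows)
utterances = sum(n * k for _, n, k in rows)

# Utterances per sentence type, then per speaker.
per_type = {}
for kind, n, k in rows:
    per_type[kind] = per_type.get(kind, 0) + n * k
per_speaker = {kind: total // speakers for kind, total in per_type.items()}

print(texts, utterances, per_speaker)
```

The three SX rows together account for the 450 phonetically-compact
texts, and the per-speaker counts sum to the 10 sentences read by each
speaker.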


3. Suggested Training/Test Subdivision
-- -----------------------------------

The speech material has been subdivided into portions for training and
testing.  The criteria for the subdivision are described in the file
"testset.doc".  THIS SUBDIVISION HAS NO RELATION TO THE DATA
DISTRIBUTED ON THE PROTOTYPE VERSION OF THE CD-ROM, AND THE SUGGESTED
TRAINING AND TEST SETS IDENTIFIED ON THIS RELEASE DIFFER SOMEWHAT FROM
THOSE ON THE COMPLETE TIMIT DISC, BECAUSE SOME B&K DATA COULD NOT BE
RECOVERED.


Core Test Set:

The test data has a core portion containing 24 speakers, 2 male and 1
female from each dialect region.  The core test speakers are shown in
Table 3.  Each speaker read a different set of SX sentences.  Thus the
core test material contains 192 sentences, 5 SX and 3 SI for each
speaker, each having a distinct text prompt.


    Table 3:  The core test set of 24 speakers

     Dialect        Male      Female
     -------       ------     ------
        1        DAB0, WBT0    ELC0    
        2        TAS1, WEW0    PAS0    
        3        JMP0, LNT0    PKT0    
        4        LLL0, TLS0    JLM0    
        5        BPM0, KLT0    NLP0    
        6        CMJ0, JDH0    MGD0    
        7        GRT0, NJM0    DHC0
        8        JLN0, PAM0    MLD0    


Complete Test Set:

A more extensive test set was obtained by including the sentences from
all speakers that read any of the SX texts included in the core test
set.  In doing so, no sentence text appears in both the training and
test sets.  This complete test set contains a total of 162 speakers
and 1296 utterances, accounting for about 21% of the total speech
material.  The resulting dialect distribution of the 162-speaker test
set is given in Table 4.  The complete test material contains 606
distinct texts.


     Table 4:  Dialect distribution for complete test set

      Dialect    #Male   #Female   Total
      -------    -----   -------   -----
        1           6        3        9
        2          18        7       25
        3          22        3       25
        4          16       16       32
        5          15       11       26
        6           8        3       11
        7          15        8       23
        8           8        3       11
      -----      -----   -------   ------
      Total       108       54      162


4. CD-ROM FFMTIMIT Directory and File Structure
-- --------------------------------------------

The speech and associated data are organized on the CD-ROM according to
the following hierarchy:

/<CORPUS>/<USAGE>/<DIALECT>/<SEX><SPEAKER_ID>/<SENTENCE_ID>.<FILE_TYPE>

     where,

     CORPUS      :== ffmtimit
     USAGE       :== train | test
     DIALECT     :== dr1 | dr2 | dr3 | dr4 | dr5 | dr6 | dr7 | dr8 
                    (see Table 1 for dialect code description)
     SEX         :== m | f
     SPEAKER_ID  :== <INITIALS><DIGIT>
          
          where, 
          INITIALS  :== speaker initials, 3 letters
          DIGIT     :== number 0-9 to differentiate speakers with
                       identical initials
                              
     SENTENCE_ID :== <TEXT_TYPE><SENTENCE_NUMBER>
          
          where,
              
          TEXT_TYPE :== sa | si | sx
                        (see Section 2 for sentence text type description)
          SENTENCE_NUMBER :== 1 ... 2342
                    
     FILE_TYPE   :== wav | txt | wrd | phn
                    (see Table 5 for file type description)

Examples:
     /ffmtimit/train/dr1/fcjf0/sa1.wav
                         
     (FFMTIMIT corpus, training set, dialect region 1, female speaker, 
      speaker-ID "cjf0", sentence text "sa1", speech waveform file)
      

     /ffmtimit/test/dr5/mbpm0/sx407.phn

     (FFMTIMIT corpus, test set, dialect region 5, male speaker,
      speaker-ID "bpm0", sentence text "sx407", phonetic
      transcription file)
      
                                                      
Online documentation and tables are located in the directory
"/ffmtimit/doc".  A brief description of each file in this directory
can be found in Section 6.
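
The path hierarchy described in this section can be split into its
components with a regular expression.  The following is a minimal
sketch, not a utility shipped with the corpus: the pattern, the
function name "parse_path", and the choice to lower-case the path
before matching are all assumptions made for illustration.

```python
import re

# Pattern follows the hierarchy given in this section; the regex itself
# is an illustrative assumption, not part of the corpus distribution.
PATH_RE = re.compile(
    r"^/?(?P<corpus>ffmtimit)"
    r"/(?P<usage>train|test)"
    r"/(?P<dialect>dr[1-8])"
    r"/(?P<sex>[mf])(?P<speaker_id>[a-z]{3}[0-9])"
    r"/(?P<text_type>sa|si|sx)(?P<sentence_number>[0-9]+)"
    r"\.(?P<file_type>wav|txt|wrd|phn)$"
)

def parse_path(path):
    """Return the path components as a dict, or None if no match."""
    m = PATH_RE.match(path.lower())
    if m is None:
        return None
    fields = m.groupdict()
    fields["sentence_number"] = int(fields["sentence_number"])
    return fields

info = parse_path("/ffmtimit/test/dr5/mbpm0/sx407.phn")
print(info)
```

Paths that do not follow the hierarchy (for example, an unknown usage
directory) yield None rather than raising an error, which makes the
function convenient for filtering a directory listing.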


5. File Types
-- ----------

The FFMTIMIT corpus includes several files associated with each
utterance.  In addition to a speech waveform file (.wav), three
associated transcription files (.txt, .wrd, .phn) exist.  These
associated files have the form:

    <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>
        .
        .
        .
    <BEGIN_SAMPLE> <END_SAMPLE> <TEXT><new-line>

        where,        
        
        BEGIN_SAMPLE :== The beginning integer sample number for the 
                         segment (Note: The first BEGIN_SAMPLE of each 
                         .txt and .phn file is always 0; a .wrd file
                         begins at its first word)
                                 
        END_SAMPLE :== The ending integer sample number for the segment
                       (Note: Because of the transcription method used,
                       the last END_SAMPLE in each transcription file 
                       may be less than the actual last sample in the
                       corresponding .wav file)

        TEXT :== <ORTHOGRAPHY> | <WORD_LABEL> | <PHONETIC_LABEL>
                         
             where,
                
             ORTHOGRAPHY :== Complete orthographic text transcription
             WORD_LABEL :== Single word from the orthography
             PHONETIC_LABEL :== Single phonetic transcription code
                               (See "phoncode.doc" for description 
                                of codes)
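
Because all three transcription file types share the same line format,
a single reader can handle them.  The sketch below is one possible way
to do this; the names "Segment" and "parse_transcription" are
hypothetical, and the sample lines are taken from the example
transcription given later in this section.

```python
from typing import List, NamedTuple

class Segment(NamedTuple):
    begin: int   # BEGIN_SAMPLE
    end: int     # END_SAMPLE
    text: str    # orthography, word label, or phonetic label

def parse_transcription(lines) -> List[Segment]:
    """Parse lines of the form '<BEGIN_SAMPLE> <END_SAMPLE> <TEXT>'."""
    segments = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerating blank lines is an assumption
        # Split on whitespace at most twice so that TEXT keeps its spaces.
        begin, end, text = line.split(None, 2)
        segments.append(Segment(int(begin), int(end), text))
    return segments

sample = [
    "0 7490 h#",
    "7490 9860 sh",
    "9860 11382 iy",
]
segments = parse_transcription(sample)
print(segments[1])
```

Limiting the split to two separators matters for .txt files, where
TEXT is a whole sentence rather than a single label.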


    Table 5:  Utterance-associated file types          

 File Type                     Description
 ---------  ------------------------------------------------------
     
     .wav - SPHERE-headered speech waveform file.  (See the "/sphere"
            directory for speech file manipulation utilities.)

     .txt - Associated orthographic transcription of the words the
            person said.  (Usually this is the same as the prompt, but 
            in a few cases the orthography and prompt disagree.)

     .wrd - Time-aligned word transcription. The word boundaries
            were aligned with the phonetic segments using a dynamic
            string alignment program (see the printed documentation
            section "Notes on the Word Alignments" and the lexical
            pronunciations given in "timitdic.txt".)  Note also that
            the time-alignments differ from those in the TIMIT corpus
            to account for a propagation delay of 20 samples,
            corresponding to the placement of the B&K microphone at
            approximately 16" from the Sennheiser microphone.

     .phn - Time-aligned phonetic transcription.  (See the reprint
            of the article by Seneff and Zue (1988), in the printed
            documentation, and the section "Notes on Checking the
            Phonetic Transcriptions" for more details on the phonetic
            transcription protocols.)  Note also that the
            time-alignments differ from those in the TIMIT corpus to
            account for a propagation delay of 20 samples,
            corresponding to the placement of the B&K microphone at
            approximately 16" from the Sennheiser microphone.
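
The 20-sample propagation delay quoted in Table 5 can be checked with
a rough calculation.  Two values used here are assumptions that this
file does not state: a 16 kHz sampling rate and a speed of sound of
about 343 m/s at room temperature.

```python
# Assumptions (not stated in this file): 16 kHz sampling rate and a
# speed of sound of about 343 m/s at room temperature.
SAMPLE_RATE_HZ = 16000
SPEED_OF_SOUND_M_PER_S = 343.0
INCH_IN_METRES = 0.0254

distance_m = 16 * INCH_IN_METRES              # approximately 16 inches
delay_s = distance_m / SPEED_OF_SOUND_M_PER_S
delay_samples = delay_s * SAMPLE_RATE_HZ

print(round(delay_samples, 1))
```

Under these assumptions the delay works out to roughly 19 samples,
which is consistent with the 20-sample offset given that the
microphone spacing is only approximate.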
             
                                        
Example transcriptions from the utterance in "/ffmtimit/test/dr5/fnlp0/sa1.wav"

Orthography (.txt):
        0 61748 She had your dark suit in greasy wash water all year.

Word label (.wrd):
7490 11382 she
11382 16020 had
15440 17523 your
17523 23380 dark
23380 28380 suit
28380 30980 in
30980 36991 greasy
36991 42310 wash
43140 47500 water
49041 52204 all
52204 58860 year


Phonetic label (.phn): 
(Note: beginning and ending silence regions are marked with h#)


  0 7490 h#
7490 9860 sh
9860 11382 iy
11382 12928 hv
12928 14780 ae
14780 15440 dcl
15440 16020 jh
16020 17523 axr
17523 18560 dcl
18560 18970 d
18970 21073 aa
21073 22220 r
22220 22760 kcl
22760 23380 k
23380 25335 s
25335 27663 ux
27663 28380 tcl
28380 29292 q
29292 29952 ih
29952 30980 n
30980 31890 gcl
31890 32570 g
32570 33273 r
33273 34680 iy
34680 35910 z
35910 36991 iy
36991 38411 w
38411 40710 ao
40710 42310 sh
42310 43140 epi
43140 43926 w
43926 45500 ao
45500 46060 dx
46060 47500 axr
47500 49041 q
49041 51368 ao
51368 52204 l
52204 54167 y
54167 56674 ih
56674 58860 axr
58860 61700 h#
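
The word and phonetic label files share sample boundaries, so the
phones belonging to each word can be recovered by comparing spans.
The sketch below uses the first few entries of the example above; the
function name "phones_in_word" is hypothetical.

```python
# Excerpts copied from the example above: (begin, end, label).
words = [(7490, 11382, "she"), (11382, 16020, "had"), (15440, 17523, "your")]
phones = [
    (0, 7490, "h#"), (7490, 9860, "sh"), (9860, 11382, "iy"),
    (11382, 12928, "hv"), (12928, 14780, "ae"), (14780, 15440, "dcl"),
    (15440, 16020, "jh"), (16020, 17523, "axr"),
]

def phones_in_word(word, phones):
    """Return labels of phones lying entirely inside the word's span."""
    w_begin, w_end, _ = word
    return [p for b, e, p in phones if b >= w_begin and e <= w_end]

alignment = {w[2]: phones_in_word(w, phones) for w in words}
print(alignment)
```

Note that the spans of "had" and "your" overlap (15440 to 16020), so
the phone "jh" is attributed to both words.  Code that assumes
non-overlapping word segments would need to handle this case.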


6. Online Documentation
-- --------------------

Compact documentation is located in the "/ffmtimit/doc" directory.
Files in this directory with a ".doc" extension contain freeform
descriptive text and files with a ".txt" extension contain tables of
formatted text which can be searched programmatically.  Lines in the
".txt" files beginning with a semicolon are comments and should be
ignored on searches.  The following is a brief description of their
contents:

    phoncode.doc - Table of phone symbols used in phonemic dictionary
                   and phonetic transcriptions
     prompts.txt - Table of sentence prompts and sentence-ID numbers
    spkrinfo.txt - Table of speaker attributes
    spkrsent.txt - Table of sentence-ID numbers for each speaker
     testset.doc - Description of suggested train/test subdivision
    timitdic.doc - Description of phonemic lexicon
    timitdic.txt - Phonemic dictionary of all orthographic words in
                   prompts
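
When searching the ".txt" tables programmatically, the semicolon
comment lines need to be skipped first.  A minimal sketch follows; the
function name "read_table_lines" is hypothetical, the example content
is invented for illustration and is not copied from the corpus files,
and skipping blank lines is an additional assumption.

```python
def read_table_lines(lines):
    """Yield non-comment, non-blank lines from a ".txt" table."""
    for line in lines:
        if line.startswith(";"):
            continue  # comment line, per the convention described above
        if not line.strip():
            continue  # skipping blank lines is an assumption
        yield line.rstrip("\n")

# Hypothetical content for illustration; not copied from the corpus files.
example = [
    "; this is a comment\n",
    "first data line\n",
    "\n",
    "second data line\n",
]
rows = list(read_table_lines(example))
print(rows)
```

The function accepts any iterable of lines, so an open file object can
be passed to it directly.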


A more extensive description of corpus design, collection, and
transcription can be found in the printed documentation.


NOTES
=====

#1) Because only 613 of the original 630 speakers were recovered, not
all of the phonetically-compact sentences were spoken by exactly seven
speakers.  The exceptions are listed below by SX sentence number.


  Sentences Spoken 5 times:
	277

  Sentences Spoken 6 times:
	  3	  4	  5	  6	  7	  8	  9	 10
	 11	 21	 56	 57	 58	 59	 60	 90
	 91	 92	 93	 94	 95	 96	 97	 98
	 99	100	101	146	147	148	149	150
	180	181	182	183	184	185	186	187
	188	189	190	191	236	237	238	239
	240	270	271	272	273	274	275	276
	278	279	280	281	326	328	329	330
	360	361	362	363	364	365	367	368
	369	370	371	416	417	418	419	420
	450	451	452