This file describes the organisation and format of the BRAMSHILL speech collection.
The BRAMSHILL collection is a set of CD-ROMs containing digitised and transcribed recordings of free conversation. Each item is a recording of one half of a two-speaker conversation. Most of the recordings include a standard set of test sentences as well as the conversation.
The DOC directory contains this and other text files describing the collection.
The INDEX directory contains catalogue information.
The SPEAKERS directory contains the data. Within SPEAKERS there is a separate sub-directory for each speaker. All the material for any given speaker is contained on a single CD-ROM.
                  |
     ==============================
     |            |               |
    DOC         INDEX          SPEAKERS
                                  |
                  =================================
                  |       |       |       |       |
                 Snnn    ....    ....    ....    Smmm
                  |
        =================================================
        |          |          |          |       |      |
    Snnn1.DAT  Snnn1.TMT  Snnn2.DAT  Snnn2.TMT  ....   ....
Each speaker in the collection has been allocated a four-character identifier consisting of the letter S and three digits, for example S324.
Each recording has been allocated a five-character identifier consisting of the speaker's identifier plus an additional digit, for example S3242.
Speech data files have the extension .DAT and transcription files have the extension .TMT.
An item is composed of one data file and one transcription file. For example, an item might consist of the pair of files S3241.DAT and S3241.TMT.
For transcription purposes, the recordings were divided into "utterances". These utterances are short sections of speech, typically a phrase or short sentence. The division into utterances was carried out by an automated process which attempted to place utterance boundaries in the pauses between words. Occasionally, however, in cases such as long runs of unbroken speech, the utterance boundaries might occur in inappropriate locations. This should be remembered when using the transcriptions.
The transcription files contain the start time, length and corresponding text of each utterance. Time and length are given in 0.1 second units.
A number of conventions were adopted in the transcriptions:
A definitive list of the non-verbal sounds such as "[cough]" can be obtained by searching the dictionary for words starting with "[" and ending with "]".
4.1 Speech Data Files
The speech data was sampled at 10 kHz and stored as 16-bit two's complement integers, least significant byte first. Each recording is stored in a single file (extension .DAT). The first 1024 bytes of the file contain an ASCII header; the remainder of the file is the speech data. The data section of a speech file is an unstructured byte stream with no record or block structure.
The first 1024 bytes contain a header in the NIST SPHERE format. This is an ASCII text based format. An example header follows---
    NIST_1A
       1024
    database_id -s9 BRAMSHILL
    database_version -s3 1.0
    utterance_id -s5 S1231
    channel_count -i 1
    sample_count -i 6000000
    sample_rate -i 10000
    sample_min -i -28127
    sample_max -i 25763
    sample_n_bytes -i 2
    sample_byte_format -s2 01
    sample_sig_bits -i 16
    transcriber -s3 DWK
    end_head
The first two lines are the standard header introduction. These are each eight characters long (including the "newline" terminator). The last line is the standard header terminator. The remainder of the header is padded to 1024 bytes with "newline" characters. This means that the header can be inspected easily by using a utility program such as "more".
The body of the header consists of a set of "triples" each having the general form "name type value". In the BRAMSHILL database, only two type specifiers are used---
    -i     Integer
    -sn    String, length n characters
Each triple is terminated by a single "newline" character.
The header fields are---
    database_id          Database name, always "BRAMSHILL"
    database_version     Database version, for example "1.0"
    utterance_id         Item identifier, for example "S1231"
    channel_count        Always 1
    sample_count         Number of samples in the data file
    sample_rate          Sample rate, always 10000
    sample_min           Minimum sample value
    sample_max           Maximum sample value
    sample_n_bytes       Bytes per sample, always 2
    sample_byte_format   Sample byte order, always "01", LSB first
    sample_sig_bits      Always 16
    transcriber          Code identifying the transcriber
4.2 Transcription Files
The transcription files (extension .TMT) are ASCII text files. The first line of each transcription file is a header identifying the transcribed item. The header line has the format---
    Transcription of BRAMSHILL item S1232
Following the header line is a series of utterance transcriptions.
Utterance transcriptions consist of two integers and the transcription text, separated by single spaces. Each utterance transcription is terminated by a newline.
The two integers define the start time (relative to the start of the file) and duration of the utterance in 0.1 second units.
An example utterance transcription follows---
    3512 35 There is a clock in the right hand side of the picture.
Each utterance transcription is contained on a single line. The newline character terminates an utterance transcription. This approach is used because it minimises the parsing required for machine processing of transcription files. However, this means that a few utterances will be longer than a typical terminal line. The file TRFMT.C in the documentation directory contains the source code of an example program in C which will re-format transcriptions with a defined right hand margin.
The INDEX directory contains 4 files---
5.1 Speaker List
The speaker list, SPEAKERS.IDX, is an ASCII text file containing such information as is known about each speaker. During the original data collection, information was not recorded consistently, and in many cases it was not recorded at all. Consequently, apart from the speaker ID, all fields should be regarded as optional.
Each speaker is described by 9 lines of text. Each line contains a single attribute of the speaker. The file as a whole contains a multiple of 9 lines. With the exception of the first line in each case, the remaining lines may be empty if the information is not available.
The content of each line is listed below---
    1  Speaker Id, e.g. S321
    2  Sex, M or F
    3  Age (integer years)
    4  Height (integer cm)
    5  Weight (integer kg)
    6  Other observations. Sometimes collar size is given (in cm).
    7  Birth and domicile information. Not always available, and not consistently recorded when it is. However, it is included because in some cases it may help with accent etc.
    8  Comments on appearance. Sometimes information about the speaker's build is given.
    9  Comments on accent and voice. Neither consistent nor universal, but possibly useful.
Where birth and domicile information is available (field 7), it is recorded as a list of place items, each optionally qualified after a colon by an age range or a number of years. For example, the entry
London:0-7 Wales:6 Scotland
indicates a subject who lived in London from ages 0 to 7, lived for 6 years (at unknown ages) in Wales and spent some unspecified time in Scotland.
Accent information (field 9) was not recorded consistently. It is presented in forms such as---
    Lancashire/Wigan
    Yorkshire/Slight
    unusual
5.2 Item List
The item list, ITEMS.IDX, is an ASCII text file containing such information as is known about each item. Each item is described by 5 lines of text. Each line contains a single attribute of the item. The file as a whole contains a multiple of 5 lines. Lines 4 and 5 may be empty if the information is not available.
The content of each line is listed below---
    1  Item Id, e.g. S3212
    2  Speaker Id, e.g. S321
    3  Number of the disk containing the item, integer
    4  Picture code A, B, C or R
    5  Comment (if any)
The picture code describes the topic of conversation. Participants were given a set of photographs to discuss. The sets were---
    Set A - 8 pairs of photographs. 4 of a fairground scene, 1 market scene, 1 high street, 1 at a swimming pool and 1 railway station.
    Set B - 7 pairs of photographs. 2 fairground, 3 market, 2 air show and 1 high street.
    Set C - 8 pairs of photographs. 3 fairground, 2 market, 2 air show, 1 high street.
    Set R - 9 pairs of photographs. 3 fairground, 2 market, 3 air show, 1 high street.
5.3 Pairs List
The pairs file, PAIRS.IDX, is an ASCII text file. Each line in the file contains two item identifiers separated by a single space. The two items represent the two sides of the same conversation.
Note that, during digitisation of the recordings, long silences were automatically removed. One consequence of this is that the two items comprising a conversation are not necessarily time aligned.
There are some items for which the pairing could not be identified. Such items do not appear in PAIRS.IDX.
5.4 Dictionary
The transcription dictionary is the one used for spell checking during item transcription. All the item transcriptions can be spell checked against this dictionary without error.
The dictionary includes the actual words from the transcriptions, including slang words, part words and the list of non-speech sounds. It also includes words from the transcription comments, so the dictionary may contain words for which there is no spoken example in any item.
The dictionary is a sorted ASCII text file with one word per line.