The recordings on this nine-disc set were originally made in 1978-79 as part of a British Home Office study into speaker identification techniques. Subsequently, it was realized that a large body of unconstrained conversational material might be of interest to researchers working in other speech processing fields. The recordings were transcribed and the CD-ROMs prepared during 1993.
The recordings were made at the Police Staff College, Bramshill, Hampshire, England. The participants were police officers taking part in the various courses at the college. This provided a wide range of regional accents and a range of ages from late teens to early fifties. Each speaker is described by nine demographic attributes.
Three adjacent bedrooms were used. The two participants, each alone in their rooms, conversed by telephone. The third room was used as a monitoring and recording station.
In addition to the telephone recordings, reference recordings were made using a high-quality dynamic microphone in each room. It is these higher-quality recordings, not the telephone speech, which are provided on the BRAMSHILL CD-ROM set.
The recordings were made on a Sony Elcaset EL-7 cassette machine, chosen at the time because of its good speed stability. The microphone was a Shure SM-7 cardioid type. The speech data was sampled at 10 kHz, 16-bit resolution.
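As a rough guide to data volume, the sampling parameters above can be turned into a storage estimate. The sketch below assumes uncompressed single-channel PCM and a full ten-minute conversation side. Neither the on-disc file format nor the channel layout is stated here, so `pcm_bytes` and its defaults are illustrative only.

```python
# Storage estimate for one conversation side.
# Assumptions (not confirmed by the corpus description): uncompressed PCM,
# one channel per file, and no silence removed.
SAMPLE_RATE_HZ = 10_000      # 10 kHz sampling rate
BYTES_PER_SAMPLE = 16 // 8   # 16-bit resolution
DURATION_S = 10 * 60         # nominal ten-minute conversation


def pcm_bytes(duration_s, sample_rate_hz=SAMPLE_RATE_HZ,
              bytes_per_sample=BYTES_PER_SAMPLE, channels=1):
    """Return the size in bytes of raw PCM audio of the given duration."""
    return int(duration_s * sample_rate_hz * bytes_per_sample * channels)


one_side = pcm_bytes(DURATION_S)
print(f"{one_side} bytes, about {one_side / 1e6:.0f} MB per side")
```

Under these assumptions each side of a conversation occupies at most about 12 MB. Files on the discs may be smaller if silence was removed or compression was applied.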
Some attempt was made to control the acoustic environment. It is evident from listening to the recordings that, while these measures produced a reasonable recording environment, the rooms were far from soundproof. A variety of external noises (engines, aircraft, etc.) can be heard on some of the recordings.
Each speaker was given a pile of photographs. In response to a bleep signal, each speaker introduced himself by name and read a set of test sentences. After this, the main part of the conversation took place, in which participants were asked to determine which of each pair of photographs had been taken first (if indeed they were related at all). The conversations continued for 10 minutes until terminated by another bleep signal.
During the digitization process, some periods of silence were removed, so some recordings are now shorter than the original ten minutes. This also means that the recordings of the two sides of a conversation are no longer time-aligned. In addition, to preserve the anonymity of the speakers, some passages (mainly the introductions) have been erased by overwriting them with binary zeroes. Finally, the bleep signals have also been overwritten with binary zeroes. The transcriptions indicate where this has occurred.
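Because erased passages consist of exact zero-valued samples, they can in principle be located directly in the sample stream. The following is a minimal sketch, assuming the audio has already been decoded into a sequence of integer samples. The function name `find_zero_runs` and the `min_len` threshold are hypothetical, and the transcriptions remain the authoritative record of where erasures occur.

```python
from typing import Iterable, List, Tuple


def find_zero_runs(samples: Iterable[int],
                   min_len: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices of runs of exact zeroes.

    `end` is exclusive. Runs shorter than `min_len` samples are ignored,
    since isolated zero-valued samples also occur in ordinary speech.
    """
    runs = []
    start = None   # index where the current zero run began, if any
    n = 0          # number of samples seen so far
    for i, s in enumerate(samples):
        n = i + 1
        if s == 0:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_len:
                runs.append((start, i))
            start = None
    # Close a run that extends to the end of the stream.
    if start is not None and n - start >= min_len:
        runs.append((start, n))
    return runs
```

At the 10 kHz sampling rate, a sample index converts to seconds by dividing by 10,000, so a run reported as `(20000, 50000)` would span 2.0 s to 5.0 s of the digitized file.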
The speech was transcribed verbatim. No attempt was made to correct grammar, fill in missing words, etc. Transcription conventions are detailed in the documentation. Every lexical word from the transcriptions is contained in the dictionary supplied in the INDEX directory. There are about 6,500 word types in the 600k words of the transcripts. Contractions, part-words, slang words, hesitation sounds and non-speech sounds are all treated as words in their own right in the dictionary.
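The distinction between word types (distinct entries) and running words (tokens) can be illustrated with a short count. This sketch assumes whitespace-delimited, case-insensitive tokens. The actual transcription conventions are defined in the corpus documentation, so the tokenization and the name `type_token_counts` are illustrative only.

```python
from collections import Counter


def type_token_counts(text):
    """Return (number of tokens, number of distinct types) in `text`.

    Tokens are split on whitespace and lowercased. Contractions and
    hesitation sounds count as words in their own right, as in the
    dictionary described above.
    """
    tokens = text.lower().split()
    counts = Counter(tokens)
    return len(tokens), len(counts)


n_tokens, n_types = type_token_counts("er I don't I don't know")
print(n_tokens, n_types)  # 6 tokens, 4 types
```

On the figures quoted above, roughly 6,500 types across 600k running words means each type occurs about 92 times on average.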
Content Copyright Portions © 1994 Trustees of the University of Pennsylvania