2.2.4 Spectrogram and Formant

Speech consists of vibrations produced in the vocal tract, and these vibrations can be represented as speech waveforms. Phonemes cannot be read directly from a waveform, but if we break the waveform down into its frequency components, we obtain a spectrogram, which can be deciphered. The quality of a sound such as a vowel depends on its characteristic resonant frequencies, the so-called formants, which result from the different shapes of the vocal tract. Formants appear as dark horizontal bands on the spectrogram.
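Formants can also be located numerically. A common technique, not described in this section, fits an all-pole (LPC) model to a short frame and reads the formant frequencies off the angles of the model's poles. The sketch below makes several assumptions. It uses a 16 kHz sampling rate. The helpers `lpc`, `estimate_formants` and `synth_vowel` are hypothetical names. The 400 Hz bandwidth cutoff is an illustrative choice. The test signal is a synthetic impulse train passed through resonators, not real speech.

```python
import numpy as np

def lpc(frame, order):
    """LPC coefficients [1, a1, ..., a_order] via the autocorrelation method and
    the Levinson-Durbin recursion (prediction: x[n] ~ -sum_k a_k x[n-k])."""
    n = len(frame)
    r = np.array([np.dot(frame[: n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for stage i of the recursion
        k = -(r[i] + np.dot(a[1:i], r[i - 1:0:-1])) / err
        a[1:i] = a[1:i] + k * a[i - 1:0:-1]
        a[i] = k
        err *= 1.0 - k * k
    return a

def estimate_formants(frame, fs, order=10, max_bw=400.0):
    """Formant frequencies (Hz) taken from the angles of the LPC poles."""
    a = lpc(frame * np.hamming(len(frame)), order)
    poles = np.roots(a)
    poles = poles[np.imag(poles) > 0]            # keep one pole of each conjugate pair
    freqs = np.angle(poles) * fs / (2 * np.pi)   # pole angle -> frequency in Hz
    bws = -np.log(np.abs(poles)) * fs / np.pi    # pole radius -> bandwidth in Hz
    keep = (freqs > 90.0) & (bws < max_bw)       # drop very low or very broad peaks
    return np.sort(freqs[keep])

def synth_vowel(formants_hz, fs=16000, f0=100, dur=0.05, bw=80.0):
    """Hypothetical test signal: an impulse train at f0 passed through one
    two-pole resonator per formant (a crude source-filter model)."""
    den = np.array([1.0])
    r = np.exp(-np.pi * bw / fs)                 # pole radius for the given bandwidth
    for f in formants_hz:
        theta = 2 * np.pi * f / fs
        den = np.convolve(den, [1.0, -2 * r * np.cos(theta), r * r])
    n = int(fs * dur)
    src = np.zeros(n)
    src[:: fs // f0] = 1.0                       # glottal-like impulse train
    y = np.zeros(n)
    for i in range(n):                           # all-pole filter: y[i] = src[i] - sum den[k] y[i-k]
        y[i] = src[i] - sum(den[k] * y[i - k] for k in range(1, len(den)) if i - k >= 0)
    return y
```

For example, `estimate_formants(synth_vowel([700, 1200]), 16000, order=8)` should return values close to the two resonances built into the test signal. With real speech, pre-emphasis and an order of roughly 2 + fs/1000 are typical starting points.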

A spectrogram such as the one at the bottom of Figure 2.3 is created by computing a sequence of short-time spectra from the speech waveform (for example, from Linear Predictive Coding (LPC) parameters or with the Fourier transform) and displaying them side by side. The vertical axis represents frequency, from 0 to 8 kHz (bottom to top). Each spectrum computed by the Fourier transform is drawn parallel to this vertical or y-axis, with energy shown by the darkness of the trace. The horizontal axis represents time: moving right along the x-axis steps forward in time from one spectrum to the next. For reference, we computed this spectrogram by spectral analysis of 25-msec sections of the waveform, using a broad analysis filter and advancing the analysis section at 2-msec intervals.
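The construction described above can be sketched as a short-time Fourier transform. The 25-msec sections and 2-msec step follow the text. Several other choices are assumptions that the text does not state:

- a 16 kHz sampling rate, so that the frequency axis tops out at 8 kHz;
- a Hamming window as the analysis filter;
- a 512-point FFT.

`spectrogram` is a hypothetical helper name.

```python
import numpy as np

def spectrogram(x, fs=16000, win_ms=25.0, hop_ms=2.0, n_fft=512):
    """Short-time magnitude spectrogram in dB.

    Returns (freqs, times, S_db); S_db has one row per frequency bin and one
    column per analysis section, matching the layout of a printed spectrogram.
    """
    win_len = int(round(fs * win_ms / 1000))   # 25 ms -> 400 samples at 16 kHz
    hop = int(round(fs * hop_ms / 1000))       # 2 ms  -> 32 samples at 16 kHz
    if len(x) < win_len:
        raise ValueError("signal is shorter than one analysis section")
    window = np.hamming(win_len)               # assumed analysis window
    n_frames = 1 + (len(x) - win_len) // hop
    # Row i holds the samples x[i*hop : i*hop + win_len]
    idx = np.arange(win_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = x[idx] * window
    mags = np.abs(np.fft.rfft(frames, n=n_fft, axis=1))   # one spectrum per section
    S_db = 20 * np.log10(mags + 1e-10)                    # small offset avoids log(0)
    freqs = np.fft.rfftfreq(n_fft, d=1 / fs)              # 0 ... fs/2 (8 kHz)
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # centre of each section, in s
    # Transpose: frequency on the vertical axis, time on the horizontal axis
    return freqs, times, S_db.T
```

For instance, one second of a 1 kHz sine sampled at 16 kHz yields 257 frequency bins by 488 time columns, with a dark horizontal line near 1 kHz. Passing `S_db` to any image-plotting routine with the origin at the bottom reproduces the layout described above.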


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999