Spectrogram Reading

Speech researchers have relied heavily on spectrum analysis techniques since the late 1930s, when the sound spectrograph was invented: a device that translates a sound into a visual representation of its component frequencies. These techniques are used because each phoneme is assumed to be distinguishable by its own unique pattern in the spectrogram. For voiced phonemes, the signature involves large concentrations of energy called formants, together with a characteristic waxing and waning of energy in all frequencies, which is the most salient characteristic of what we call the human voice. The spectrographic characteristics of the six categories of manner of articulation listed in Table 2.1 are given below.

Plosive - involves an explosive burst of acoustic energy following a short period of silence, because the vocal tract is completely blocked just before the sound is produced.
Nasal - has much less energy than any of the other voiced sounds because the oral tract is completely blocked, and sound waves radiate principally from the nose.
Fricative - shows energy in high-frequency regions, distributed more randomly than the energy of voicing, although voiced fricatives may also show very weak voicing.
Affricate - appears as one or more thin bars to the left of the large rectangle of frication.
Tap or Flap - shows formants in low-frequency regions because the vocal tract is blocked at the roof of the mouth.
Approximant - has formants which are less pronounced than those of vowels, because of a slight obstruction placed somewhere along the vocal tract.

A spectrogram makes it possible to see some differences that are not visible in the waveform. Spectrograms also show a number of features that reflect a speaker's individual speech habits and are not language dependent.
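
To make the idea of a spectrogram concrete, the following is a minimal sketch of how such a time-frequency representation can be computed with a short-time Fourier transform. It is not the method of the original sound spectrograph, and the function name spectrogram, the frame length, the hop size, and the synthetic two-tone test signal are all assumptions chosen for illustration; no speech data is involved. The test signal has constant amplitude throughout, so its waveform envelope looks the same in both halves, while the spectrogram shows the energy concentration moving from one frequency to another.

```python
import numpy as np


def spectrogram(signal, sample_rate, frame_len=512, hop=128):
    """Return (times, freqs, power) from a Hann-windowed short-time Fourier transform.

    frame_len and hop are illustrative defaults, not values taken from the text.
    """
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Cut the signal into overlapping frames and taper each one with the window.
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])
    # Power spectrum of every frame: shape (n_frames, frame_len // 2 + 1).
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    # Time of each frame centre, in seconds.
    times = (np.arange(n_frames) * hop + frame_len / 2) / sample_rate
    return times, freqs, power


# Synthetic stand-in for a signal whose energy concentration moves in frequency:
# a 500 Hz tone for the first half second, then a 1500 Hz tone (assumed values).
sample_rate = 16000
t = np.arange(sample_rate) / sample_rate
signal = np.where(t < 0.5,
                  np.sin(2 * np.pi * 500 * t),
                  np.sin(2 * np.pi * 1500 * t))

times, freqs, power = spectrogram(signal, sample_rate)

# Frequency with the most energy in each frame.
peak_freqs = freqs[np.argmax(power, axis=1)]
print("strongest frequency in first frame (Hz):", peak_freqs[0])
print("strongest frequency in last frame (Hz):", peak_freqs[-1])
```

With these assumed settings the frequency resolution is 16000 / 512 = 31.25 Hz, so both tones fall exactly on analysis bins, and the strongest frequency should be reported as 500 Hz in the first frame and 1500 Hz in the last. Plotting power on a logarithmic scale against times and freqs would give the familiar spectrogram image.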


Jo Chul-Ho
Wed Oct 13 17:59:27 JST 1999