Acoustic Analysis with Robustness under Noisy Environment

Fumitada Itakura, Shoji Kajita

School of Engineering, Nagoya University

ita@itakura.nuee.nagoya-u.ac.jp

In conventional speech recognition research, the influences of room noise and room reverberation have been largely neglected. In practical interactive speech processing, however, they significantly affect recognition performance. Although in human speech recognition these influences could be compensated for by linguistic and/or semantic constraints, they are processed appropriately in the peripheral auditory system. Our aim is to develop a practical front-end system for speech recognition that applies human auditory processing. We have proposed "Subband-Autocorrelation Analysis (SBCOR Analysis)" and applied it to speech recognition. Although SBCOR analysis is a simplified version of the auditory model proposed by Seneff, it is not a precise auditory model but a signal processing model based on a filter bank and autocorrelation analysis. Experimental speech recognition results show that the SBCOR spectrum performs as well as the smoothed group delay spectrum under clean conditions, and much better under heavily noisy conditions. In this paper, an extension of SBCOR analysis is proposed and shown to be more robust against noise than the conventional analysis. The extended SBCOR spectrum is calculated as the weighted sum of the autocorrelation coefficients at integer multiples of 1/(center frequency) in each subband. This is based on the ringing phenomenon in the firing patterns of an auditory nerve fiber. This extension is referred to as Multi-Delay Weighting (MDW). The evaluation is performed with a DTW word recognition system that discriminates 68 pairs of phonetically very similar Japanese city names. The results indicate that the SBCOR spectrum with MDW is more robust against noise than the conventional SBCOR spectrum.
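The weighted sum over delays described above can be illustrated with a minimal sketch for a single subband. Everything specific here is an assumption made for illustration, not taken from the paper: the function names, the energy-normalized (biased) autocorrelation estimate, the three example weights, the 8 kHz sampling rate, and the use of a pure sine in place of a real filter-bank output. Passing a single weight of 1.0 would presumably correspond to the conventional single-delay case.

```python
import math

def normalized_autocorr(x, lag):
    """Biased autocorrelation of x at an integer lag, divided by the lag-0 energy."""
    energy = sum(v * v for v in x)
    if energy == 0.0 or lag >= len(x):
        return 0.0
    return sum(x[n] * x[n + lag] for n in range(len(x) - lag)) / energy

def sbcor_mdw(subband, fs, fc, weights):
    """Weighted sum of autocorrelation coefficients at delays k / fc, k = 1, 2, ...

    subband : samples of one band-pass channel (assumed already filtered)
    fs      : sampling rate in Hz
    fc      : center frequency of the channel in Hz
    weights : hypothetical weights, one per delay multiple
    """
    total = 0.0
    for k, w in enumerate(weights, start=1):
        lag = round(k * fs / fc)  # delay k / fc expressed in samples
        total += w * normalized_autocorr(subband, lag)
    return total

def sine(freq, fs, n):
    """Test tone standing in for a subband signal."""
    return [math.sin(2.0 * math.pi * freq * i / fs) for i in range(n)]

fs = 8000.0                # assumed sampling rate
fc = 500.0                 # assumed channel center frequency
weights = [0.5, 0.3, 0.2]  # illustrative weights only

on_center = sbcor_mdw(sine(500.0, fs, 800), fs, fc, weights)
off_center = sbcor_mdw(sine(750.0, fs, 800), fs, fc, weights)
print(on_center, off_center)
```

In this sketch a component at the center frequency is in phase with itself at every delay multiple, so its weighted sum is large, whereas a component away from the center frequency alternates in sign across the delays and is suppressed.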