
Demonstrations of my research. (Under construction)

Introduction

Our goal is to develop a robot that can extract a user's speech from a mixture of sounds and interact naturally with humans through speech in various environments. To achieve such a robot audition system, we must cope with the following three problems at the same time:
   1. multi-source signals (speech and other noise),
   2. the robot's own speech signal, and
   3. the reverberation of both.
Independent Component Analysis (ICA) can address all three problems within a single framework. I therefore aim to design an ICA method that has low computational cost and is oriented toward automatic speech recognition (ASR).

ICA (Independent Component Analysis)

ICA is a signal separation method based on the statistical independence among sources.

For an overview of ICA, the Wikipedia article is a useful starting point.
Wikipedia:ICA
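
To make the idea of separation by statistical independence concrete, here is a minimal sketch of ICA on an instantaneous (non-reverberant) two-channel mixture. It is not the frequency-domain method used in the demonstrations on this page. The natural-gradient update rule, the tanh score function, the Laplacian toy sources, the mixing matrix and all function and variable names are assumptions chosen for illustration.

```python
import numpy as np

def natural_gradient_ica(x, n_iter=500, step=0.1):
    """Estimate an unmixing matrix W so that the rows of W @ x are independent.

    x: zero-mean observed mixtures, shape (n_channels, n_samples).
    tanh is used as the score function, which suits super-Gaussian
    sources such as speech (an assumption of this sketch).
    """
    n_ch, n_samples = x.shape
    w = np.eye(n_ch)
    for _ in range(n_iter):
        y = w @ x
        # Natural-gradient update: W <- W + step * (I - E[tanh(y) y^T]) W
        corr = np.tanh(y) @ y.T / n_samples
        w += step * (np.eye(n_ch) - corr) @ w
    return w

rng = np.random.default_rng(0)
s = rng.laplace(size=(2, 20000)) / np.sqrt(2.0)  # toy unit-variance sources
a = np.array([[1.0, 0.6], [0.5, 1.0]])           # hypothetical mixing matrix
x = a @ s                                        # observed "microphone" signals
w = natural_gradient_ica(x)
y = w @ x                                        # separated signals
```

The separated signals come back with arbitrary scale and, in general, arbitrary order; only the observed signals `x` are used to estimate `w`.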

=== Blind source separation (BSS) ===

Separation of sound sources using only the observed signals captured by microphones.

/*-- Example of BSS with ICA under reverberant environment --*/

* We do not deal with early reflections, which can be handled easily by ASR techniques.

Conditions :
    1. Simulated data with real recorded impulse responses (RT20: 0.9 s)
    2. Without other noises
    3. Batch processing
    4. 30 iterations in estimating separation filter (with step-size adaptation)
    5. Permutation is resolved using a reference (to evaluate the upper limit of the method)
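
The reference-based permutation step in condition 5 can be sketched as follows. In frequency-domain ICA the separated outputs of each frequency bin come out in arbitrary order, so they are reordered to match known reference signals. The function name, the correlation-based score and the toy data are assumptions for illustration, not the procedure actually used for these demonstrations.

```python
import itertools
import numpy as np

def align_to_reference(estimates, references):
    """Reorder estimates so that row i best matches reference row i.

    Both inputs are real-valued arrays of shape (n_sources, n_frames),
    for example magnitude envelopes in one frequency bin.
    """
    n_src = references.shape[0]
    # corr[i, j] = |correlation| between estimate i and reference j
    corr = np.abs(np.corrcoef(estimates, references)[:n_src, n_src:])
    # Try every ordering and keep the one with the highest total score
    best = max(itertools.permutations(range(n_src)),
               key=lambda p: sum(corr[p[j], j] for j in range(n_src)))
    return estimates[list(best)], best

rng = np.random.default_rng(1)
refs = rng.standard_normal((3, 200))
# Shuffled, rescaled, slightly noisy copies stand in for ICA outputs
ests = 0.5 * refs[[2, 0, 1]] + 0.05 * rng.standard_normal((3, 200))
aligned, perm = align_to_reference(ests, refs)
```

The exhaustive search is only practical for a small number of sources, which is the case in the two-speaker example here.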

Sound source: Male speaker A + Male speaker B (two speakers case)

* Example of the simulated data
-- Observed signal of 1st ch; wav


*[Baseline method] Results with 4 microphones, frequency-domain ICA (FD-ICA)
Results with standard FD-ICA.

-- Separated signal of speaker A; wav


-- Separated signal of speaker B; wav


*[Our method] Results with 4 microphones
Results of our ICA, which extends FD-ICA to deal with reverberation in the STFT domain.

-- Separated signal of speaker A; wav


-- Separated signal of speaker B; wav


*[Our method] Results with 8 microphones
The performance improves, but the computational cost increases.

-- Separated signal of speaker A; wav


-- Separated signal of speaker B; wav


Other examples
Here (in Japanese)

=== Blind dereverberation ===

Suppression of the reverberation of a speech signal using only the observed signals captured by microphones.
Reverberation seriously degrades ASR performance.

/*-- Example of speech dereverberation with ICA --*/

* We do not deal with early reflections, which can be handled easily by ASR techniques.

Conditions :
    1. Simulated data with real recorded impulse responses (RT20: 0.9 s)
    2. Without other noises
    3. Batch processing
    4. 30 iterations in estimating separation filter (with step-size adaptation)
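
The way such simulated data is produced can be sketched roughly as below: a dry signal is convolved with a room impulse response. The recorded impulse responses are not available here, so a synthetic exponentially decaying noise response stands in for them. The sampling rate, the function names, the noise input and the reading of 0.9 s as a 60 dB decay time are all assumptions of this sketch.

```python
import numpy as np

def synthetic_impulse_response(decay_time_s, fs, rng):
    """Noise whose envelope falls by 60 dB over decay_time_s seconds."""
    n = int(round(decay_time_s * fs))
    t = np.arange(n) / fs
    envelope = 10.0 ** (-3.0 * t / decay_time_s)
    h = rng.standard_normal(n) * envelope
    h[0] = 1.0  # direct path
    return h

def reverberate(dry, h):
    """Observed signal = dry signal convolved with the impulse response."""
    return np.convolve(dry, h)[: len(dry)]

fs = 8000                                     # hypothetical sampling rate [Hz]
rng = np.random.default_rng(0)
dry = rng.standard_normal(fs)                 # 1 s of noise standing in for speech
h = synthetic_impulse_response(0.9, fs, rng)  # stand-in for a recorded response
observed = reverberate(dry, h)                # one simulated microphone channel
```

With several microphones, each channel would use its own impulse response, and with several speakers the reverberated signals would be summed per channel.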

* Example of the simulated data
-- Observed signal of 1st ch; wav


*[Baseline] Result with 4 microphones and FD-ICA

-- Dereverberated speech; wav

*[Our method] Results with 4 microphones
The result is expected to improve when more samples are used for filter estimation.

-- Dereverberated speech; wav


Copyright (C) 2009-2010 Ryu Takeda All Rights Reserved.