IEEE Humanoids 2009
Tutorial on Robot Audition

Organizers: Kazuhiro Nakadai ‡ and Hiroshi G. Okuno †

‡ Honda Research Institute Japan Co., Ltd.

8-1 Honcho, Wako-shi, Saitama, 351-0114, Japan

nakadai _at_ jp.honda-ri.com

† Department of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Yoshidahonchou, Sakyou-ku, Kyoto 606-8501, Japan

okuno _at_ i.kyoto-u.ac.jp

Ariel's Song, The Tempest, Shakespeare
Come unto these yellow sands,
And then take hands;
Curtsied when you have, and kiss'd,
(The wild waves whist;)
Foot it featly here and there;
And sweet sprites the burden bear.
[Burden dispersedly.]
Hark, hark! bowgh-wowgh: the watch-dogs bark,
Bowgh-wowgh.
Ariel. Hark, hark! I hear
The strain of strutting chanticleer
Cry cock-a-doodle-doo.

Background

Speech recognition is essential for communication and social interaction, and people with normal hearing can listen to many kinds of sounds under various acoustic conditions. If robots are to help us in daily environments, they should have hearing capabilities equivalent to ours to support human-robot communication and social interaction. Such environments contain many noise sources besides the target speech, including the robot's own motor noise. Many robot systems for social interaction have sidestepped this problem by requiring the people they interact with to wear headset microphones. For smoother and more natural interaction, however, a robot should listen to sounds with its own ears rather than relying on headset microphones worn by its interaction partners. We therefore proposed "Robot Audition" at AAAI 2000, which aims to recognize noisy speech, such as simultaneous speech from several talkers, using microphones embedded in the robot. Robot audition has been actively studied in recent years, as shown by the organized sessions on robot audition at the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2004-2009).

Open-Source Robot Audition Software HARK

Last year, we collected the software behind our robot audition results and released it as open-source robot audition software called HARK. HARK stands for Honda Research Institute Japan Audition for Robots with Kyoto University. HARK provides a complete set of modules for building a real-time robot audition system without any additional implementation. It supports many multi-channel sound cards, which makes it easy to build such a real-time system. Preprocessing modules used in our previously reported systems, such as sound source localization, tracking, and separation, are included. These preprocessing modules can be integrated with automatic speech recognition based on missing feature theory. Thanks to its module-based architecture, HARK can be applied to various types of robots and hardware configurations.

Objective & Description

This tutorial describes the implementation of HARK, including the theoretical background of its signal processing, such as sound source localization and separation, as well as applications of HARK. In addition, each participant will have an opportunity to learn how to use HARK and how to implement a new module for it. Finally, we will give live demonstrations showing HARK's actual speech recognition performance in noisy environments and its general applicability. The tutorial focuses on two goals: 1) letting each researcher evaluate the technique he or she is working on within a complete robot audition system, and 2) helping researchers who are not familiar with robot audition build a robot system with listening capability. We therefore believe that this tutorial will be helpful for robotics researchers attending the IEEE-RAS International Conference on Humanoid Robots.

Intended participants

The intended audience includes:

- researchers on robot audition, audio signal processing, and speech recognition who are working on part or all of the robot audition research area,
- researchers on human-robot interaction and human-robot communication who are interested in the performance of robot audition technology and possibly in introducing it to their robot systems, and
- developers of service robots, home robots, and humanoid robots who need robot audition for their robots.

Contents & Schedule

The tutorial is currently planned as follows:

- Introduction (Prof. Hiroshi G. Okuno): 10 min
- Overview of HARK (Dr. Kazuhiro Nakadai): 60 min
- HARK applications (Mr. Ryu Takeda, Mr. Takeshi Mizumoto): 30 min
- Practice 0: system boot and setup (Prof. Toru Takahashi): 30 min
- Practice 1: building a sound source localization system (Mr. Takeshi Mizumoto): 90 min
- Practice 2: using the speech recognition system (Mr. Takeshi Mizumoto): 45 min
- Practice 3: developing a new module (Mr. Ryu Takeda): 60 min
- Live demonstration (Prof. Toru Takahashi, Mr. Ryu Takeda, Mr. Takeshi Mizumoto, and Mr. Takuma Ohtsuka): 30 min

This schedule is tentative, and the contents may change without notice.

Requirements for Participants

This tutorial includes hands-on sessions, so we ask each participant to bring a laptop PC.

Laptop PC specifications
- bootable from DVD
- USB port available
- CPU faster than Pentium M 1.6 GHz (Core 2 Duo or higher recommended)
- more than 1 GB of memory
- VMware Player (free software) installed in advance
