Come unto these yellow sands,
And then take hands;
Curtsied when you have, and kiss'd,
(The wild waves whist;)
Foot it featly here and there;
And sweet sprites the burden bear.
Burden dispersedly. Hark, hark! bowgh-wowgh: the watch-dogs bark, Bowgh-wowgh.
Ariel. Hark, hark! I hear
The strain of strutting chanticleer
Cry cock-a-doodle-doo.
(Shakespeare, The Tempest)
Speech recognition is essential for communication and social interaction, and people with normal hearing can listen to many kinds of sounds under various acoustic conditions. If robots are to help us in daily environments, they should have hearing capability equivalent to ours in order to realize human-robot communication and social interaction. In such daily environments, many noise sources, including the robot's own motor noise, exist alongside the target speech sources. Many robot systems for social interaction have avoided this problem by forcing the participants in the interaction to wear headset microphones. For smoother and more natural interaction, a robot should listen to sounds with its own ears instead of relying on the participants' headsets. We therefore proposed ``Robot Audition,'' which realizes recognition of noisy speech, such as simultaneous speech, using robot-embedded microphones, at AAAI 2000. It has been actively studied in recent years, as typified by the organized sessions on robot audition at the IEEE/RSJ International Conferences on Intelligent Robots and Systems (IROS 2004-2009).
Last year, by collecting the software used to produce our robot audition results, we released open-source robot audition software called HARK. HARK stands for Honda Research Institute Japan Audition for Robots with Kyoto University. HARK provides a complete module set for building a real-time robot audition system without any further implementation, and it supports many multi-channel sound cards so that such a system can be built easily. Preprocessing modules for sound source localization, tracking, and separation, which were used in our previously reported systems, are available, and these preprocessing modules can be integrated with automatic speech recognition based on the missing feature theory. Thanks to its module-based architecture, HARK is generally applicable to various types of robots and hardware configurations.
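To give a flavor of the signal processing background covered in the tutorial, the sketch below estimates the time difference of arrival between two microphone channels with GCC-PHAT, a standard building block for sound source localization. This is an illustrative example only, not HARK's actual API: HARK systems are built by connecting modules in a dataflow network, and the function name and parameters here are our own.

```python
import numpy as np

def gcc_phat(sig, ref, fs=16000):
    """Estimate the delay (in seconds) of `sig` relative to `ref`
    using the Generalized Cross-Correlation with PHAT weighting."""
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)
    # PHAT weighting: discard magnitude, keep only phase information,
    # which sharpens the correlation peak in reverberant conditions.
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so negative lags precede positive lags.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.standard_normal(4096)
    # Delay the reference by 10 samples to simulate a second microphone.
    sig = np.concatenate([np.zeros(10), ref[:-10]])
    print(gcc_phat(sig, ref, fs=16000))
```

Given the estimated delay and the microphone spacing, the direction of arrival follows from simple geometry; HARK's localization modules apply more robust multi-channel methods on the same principle.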
This tutorial describes the implementation of HARK, the theoretical background of the underlying signal processing such as sound source localization and separation, and applications of HARK. In addition, each participant will have an opportunity to learn how to use HARK and how to implement a new HARK module. Finally, we will give live demonstrations of HARK, showing its real performance on speech recognition in noisy environments and its general applicability. The tutorial focuses on two points: 1) evaluation of a technique each researcher is working on within a complete robot audition system, and 2) construction of a robot system with listening capability by researchers who are not familiar with robot audition. We therefore believe that this tutorial will be helpful for robotics researchers attending the IEEE-RAS International Conference on Humanoid Robots.
The intended audience will include
This tutorial includes practical lessons, so we ask each participant to bring a laptop PC.