Demonstrations

Specification of SIG

Eyes [ MPEG1 10488ms 720x480 ]

Ears [ MPEG1 17664ms 720x480 ]

Active Audition

Localization and tracking of multiple sound sources while in motion

- SIG listens to two pure tones of 500 Hz and 600 Hz while in motion, canceling its own motor noise. The sound is captured by a pair of microphones in SIG. [ MPEG1 74074ms 352x240 ]

Real-time Multiple Speaker Tracking

Demonstration of Each Module

Sound Module [ MPEG1 5976ms 720x480 ] (Localize multiple harmonic sounds)
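One way to picture harmonic-sound localization is to measure the interaural phase difference (IPD) between the two microphones at each harmonic of a sound and convert it to a direction. The sketch below is not necessarily SIG's method; it assumes a known fundamental frequency f0, a free-field two-microphone model (IPD = 2*pi*f*d*sin(theta)/c), and a hypothetical microphone spacing. The helper names dft_bin and harmonic_direction are illustrative only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def dft_bin(signal, freq, fs):
    """Complex DFT coefficient of `signal` at frequency `freq` (Hz)."""
    return sum(x * cmath.exp(-2j * math.pi * freq * n / fs)
               for n, x in enumerate(signal))


def harmonic_direction(left, right, f0, fs, mic_distance, n_harmonics=4):
    """Hypothetical helper: azimuth (degrees, positive = toward the left
    microphone) of a harmonic source with fundamental f0, averaged over
    its first `n_harmonics` harmonics. Returns None if no harmonic
    yields a physically valid direction."""
    angles = []
    for k in range(1, n_harmonics + 1):
        f = k * f0
        # IPD at this harmonic: phase of L * conj(R)
        ipd = cmath.phase(dft_bin(left, f, fs) * dft_bin(right, f, fs).conjugate())
        # Free-field model (assumption): ipd = 2*pi*f*d*sin(theta)/c
        s = ipd * SPEED_OF_SOUND / (2 * math.pi * f * mic_distance)
        if -1.0 <= s <= 1.0:
            angles.append(math.degrees(math.asin(s)))
    return sum(angles) / len(angles) if angles else None
```

Grouping by harmonics lets several components vote on one direction per sound; with several simultaneous sounds, each fundamental would be handled separately. The IPD is unambiguous only while 2*pi*f*d/c stays below pi, which limits the usable harmonics for a given spacing.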

Face Module [ MPEG1 9984ms 720x480 ]

Localize and recognize multiple faces

In the Case of Multiple Faces [ MPEG1 28536ms 720x480 ]

Stereo Vision Module [ MPEG1 41174ms 352x240 ]

(Localize human-like objects)

Association Module [ MPEG1 21984ms 720x480 ]

Form auditory and visual streams and integrate them into an associated stream according to their proximity
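Proximity-based association can be sketched as pairing each auditory stream with the closest unused visual stream, provided their directions stay within a threshold over time. This is a minimal illustration, not SIG's actual association module: it assumes each stream is a per-frame list of azimuths in degrees, a greedy matching strategy, and a hypothetical 10-degree threshold. The names Stream, mean_angular_distance and associate are invented for the example.

```python
import math
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    azimuths: list[float]  # per-frame direction in degrees


def mean_angular_distance(a, b):
    """Mean absolute azimuth difference, wrapped to [-180, 180)."""
    diffs = [abs((x - y + 180.0) % 360.0 - 180.0) for x, y in zip(a, b)]
    return sum(diffs) / len(diffs) if diffs else math.inf


def associate(auditory, visual, threshold_deg=10.0):
    """Greedily pair auditory and visual streams whose directions are close.
    Returns a list of (auditory name, visual name) pairs."""
    pairs = []
    used = set()
    for a in auditory:
        best, best_d = None, threshold_deg
        for i, v in enumerate(visual):
            if i in used:
                continue  # each visual stream joins at most one association
            d = mean_angular_distance(a.azimuths, v.azimuths)
            if d <= best_d:
                best, best_d = i, d
        if best is not None:
            used.add(best)
            pairs.append((a.name, visual[best].name))
    return pairs
```

For example, a voice heard around 30 degrees pairs with a face seen around 30 degrees, while a voice at 90 degrees with no face nearby stays an unassociated auditory stream.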

Performance of Real-time Multiple Speaker Tracking System

Tracking four people (not simultaneously) [ MPEG1 22776ms 720x480 ]

Tracking By Audition Only [ MPEG1 18072ms 720x480 ]

The sound stream (upper-left area) is relatively inaccurate.

Tracking By Vision Only [ MPEG1 13056ms 720x480 ]

Tracking By Audio-Visual Integration [ MPEG1 12672ms 720x480 ]

Mutual disambiguation: missing visual information, such as during occlusion, is compensated for by auditory information, and ambiguous auditory information is disambiguated by accurate visual information.

Tracking By Stereo Vision [ MPEG1 35280ms 720x480 ]

Tracking a person who is looking away

Tracking By Integration of 5 Modalities

Uses sound direction, face ID, face location, stereo vision, and speaker identification. [ MPEG1 52080ms 720x480 ]

Human-Robot Interaction

SIG as a Participant in Conversation [ MPEG1 34968ms 720x480 ]

Even this kind of passive interaction creates the impression that the robot is participating in the conversation.

SIG as a Receptionist for a Party

* Registered Visitor [ MPEG1 31968ms 720x480 ]

For a registered visitor, the robot accomplished the reception task by integrating the visitor's face and voice.

* Unregistered Visitor [ MPEG1 26952ms 720x480 ]

For an unregistered visitor, the robot registered the visitor's face and name through association-based audio-visual (AV) integration.

Simultaneous Speaker Tracking [ MPEG1 28632ms 704x480 ]

The robot listens to several people speaking simultaneously and attends to one of them by selecting the stream it is interested in.

Tracking Stereo Sound by R-L balance Control [ MPEG1 24864ms 720x480 ]

The robot tracks virtual stereo sounds created by two loudspeakers. This is evidence that the robot's auditory processing is similar to a human's.
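Where a listener perceives such a virtual (phantom) source can be estimated with the standard tangent law of stereophonic amplitude panning. The sketch below uses that textbook psychoacoustic model, not the robot's own processing; it assumes a symmetric loudspeaker pair at a hypothetical plus or minus 30 degrees and a listener facing the midpoint. The function name phantom_azimuth is illustrative.

```python
import math


def phantom_azimuth(gain_left, gain_right, half_base_deg=30.0):
    """Perceived direction (degrees, positive = toward the left loudspeaker)
    of a phantom source, using the tangent law:
        tan(theta) / tan(theta0) = (gL - gR) / (gL + gR)
    where theta0 is half the angle between the two loudspeakers."""
    total = gain_left + gain_right
    if total <= 0:
        raise ValueError("total loudspeaker gain must be positive")
    ratio = (gain_left - gain_right) / total
    return math.degrees(math.atan(ratio * math.tan(math.radians(half_base_deg))))
```

Equal gains place the image midway between the loudspeakers (0 degrees), and shifting the R-L balance moves it toward the louder side, which is the kind of moving image the robot follows in this demonstration.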