Demonstrations

Specification of SIG

Eyes [ MPEG1 10488ms 720x480 ]
Ears [ MPEG1 17664ms 720x480 ]

Active Audition

Localization and tracking of multiple sound sources while in motion

- SIG listens to two pure tones of 500 Hz and 600 Hz while moving, canceling its own motor noise. The sound is captured by a pair of microphones in SIG. [ MPEG1 74074ms 352x240 ]
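A minimal sketch of how the direction of each pure tone can be estimated from a pair of microphones. It assumes a free-field model with a hypothetical 0.18 m microphone spacing and no head shadowing; SIG's actual localization method and dimensions are not given on this page, and motor-noise canceling is not modeled. The phase of one tone is measured with a single-bin DFT, and the interaural phase difference (IPD) is converted to an azimuth via d·sin(θ) = c·Δt.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature
MIC_SPACING = 0.18      # hypothetical microphone spacing in metres (not from this page)


def phase_at(signal, freq_hz, fs):
    """Phase (radians) of one frequency component, via a single-bin DFT."""
    acc = sum(x * cmath.exp(-2j * math.pi * freq_hz * n / fs)
              for n, x in enumerate(signal))
    return cmath.phase(acc)


def azimuth_from_pure_tone(left, right, freq_hz, fs,
                           spacing_m=MIC_SPACING, c=SPEED_OF_SOUND):
    """Estimate the azimuth (degrees) of a pure tone from two microphones.

    0 degrees is straight ahead; positive angles point toward the left
    microphone (the one whose signal leads)."""
    # Interaural phase difference, wrapped into [-pi, pi)
    ipd = phase_at(left, freq_hz, fs) - phase_at(right, freq_hz, fs)
    ipd = (ipd + math.pi) % (2 * math.pi) - math.pi
    # Phase difference -> time difference -> path difference d*sin(theta)
    tdoa = ipd / (2 * math.pi * freq_hz)
    s = max(-1.0, min(1.0, c * tdoa / spacing_m))  # clamp against noise
    return math.degrees(math.asin(s))
```

Evaluating the function once at 500 Hz and once at 600 Hz on the same two-channel recording yields one direction per tone, since the single-bin DFT picks out each frequency separately when the window holds a whole number of cycles of both tones.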


Real-time Multiple Speaker Tracking


Demonstration of Each Module

Sound Module [ MPEG1 5976ms 720x480 ] (Localize multiple harmonic sounds)
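One way to read "harmonic sounds": spectral peaks that share a fundamental frequency (F0) are grouped, and each group is then localized as one source. The sketch below is a hypothetical greedy grouping over already-extracted peak frequencies; peak picking, the relative tolerance, and the minimum group size are assumptions, not SIG's parameters.

```python
def group_harmonics(peaks_hz, tolerance=0.02, min_harmonics=2):
    """Greedy harmonic grouping of spectral peak frequencies.

    Repeatedly takes the lowest unassigned peak as a candidate fundamental
    (F0) and collects every remaining peak lying within `tolerance`
    (relative) of an integer multiple of it. Candidates that gather fewer
    than `min_harmonics` peaks are discarded as non-harmonic."""
    remaining = sorted(peaks_hz)
    groups = []
    while remaining:
        f0 = remaining[0]
        members, rest = [], []
        for f in remaining:
            k = round(f / f0)  # nearest harmonic number
            if k >= 1 and abs(f - k * f0) <= tolerance * k * f0:
                members.append(f)
            else:
                rest.append(f)
        if len(members) >= min_harmonics:
            groups.append((f0, members))
            remaining = rest
        else:
            remaining = remaining[1:]  # drop the isolated peak, try the next one
    return groups
```

For peaks at 200, 310, 400, 600, 620, 930 and 1111 Hz, the sketch returns two groups, one with F0 = 200 Hz and one with F0 = 310 Hz, and discards 1111 Hz as an isolated peak. Being greedy, it assigns a peak shared by two harmonic series to the lower F0 only.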

Face Module [ MPEG1 9984ms 720x480 ]

Localizes and recognizes multiple faces.
In the Case of Multiple Faces [ MPEG1 28536ms 720x480 ]

Stereo Vision Module [ MPEG1 41174ms 352x240 ]

(Localizes human-like objects)

Association Module [ MPEG1 21984ms 720x480 ]

Forms auditory and visual streams and integrates them into an associated stream according to their proximity.
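A minimal sketch of proximity-based association, assuming "proximity" means the angular distance between the directions of an auditory stream and a visual stream at a single instant (the page does not say how proximity is measured or over what time span). The `Stream` class and the 10-degree threshold are hypothetical. Pairs are accepted greedily from the closest one outward, so each stream joins at most one associated stream.

```python
from dataclasses import dataclass


@dataclass
class Stream:
    name: str
    azimuth_deg: float  # direction of the stream, 0 = straight ahead


def associate(auditory, visual, max_gap_deg=10.0):
    """Greedily pair auditory and visual streams whose directions are closest,
    accepting a pair only if the angular gap is within max_gap_deg."""
    candidates = sorted(
        (abs(a.azimuth_deg - v.azimuth_deg), i, j)
        for i, a in enumerate(auditory)
        for j, v in enumerate(visual)
    )
    used_a, used_v, pairs = set(), set(), []
    for gap, i, j in candidates:
        if gap > max_gap_deg:
            break  # all remaining candidates are even farther apart
        if i in used_a or j in used_v:
            continue  # each stream may join only one associated stream
        used_a.add(i)
        used_v.add(j)
        pairs.append((auditory[i].name, visual[j].name))
    return pairs
```

Streams left unpaired remain purely auditory or purely visual; a tracker would keep them alive and retry association in later frames.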


Performance of Real-time Multiple Speaker Tracking System

Tracking 4 persons (not simultaneously) [ MPEG1 22776ms 720x480 ]

Tracking By Audition Only [ MPEG1 18072ms 720x480 ]

The sound stream (upper-left area) is noticeably less accurate.

Tracking By Vision Only [ MPEG1 13056ms 720x480 ]

Tracking By Audio-Visual Integration [ MPEG1 12672ms 720x480 ]

Mutual disambiguation: visual information that is missing, for example because of occlusion, is compensated by auditory information, and ambiguous auditory information is disambiguated by accurate visual information.
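The mutual disambiguation above can be illustrated with a simple direction-fusion rule. This is a substitute technique chosen for illustration (inverse-variance weighting with fallback), not necessarily what SIG does; the function name and the standard-deviation inputs are hypothetical. When the face is occluded, the auditory estimate is used alone; when both are present, the more precise visual estimate dominates.

```python
from typing import Optional


def fuse_direction(audio_deg: Optional[float], audio_std: float,
                   visual_deg: Optional[float], visual_std: float) -> Optional[float]:
    """Fuse auditory and visual direction estimates (degrees).

    Missing estimates are passed as None. When both exist, each is weighted
    by the inverse of its variance, so the more precise one dominates."""
    if audio_deg is None and visual_deg is None:
        return None
    if visual_deg is None:  # e.g. face occluded: fall back on audition
        return audio_deg
    if audio_deg is None:   # silent person: vision only
        return visual_deg
    wa = 1.0 / audio_std ** 2
    wv = 1.0 / visual_std ** 2
    return (wa * audio_deg + wv * visual_deg) / (wa + wv)
```

With an auditory estimate of 30 degrees (standard deviation 10) and a visual estimate of 20 degrees (standard deviation 1), the fused direction is about 20.1 degrees, i.e. vision resolves the auditory ambiguity.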

Tracking By Stereo Vision [ MPEG1 35280ms 720x480 ]

Tracks a person who is looking away.

Tracking By Integration of 5 Modalities

Using sound direction, face ID, face location, stereo vision, and speaker identification. [ MPEG1 52080ms 720x480 ]


Human-Robot Interaction

SIG as Participant in Conversation [ MPEG1 34968ms 720x480 ]

Even this kind of passive interaction creates the impression that the robot is participating in the conversation.

SIG as Receptionist For Party

* Registered Visitor [ MPEG1 31968ms 720x480 ]

For a registered visitor, the robot accomplished the reception task by integrating the visitor's face and voice.

* Unregistered Visitor [ MPEG1 26952ms 720x480 ]

For an unregistered visitor, the robot registered the visitor's face and name through association-based audio-visual (AV) integration.

Simultaneous Speaker Tracking [ MPEG1 28632ms 704x480 ]

The robot listens to simultaneous speech and attends to one speaker by selecting the stream it is interested in.

Tracking Stereo Sound by R-L Balance Control [ MPEG1 24864ms 720x480 ]

The robot tracks virtual stereo sound sources created by two loudspeakers. This is evidence that the robot's auditory processing is similar to a human's.
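A minimal sketch of what one step of R-L balance control could look like, assuming a proportional controller on the interaural level difference (ILD): the head turns toward the louder side until left and right levels match, which with two loudspeakers points it at the virtual (phantom) source between them. The gain, dead band, and function names are hypothetical; the page does not describe SIG's controller.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples."""
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def balance_step(left, right, gain_deg_per_db=2.0, deadband_db=0.5):
    """One step of a hypothetical proportional R-L balance controller.

    Returns a head rotation in degrees: positive turns toward the left
    (louder) side, negative toward the right, 0 when roughly balanced."""
    eps = 1e-12  # avoids log of zero for silent channels
    ild_db = 20.0 * math.log10((rms(left) + eps) / (rms(right) + eps))
    if abs(ild_db) < deadband_db:
        return 0.0  # balanced: the source is (close to) straight ahead
    return gain_deg_per_db * ild_db
```

Repeating the step on fresh audio blocks while the head moves drives the ILD toward zero; the dead band keeps the head from oscillating around the balance point.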

Looking Away [ MPEG1 36728ms 720x480 ]

With a hostile personality introduced, the robot looks away when a person speaks to it.


Simultaneous Speech Recognition

Japanese Version

* Recognition of Three Simultaneous Speeches 1 [ MPEG1 23424ms 720x480 ]

* Recognition of Three Simultaneous Speeches 2 [ MPEG1 33720ms 720x480 ]

Disambiguation by Asking Again: Active Audition

* Recognition of Three Simultaneous Speeches 3 [ MPEG1 30216ms 720x480 ]

Face information makes recognition faster and more accurate.

* Recognition of Three Simultaneous Speeches 4 [ MPEG1 27432ms 720x480 ]

In total, the vocabulary size is 150; a sound mixture of three simultaneously spoken fruit names can be recognized.

English Version

* Recognition of Three Simultaneous Speeches 1 [ MPEG2 34834ms 720x480 ]

By integrating face ID with speech recognition based on speaker- and direction-dependent acoustic models, SIG recognizes simultaneous English speech.

* Recognition of Three Simultaneous Speeches 2 [ MPEG2 40073ms 720x480 ]

Using speech recognition alone, SIG recognizes simultaneous speech to some extent.

* Recognition of Three Simultaneous Speeches 3 [ MPEG2 48815ms 720x480 ]

Disambiguation by Asking Again: Active Audition


* Recognition of Three Simultaneous Human Talkers in 2005 [ MPEG2 48815ms 720x480 ]



* Faster Recognition of Three Simultaneous Human Talkers in 2006-2007 [ MPEG2 48815ms 720x480 ]




* Robot Referee of Rock-Paper-Scissors Sound Games in 2008


* Robot Steps by Recognizing the Beats of Music in 2007

This page is generated based on Area61.NET