Abstracts of Our Papers
FY2006
Instrogram: Probabilistic Representation of Instrument
Existence for Polyphonic Music
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
IPSJ Journal,
Vol.48, No.1, January 2007.
Instrument Identification in Polyphonic Music:
Feature Weighting to Minimize Influence of Sound Overlaps
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper provides a new solution to the problem of feature variations
caused by the overlapping of sounds in instrument identification in
polyphonic music. When multiple instruments play simultaneously,
partials (harmonic components) of their sounds overlap and interfere,
which makes the acoustic features different from those of monophonic sounds.
To cope with this, we weight features based on how much they are affected
by overlapping.
First, we quantitatively evaluate the influence of overlapping on each
feature as the ratio of the within-class variance to the between-class
variance in the distribution of training data obtained from polyphonic sounds.
Then, we generate feature axes using a weighted mixture that minimizes
the influence via linear discriminant analysis.
In addition, we improve instrument identification using
musical context.
Experimental results showed that the recognition rates using
both feature weighting and musical context were
84.1% for duo, 77.6% for trio, and 72.3% for quartet;
those without using either were 53.4%, 49.6%, and 46.5%, respectively.
EURASIP Journal on Applied Signal Processing,
Vol.2007, No.51979, pp.1--15, 2007.
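As an illustration of the first step in the abstract above, the following minimal sketch computes, for each feature, the ratio of within-class variance to between-class variance. It is not the authors' implementation: the function name `variance_ratio` and the toy data are hypothetical, and the sketch stops at the per-feature ratio rather than deriving weighted feature axes with linear discriminant analysis.

```python
import numpy as np

def variance_ratio(X, y):
    """Per-feature ratio of within-class to between-class variance.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) class labels.
    A larger ratio suggests the feature is less reliable for discrimination.
    Assumes every feature has non-zero between-class variance.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for label in np.unique(y):
        Xc = X[y == label]
        class_mean = Xc.mean(axis=0)
        # Scatter of samples around their own class mean.
        within += ((Xc - class_mean) ** 2).sum(axis=0)
        # Scatter of the class mean around the overall mean.
        between += len(Xc) * (class_mean - overall_mean) ** 2
    return within / between

# Toy data (hypothetical): feature 0 separates the two classes cleanly,
# feature 1 is dominated by within-class scatter.
X = np.array([[0.0, 0.0], [0.2, 4.0], [1.0, 0.5], [1.2, 4.5]])
y = np.array([0, 0, 1, 1])
ratios = variance_ratio(X, y)
```

In this toy case the second feature receives a much larger ratio than the first, so a weighting scheme driven by the ratio would trust it less.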
Instrogram: A New Musical Instrument Recognition Technique
without Using Onset Detection nor F0 Estimation
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a new technique for recognizing musical instruments
in polyphonic music. Because the conventional framework for musical
instrument recognition in polyphonic music had to estimate the onset time
and fundamental frequency (F0) of each note, instrument recognition
suffered severely from errors of onset detection and F0 estimation.
Unlike such a note-based processing framework, our technique calculates
the temporal trajectory of instrument existence probabilities for
every possible F0, and the results are visualized with a spectrogram-like
graphical representation called instrogram.
The instrument existence probability is defined as the product of
a nonspecific instrument existence probability
calculated using PreFEst and
a conditional instrument existence probability
calculated using the hidden Markov model.
Experimental results show that the obtained instrograms reflect the actual
instrumentations and facilitate instrument recognition.
Proceedings of the 2006 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2006),
Vol.V, pp.229--232, May 2006.
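The product definition in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the two input arrays stand in for the outputs of PreFEst and the hidden Markov models, which are not implemented here, and the function name `instrogram` and all numbers are hypothetical.

```python
import numpy as np

def instrogram(nonspecific, conditional):
    """Combine the two probabilities into instrument existence probabilities.

    nonspecific: (T, F) array, probability that some instrument sounds at
                 frame t with F0 index f.
    conditional: (K, T, F) array, probability of instrument k given that a
                 sound exists at (t, f); assumed to sum to 1 over k.
    Returns a (K, T, F) array: one spectrogram-like plane per instrument.
    """
    nonspecific = np.asarray(nonspecific, dtype=float)
    conditional = np.asarray(conditional, dtype=float)
    # Broadcast the (T, F) plane across the instrument axis.
    return conditional * nonspecific[np.newaxis, :, :]

# Hypothetical values: 2 instruments, 2 frames, 3 candidate F0s.
p_any = np.array([[0.9, 0.1, 0.0],
                  [0.8, 0.2, 0.5]])
p_first = np.array([[0.7, 0.5, 0.5],
                    [0.6, 0.5, 0.1]])
p_cond = np.stack([p_first, 1.0 - p_first])
gram = instrogram(p_any, p_cond)
```

Because the conditional probabilities sum to one over instruments, summing the result over the instrument axis recovers the nonspecific probability at every time and F0.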
F0 Estimation Method for Singing Voice in Polyphonic Audio Signal
based on Statistical Vocal Model and Viterbi Search
Hiromasa Fujihara,
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Proceedings of the 2006 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2006),
Vol.V, pp.253--256, May 2006.
FY2005
Pitch-dependent Identification of Musical Instrument Sounds
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper describes a musical instrument identification method
that takes into consideration the pitch dependency of
timbres of musical instruments.
The difficulty in musical instrument identification resides in
the pitch dependency of musical instrument sounds,
that is, acoustic features of most musical instruments vary according to
the pitch (fundamental frequency, F0).
To cope with this difficulty,
we propose an F0-dependent multivariate normal distribution,
where each element of the mean vector is represented
by a function of F0.
Our method first extracts 129 features (e.g., the spectral centroid,
the gradient of the straight line approximating the power envelope) from
a musical instrument sound and then reduces the dimensionality
of the feature space to 18 dimensions.
In the 18-dimensional feature space, it calculates an F0-dependent
mean function and an F0-normalized covariance, and finally
applies the Bayes decision rule.
Experimental results of identifying 6,247 solo tones of 19 musical
instruments show that the proposed method improved the recognition rate
from 75.73% to 79.73%.
Applied Intelligence,
Vol.23, No.3, pp.267--275, December 2005.
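A rough sketch of the modeling idea in the abstract above: each instrument gets a Gaussian whose mean vector is a function of F0, the covariance is estimated after removing that F0-dependent mean, and classification uses the Bayes decision rule. The abstract does not state the form of the mean function, so the cubic polynomial used here is an assumption, as are the class and function names, the normalized pitch axis, and the synthetic two-dimensional data (the feature extraction and dimensionality reduction steps are omitted).

```python
import numpy as np

class F0DependentGaussian:
    """Gaussian whose mean vector is a polynomial function of F0."""

    def __init__(self, degree=3):
        self.degree = degree  # assumed polynomial order, not from the abstract

    def fit(self, f0, X):
        f0 = np.asarray(f0, dtype=float)
        X = np.asarray(X, dtype=float)
        # One polynomial per feature dimension: coeffs has shape (degree+1, d).
        self.coeffs = np.polyfit(f0, X, self.degree)
        residuals = X - self.mean(f0)
        # Covariance of what remains after removing the F0-dependent mean.
        self.cov = np.atleast_2d(np.cov(residuals, rowvar=False))
        return self

    def mean(self, f0):
        f0 = np.atleast_1d(np.asarray(f0, dtype=float))
        powers = np.vander(f0, self.degree + 1)  # columns: f0^degree ... f0^0
        return powers @ self.coeffs

    def log_likelihood(self, f0, x):
        diff = np.asarray(x, dtype=float) - self.mean(f0)[0]
        _, logdet = np.linalg.slogdet(self.cov)
        maha = diff @ np.linalg.solve(self.cov, diff)
        return -0.5 * (diff.size * np.log(2 * np.pi) + logdet + maha)

def classify(models, priors, f0, x):
    """Bayes decision rule: pick the class with the largest log posterior."""
    scores = {name: m.log_likelihood(f0, x) + np.log(priors[name])
              for name, m in models.items()}
    return max(scores, key=scores.get)

# Synthetic data (hypothetical): two "instruments" whose features drift with pitch.
rng = np.random.default_rng(0)
f0 = np.linspace(0.0, 1.0, 40)  # normalized pitch axis
trend = np.column_stack([f0, 1.0 - f0])
models = {
    "A": F0DependentGaussian().fit(f0, trend + rng.normal(0, 0.05, trend.shape)),
    "B": F0DependentGaussian().fit(f0, trend + 1.0 + rng.normal(0, 0.05, trend.shape)),
}
priors = {"A": 0.5, "B": 0.5}
label = classify(models, priors, 0.5, [0.5, 0.5])
```

Working on a normalized or logarithmic pitch axis keeps the polynomial fit well conditioned; fitting directly on F0 in Hz would likely need rescaling.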
Instrument Identification in Polyphonic Music:
Feature Weighting with Mixed Sounds, Pitch-dependent Timbre Modeling,
and Use of Musical Context
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper addresses the problem of identifying musical
instruments in polyphonic music. Musical instrument
identification (MII) is an important task in music information
retrieval because MII results make it possible to automatically
retrieve certain types of music (e.g., piano
sonata, string quartet). Only a few studies, however, have
dealt with MII in polyphonic music. In MII in polyphonic
music, there are three issues: feature variations caused
by sound mixtures, the pitch dependency of timbres, and
the use of musical context. For the first issue, templates
of feature vectors representing timbres are extracted from
not only isolated sounds but also sound mixtures. Because
some features are not robust in the mixtures, features
are weighted according to their robustness by using
linear discriminant analysis. For the second issue, we use
an F0-dependent multivariate normal distribution, which
approximates the pitch dependency as a function of fundamental
frequency. For the third issue, when the instrument
of each note is identified, the a priori probability of the note
is calculated from the a posteriori probabilities of temporally
neighboring notes. Experimental results showed that
recognition rates were improved from 60.8% to 85.8% for
trio music and from 65.5% to 91.1% for duo music.
Proceedings of the 6th International Conference on
Music Information Retrieval
(ISMIR 2005),
pp.558--563, September 2005.
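The use of musical context described in the abstract above might be sketched as a two-pass procedure: compute posteriors for every note under a uniform prior, then re-estimate each note with a prior derived from the posteriors of its temporal neighbors. The abstract does not give the exact formula, so averaging the neighbors' posteriors is an assumption, and the function name, the `width` parameter, and the numbers are hypothetical.

```python
import numpy as np

def contextual_posteriors(likelihoods, width=1):
    """Refine per-note instrument posteriors using neighboring notes.

    likelihoods: (n_notes, n_instruments) array of p(features | instrument).
    width: how many notes on each side count as temporal neighbors.
    """
    likelihoods = np.asarray(likelihoods, dtype=float)
    n_notes, n_inst = likelihoods.shape
    # First pass: posteriors under a uniform prior.
    first = likelihoods / likelihoods.sum(axis=1, keepdims=True)
    refined = np.empty_like(first)
    for k in range(n_notes):
        lo, hi = max(0, k - width), min(n_notes, k + width + 1)
        neighbors = np.delete(first[lo:hi], k - lo, axis=0)
        if len(neighbors):
            prior = neighbors.mean(axis=0)  # assumed combination rule
        else:
            prior = np.full(n_inst, 1.0 / n_inst)
        post = likelihoods[k] * prior
        refined[k] = post / post.sum()
    return refined

# Hypothetical likelihoods for 3 consecutive notes and 2 instruments.
# The middle note is ambiguous on its own and slightly favors instrument 1.
likelihoods = np.array([[0.90, 0.10],
                        [0.45, 0.55],
                        [0.80, 0.20]])
refined = contextual_posteriors(likelihoods)
```

In this toy example the ambiguous middle note is reassigned to instrument 0 because both neighbors strongly favor it.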
Singer Identification based on Accompaniment Sound Reduction and
Reliable Frame Selection
Hiromasa Fujihara,
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a method for automatic singer identification
from polyphonic musical audio signals including
sounds of various instruments. Because singing voices
play an important role in musical pieces with a vocal part,
the identification of singer names is useful for music information
retrieval systems. The main problem in automatically
identifying singers is the negative influences caused
by accompaniment sounds. To solve this problem, we
developed two methods, accompaniment sound reduction
and reliable frame selection. The former method makes it
possible to identify the singer of a singing voice after reducing
accompaniment sounds. It first extracts harmonic
components of the predominant melody from sound mixtures
and then resynthesizes the melody by using a sinusoidal
model driven by those components. The latter
method then judges whether each frame of the obtained
melody is reliable (i.e. little influenced by accompaniment
sound) or not by using two Gaussian mixture models for
vocal and non-vocal frames. This enables singer identification
using only reliable vocal portions of musical pieces.
Experimental results with forty popular-music songs by
ten singers showed that our method was able to reduce
the influences of accompaniment sounds and achieved an
accuracy of 95%, while the accuracy for a conventional
method was 53%.
Proceedings of the 6th International Conference on
Music Information Retrieval
(ISMIR 2005),
pp.329--336, September 2005.
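Reliable frame selection, as described in the abstract above, could look roughly like the sketch below: each frame is scored under a vocal and a non-vocal Gaussian mixture model, and frames that the vocal model explains better are kept. The abstract does not state the decision rule, so the log-likelihood-ratio threshold is an assumption, as are the diagonal covariances, the function names, and the hand-set model parameters (no training is shown).

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Log-likelihood of each frame under a diagonal-covariance GMM.

    frames: (T, d); weights: (M,); means and variances: (M, d).
    """
    frames = np.asarray(frames, dtype=float)[:, np.newaxis, :]          # (T, 1, d)
    log_norm = -0.5 * np.log(2 * np.pi * variances).sum(axis=1)         # (M,)
    log_exp = -0.5 * (((frames - means) ** 2) / variances).sum(axis=2)  # (T, M)
    # Log of the weighted sum over mixture components, computed stably.
    return np.logaddexp.reduce(np.log(weights) + log_norm + log_exp, axis=1)

def select_reliable(frames, vocal_gmm, nonvocal_gmm, threshold=0.0):
    """Boolean mask of frames whose log-likelihood ratio exceeds the threshold."""
    ratio = (gmm_log_likelihood(frames, *vocal_gmm)
             - gmm_log_likelihood(frames, *nonvocal_gmm))
    return ratio > threshold

# Hypothetical hand-set models in a 2-dimensional feature space.
vocal_gmm = (np.array([0.5, 0.5]),
             np.array([[0.0, 0.0], [1.0, 1.0]]),
             np.full((2, 2), 0.25))
nonvocal_gmm = (np.array([1.0]),
                np.array([[4.0, 4.0]]),
                np.full((1, 2), 0.25))
frames = np.array([[0.1, 0.0], [4.1, 3.9], [0.9, 1.1]])
mask = select_reliable(frames, vocal_gmm, nonvocal_gmm)
```

Only the frames where `mask` is true would be passed on to the singer identification stage.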
ism: Improvisation Supporting Systems with Melody Correction
and Key Vibration
Tetsuro Kitahara,
Katsuhisa Ishida,
and
Masayuki Takeda
Abstract
This paper describes improvisation support for musicians
who do not have sufficient improvisational playing experience. The goal
of our study is to enable such players to learn the skills necessary for improvisation
and to enjoy it. In achieving this goal, we have two objectives:
enhancing their skill for instantaneous melody creation and supporting
their practice for acquiring this skill. For the first objective, we developed
a system that automatically corrects musically inappropriate notes
in the melodies of users' improvisations. For the second objective, we
developed a system that points out musically inappropriate notes by vibrating
corresponding keys. The main issue in developing these systems
is how to detect musically inappropriate notes. We propose a method for
detecting them based on the N-gram model. Experimental results show
that this N-gram-based method improves the accuracy of detecting musically
inappropriate notes and our systems are effective in supporting
unskilled musicians' improvisation.
Entertainment Computing:
Proceedings of the 4th International Conference on Entertainment
Computing (ICEC 2005),
Lecture Notes in Computer Science 3711, F. Kishino, Y. Kitamura, H. Kato and N. Nagata (Eds.), pp.315--327, September 2005.
FY2004
Automatic Chord Transcription with Concurrent Recognition of
Chord Symbols and Boundaries
Takuya Yoshioka,
Tetsuro Kitahara,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a method that recognizes musical
chords from real-world audio signals in compact-disc
recordings. The automatic recognition of musical chords
is necessary for music information retrieval (MIR) systems,
since the chord sequences of musical pieces capture
the characteristics of their accompaniments. None
of the previous methods can accurately recognize musical
chords from complex audio signals that contain vocal
and drum sounds. The main problem is that the chord-boundary-detection
and chord-symbol-identification processes
are inseparable because of their mutual dependency.
In order to solve this mutual dependency problem,
our method generates hypotheses about tuples of chord
symbols and chord boundaries, and outputs the most plausible
one as the recognition result. The certainty of a hypothesis
is evaluated based on three cues: acoustic features,
chord progression patterns, and bass sounds. Experimental
results show that our method successfully recognized
chords in seven popular music songs; the average
accuracy of the results was around 77%.
Proceedings of the 5th International Conference on
Music Information Retrieval
(ISMIR 2004),
pp.100--105, October 2004.
ism: Improvisation Supporting System based on Melody Correction
Katsuhisa Ishida,
Tetsuro Kitahara,
and
Masayuki Takeda
Abstract
In this paper, we describe a novel improvisation supporting
system based on correcting musically unnatural melodies.
Since improvisation is a musical performance style that
involves creating melodies while playing, it is not easy even
for people who can play musical instruments. However,
previous studies have not dealt with improvisation support
for people who can play musical instruments but cannot
improvise. In this study, to support such players' improvisation,
we propose a novel improvisation supporting system
called ism, which corrects musically unnatural melodies automatically.
The main issue in realizing this system is how
to detect notes to be corrected (i.e., musically unnatural or
inappropriate). We propose a method for detecting notes to
be corrected based on the N-gram model. This method first
calculates N-gram probabilities of played notes, and then
judges notes with low N-gram probabilities to be corrected.
Experimental results show that the N-gram-based melody
correction and the proposed system are useful for supporting
improvisation.
Proceedings of the International Conference on New Interfaces
for Musical Expression (NIME 04),
pp.177--180, June 2004.
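The detection step in the abstract above (compute N-gram probabilities of played notes, then mark notes with low probability) can be sketched as follows. This is an illustration only: representing notes as MIDI note numbers, using trigrams, the threshold value, and the function names are all assumptions, and the correction step itself is not shown.

```python
from collections import Counter

def train_ngram(melodies, n=3):
    """Count n-grams and their (n-1)-note contexts over training melodies."""
    ngrams, contexts = Counter(), Counter()
    for melody in melodies:
        for i in range(len(melody) - n + 1):
            gram = tuple(melody[i:i + n])
            ngrams[gram] += 1
            contexts[gram[:-1]] += 1
    return ngrams, contexts

def flag_notes(melody, ngrams, contexts, n=3, threshold=0.1):
    """Return indices of notes whose conditional n-gram probability is low."""
    flagged = []
    for i in range(n - 1, len(melody)):
        context = tuple(melody[i - n + 1:i])
        seen = contexts[context]
        if not seen:
            # No evidence for this context; a fuller model would back off
            # to shorter n-grams or apply smoothing instead of skipping.
            continue
        prob = ngrams[context + (melody[i],)] / seen
        if prob < threshold:
            flagged.append(i)
    return flagged

# Hypothetical training phrases in C major, as MIDI note numbers.
training = [[60, 62, 64, 65, 67], [60, 62, 64, 62, 60], [64, 65, 67, 65, 64]]
ngrams, contexts = train_ngram(training)
# The fourth note (66, F sharp) never follows (62, 64) in the training data.
flagged = flag_notes([60, 62, 64, 66, 67], ngrams, contexts)
```

Here `flagged` contains only index 3; the note after it is skipped rather than flagged because its context was never observed in training.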
Category-level Identification of Non-registered Musical Instrument Sounds
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper describes a method that identifies sounds
of non-registered musical instruments (i.e., musical instruments
that are not contained in the training data) at a category
level. Although the problem of how to deal with
non-registered musical instruments is essential in musical
instrument identification, it has not been dealt with in previous
studies. Our method solves this problem by distinguishing
between registered and non-registered instruments
and identifying the category name of the non-registered instruments.
When a given sound is registered, its instrument
name, e.g. violin, is identified. Even if it is not registered, its
category name, e.g. strings, can be identified. The important
issue in achieving such identification is to adopt a musical
instrument hierarchy reflecting the acoustical similarity. We
present a method for acquiring such a hierarchy from a musical
instrument sound database. Experimental results show
that around 77% of non-registered instrument sounds, on
average, were correctly identified at the category level.
Proceedings of the 2004 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2004),
Vol.IV, pp.253--256, May 2004.
Comparing Features for Forming Music Streams
in Automatic Music Transcription
Yohei Sakuraba,
Tetsuro Kitahara,
and
Hiroshi G. Okuno
Abstract
In forming temporal sequences of notes played by the same instrument
(referred to as music streams), the timbre of musical instruments
may be a predominant feature. In polyphonic music, the
performance of timbre extraction based on power-related features
deteriorates, because such features are blurred when two or more
frequency components are superimposed in the same frequency.
To cope with this problem, we integrated timbre similarity and
direction proximity with success, but left using other features as
future work. In this paper, we investigate four features, timbre
similarity, direction proximity, pitch transition and pitch relation
consistency to clarify the precedence among them in music stream
formation. Experimental results with quartet music show that direction
proximity is the most dominant feature, and pitch transition
is the second most dominant. In addition, the performance of music stream
formation was improved from 63.3% by only timbre similarity to
84.9% by integrating four features.
Proceedings of the 2004 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2004),
Vol.IV, pp.273--376, May 2004.
FY2003
Musical Instrument Identification based on
F0-dependent Multivariate Normal Distribution
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
The pitch dependency of timbres has not been fully exploited in musical
instrument identification. In this paper, we present a method using an
F0-dependent multivariate normal distribution whose mean is represented by
a function of fundamental frequency (F0). This F0-dependent mean function
represents the pitch dependency of each feature, while the F0-normalized
covariance represents the non-pitch dependency. Musical instrument sounds
are first analyzed by the F0-dependent multivariate normal distribution,
and then identified by using the discriminant function based on the Bayes
decision rule. Experimental results of identifying 6,247 solo tones of 19
musical instruments by 10-fold cross validation showed that the proposed
method improved the recognition rate at individual-instrument level from
75.73% to 79.73%, and the recognition rate at category level from 88.20%
to 90.65%.
Proceedings of the 2003 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2003),
Vol.V, pp.421--424, April 2003.