Abstracts of Published Papers
FY2006
Instrogram: Probabilistic Representation of Instrument
Existence for Polyphonic Music
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
IPSJ Journal,
Vol.48, No.1, January 2007.
Instrument Identification in Polyphonic Music:
Feature Weighting to Minimize Influence of Sound Overlaps
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper provides a new solution to the problem of feature variations
caused by the overlapping of sounds in instrument identification in
polyphonic music. When multiple instruments play simultaneously,
partials (harmonic components) of their sounds overlap and interfere,
which makes the acoustic features different from those of monophonic sounds.
To cope with this, we weight features based on how much they are affected
by overlapping.
First, we quantitatively evaluate the influence of overlapping on each
feature as the ratio of the within-class variance to the between-class
variance in the distribution of training data obtained from polyphonic sounds.
Then, we generate feature axes using a weighted mixture that minimizes
the influence via linear discriminant analysis.
In addition, we improve instrument identification using
musical context.
Experimental results showed that the recognition rates using
both feature weighting and musical context were
84.1% for duo, 77.6% for trio, and 72.3% for quartet;
those without using either were 53.4%, 49.6%, and 46.5%, respectively.
EURASIP Journal on Applied Signal Processing,
Vol.2007, No.51979, pp.1--15, 2007.
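The feature-weighting step above reduces to a small computation. The sketch
below is a minimal illustration under assumed variable names, not the
authors' implementation: it computes the within-class to between-class
variance ratio of each feature on mixed-sound training data, and the use of
scikit-learn's LinearDiscriminantAnalysis for generating the weighted axes
is likewise an assumption.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    def variance_ratio(X, y):
        """Within-class / between-class variance ratio per feature.
        X: (n_samples, n_features) features extracted from polyphonic
        (mixed) training sounds; y: instrument labels. A large ratio
        means the feature is strongly disturbed by overlapping partials
        and should receive a small weight."""
        grand_mean = X.mean(axis=0)
        within = np.zeros(X.shape[1])
        between = np.zeros(X.shape[1])
        for c in np.unique(y):
            Xc = X[y == c]
            within += ((Xc - Xc.mean(axis=0)) ** 2).sum(axis=0)
            between += len(Xc) * (Xc.mean(axis=0) - grand_mean) ** 2
        return within / between

    # LDA generates new feature axes as weighted mixtures of the raw
    # features that minimize the within/between-class variance ratio;
    # X_mix and y_mix are hypothetical mixed-sound training arrays.
    # lda = LinearDiscriminantAnalysis().fit(X_mix, y_mix)
    # X_robust = lda.transform(X_mix)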
Instrument Identification in Polyphonic Music:
Feature Weighting Robust to Sound Overlaps Using Mixed-Sound Templates
and the Use of Musical Context
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper proposes a new solution to the problem of feature variations
caused by sound overlaps, which is unavoidable in instrument identification
for polyphonic music. In polyphonic music, multiple instruments sound
simultaneously, so their frequency components overlap and interfere with
each other, causing the acoustic features to vary.
In this study, we quantitatively evaluate how strongly each feature is
affected by overlapping frequency components as the ratio of the
within-class variance to the between-class variance in training data
extracted from mixed sounds.
Using linear discriminant analysis, we then generate new feature axes that
weight the features so as to minimize this influence,
which yields a feature space in which the effect of overlapping frequency
components is as small as possible.
Furthermore, we improve the accuracy of instrument identification by using
musical context.
Experiments on audio signals of duo, trio, and quartet pieces generated
from a database of real instrument sounds improved the recognition rates
from 50.9% to 84.1% for duo, from 46.1% to 77.6% for trio, and from 43.1%
to 72.3% for quartet, confirming the effectiveness of the proposed method.
IEICE Transactions on Information and Systems (Japanese Edition),
Vol.J89-D, No.12, December 2006.
A Method for Identifying Singer Names in Musical Pieces based on
Accompaniment Sound Reduction and Reliable Frame Selection
Hiromasa Fujihara,
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a method for identifying singer names in real-world
musical audio signals. The main problem in identifying singer names is the
influence of the mixed accompaniment sounds. To solve this problem, we
propose two techniques: accompaniment sound reduction and reliable frame
selection. The former reduces the influence of accompaniment sounds by
extracting and resynthesizing the harmonic structure of the predominant
melody. The latter uses two Gaussian mixture models representing vocal and
non-vocal frames to judge whether each frame is reliable as a singing voice.
Experimental results showed that the proposed method achieved a
classification accuracy of 95% for 40 songs by 10 singers, reducing the
error rate by about 89% compared with not using the method. A further
experiment on 256 songs by 20 singers achieved an accuracy of about 93%,
reducing the error rate by about 65%.
IPSJ Journal,
Vol.47, No.6, pp.1831--1843, July 2006.
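The accompaniment sound reduction step above lends itself to a compact
sketch. The code below is a rough illustration, not the published
implementation: it resynthesizes the predominant melody by summing
sinusoids at the harmonic frequencies of per-frame F0 estimates, and the
f0_track and amp_track inputs are assumed to come from a separate
predominant-melody estimation stage.

    import numpy as np

    def resynthesize_melody(f0_track, amp_track, sr=16000, hop=160, n_harm=20):
        """Resynthesize the melody as a sum of sinusoids.
        f0_track:  per-frame F0 estimates in Hz (0 for unvoiced frames)
        amp_track: (n_frames, n_harm) harmonic amplitudes sampled from
                   the spectrum around k * F0 for k = 1..n_harm
        Returns an audio signal that ideally contains only the singing
        voice, with accompaniment sounds reduced."""
        out = np.zeros(len(f0_track) * hop)
        phase = np.zeros(n_harm)
        t = np.arange(hop)
        for i, f0 in enumerate(f0_track):
            if f0 <= 0:
                continue  # unvoiced frame: emit silence
            frame = np.zeros(hop)
            for k in range(n_harm):
                w = 2 * np.pi * f0 * (k + 1) / sr  # harmonic k+1, rad/sample
                frame += amp_track[i, k] * np.sin(phase[k] + w * t)
                phase[k] = (phase[k] + w * hop) % (2 * np.pi)
            out[i * hop:(i + 1) * hop] = frame
        return out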
Instrogram: A New Musical Instrument Recognition Technique
without Using Onset Detection nor F0 Estimation
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a new technique for recognizing musical instruments
in polyphonic music. Because the conventional framework for musical
instrument recognition in polyphonic music had to estimate the onset time
and fundamental frequency (F0) of each note, instrument recognition
inevitably suffered from errors in onset detection and F0 estimation.
Unlike such a note-based processing framework, our technique calculates
the temporal trajectory of instrument existence probabilities for
every possible F0, and the results are visualized with a spectrogram-like
graphical representation called instrogram.
The instrument existence probability is defined as the product of
a nonspecific instrument existence probability
calculated using PreFEst and
a conditional instrument existence probability
calculated using a hidden Markov model.
Experimental results show that the obtained instrograms reflect the actual
instrumentations and facilitate instrument recognition.
Proceedings of the 2006 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2006),
Vol.V, pp.229--232, May 2006.
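The probability decomposition described above can be written in one line.
With notation chosen here for illustration (the paper's own symbols may
differ), the instrument existence probability for instrument \omega at
time t and fundamental frequency F factors as

    p(\omega; t, F) = p(\mathrm{X}; t, F) \, p(\omega \mid \mathrm{X}; t, F),

where p(X; t, F) is the nonspecific instrument existence probability that
some instrument is sounding at (t, F), estimated with PreFEst, and
p(\omega | X; t, F) is the conditional instrument existence probability
that the sounding instrument is \omega, estimated with the HMM. Plotting
p(\omega; t, F) over the time-frequency plane for each instrument yields
the instrogram.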
F0 Estimation Method for Singing Voice in Polyphonic Audio Signal
based on Statistical Vocal Model and Viterbi Search
Hiromasa Fujihara,
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Proceedings of the 2006 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2006),
Vol.V, pp.253--256, May 2006.
FY2005
Pitch-dependent Identification of Musical Instrument Sounds
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper describes a musical instrument identification method
that takes into consideration the pitch dependency of
timbres of musical instruments.
The difficulty in musical instrument identification resides in
the pitch dependency of musical instrument sounds,
that is, acoustic features of most musical instruments vary according to
the pitch (fundamental frequency, F0).
To cope with this difficulty,
we propose an F0-dependent multivariate normal distribution,
where each element of the mean vector is represented
by a function of F0.
Our method first extracts 129 features (e.g., the spectral centroid,
the gradient of the straight line approximating the power envelope) from
a musical instrument sound and then reduces the dimensionality
of the feature space to 18 dimensions.
In the 18-dimensional feature space, it calculates an F0-dependent
mean function and an F0-normalized covariance, and finally
applies the Bayes decision rule.
Experimental results of identifying 6,247 solo tones of 19 musical
instruments show that the proposed method improved the recognition rate
from 75.73% to 79.73%.
Applied Intelligence,
Vol.23, No.3, pp.267--275, December 2005.
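The F0-dependent model above admits a short sketch. The code below is a
simplified illustration, not the published implementation: it represents
each element of the mean vector as a low-order polynomial in log F0 (the
polynomial order and all names are assumptions), fits it by least squares,
and classifies with the Bayes decision rule under the F0-normalized
covariance.

    import numpy as np

    class F0DependentGaussian:
        """Per-class mean is a polynomial function of log F0; the
        covariance is estimated after removing that F0 dependency."""

        def fit(self, X, f0, degree=3):
            # Fit mean_j(f0) for every feature j by polynomial regression.
            self.coefs = [np.polyfit(np.log(f0), X[:, j], degree)
                          for j in range(X.shape[1])]
            residual = X - self.mean(f0)     # F0-normalized residuals
            self.cov = np.cov(residual.T)    # F0-normalized covariance
            return self

        def mean(self, f0):
            # Evaluate the F0-dependent mean function at the given F0s.
            return np.stack([np.polyval(c, np.log(f0)) for c in self.coefs],
                            axis=-1)

        def log_likelihood(self, x, f0):
            d = x - self.mean(np.atleast_1d(f0))[0]
            return (-0.5 * d @ np.linalg.inv(self.cov) @ d
                    - 0.5 * np.linalg.slogdet(self.cov)[1])

    # Bayes decision rule: given a tone's feature vector x and F0, pick
    # the instrument whose model maximizes log_likelihood(x, f0) plus
    # the log prior probability of that instrument.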
Instrument Identification in Polyphonic Music:
Feature Weighting with Mixed Sounds, Pitch-dependent Timbre Modeling,
and Use of Musical Context
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper addresses the problem of identifying musical
instruments in polyphonic music. Musical instrument
identification (MII) is an important task in music information
retrieval because MII results make it possible to automatically
retrieve certain types of music (e.g., piano
sonata, string quartet). Only a few studies, however, have
dealt with MII in polyphonic music. In MII in polyphonic
music, there are three issues: feature variations caused
by sound mixtures, the pitch dependency of timbres, and
the use of musical context. For the first issue, templates
of feature vectors representing timbres are extracted from
not only isolated sounds but also sound mixtures. Because
some features are not robust in the mixtures, features
are weighted according to their robustness by using
linear discriminant analysis. For the second issue, we use
an F0-dependent multivariate normal distribution, which
approximates the pitch dependency as a function of fundamental
frequency. For the third issue, when the instrument
of each note is identified, the a priori probability of the note
is calculated from the a posteriori probabilities of temporally
neighboring notes. Experimental results showed that
recognition rates were improved from 60.8% to 85.8% for
trio music and from 65.5% to 91.1% for duo music.
Proceedings of the 6th International Conference on
Music Information Retrieval
(ISMIR 2005),
pp.558--563, September 2005.
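Of the three issues above, the musical-context step is the most
self-contained and is sketched below. This is an illustrative reading
rather than the authors' exact formulation: the a priori probability of a
note's instrument is taken proportional to the averaged a posteriori
probabilities of its temporally neighboring notes.

    import numpy as np

    def contextual_priors(posteriors, k=3):
        """posteriors: (n_notes, n_instruments) a posteriori
        probabilities computed independently per note. Each note's
        a priori probability is estimated from the k preceding and k
        following notes, so an isolated 'flute' decision inside a run
        of 'violin' notes is pulled toward violin."""
        n = len(posteriors)
        priors = np.empty_like(posteriors)
        for i in range(n):
            lo, hi = max(0, i - k), min(n, i + k + 1)
            neighbors = np.r_[posteriors[lo:i], posteriors[i + 1:hi]]
            if len(neighbors) == 0:
                priors[i] = posteriors[i]
                continue
            p = neighbors.mean(axis=0)
            priors[i] = p / p.sum()
        return priors

    # Final decision for note i: argmax over instruments of
    # likelihood[i] * contextual_priors(posteriors)[i].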
Singer Identification based on Accompaniment Sound Reduction and
Reliable Frame Selection
Hiromasa Fujihara,
Tetsuro Kitahara,
Masataka Goto,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a method for automatic singer identification
from polyphonic musical audio signals including
sounds of various instruments. Because singing voices
play an important role in musical pieces with a vocal part,
the identification of singer names is useful for music information
retrieval systems. The main problem in automatically
identifying singers is the negative influences caused
by accompaniment sounds. To solve this problem, we
developed two methods, accompaniment sound reduction
and reliable frame selection. The former method makes it
possible to identify the singer of a singing voice after reducing
accompaniment sounds. It first extracts harmonic
components of the predominant melody from sound mixtures
and then resynthesizes the melody by using a sinusoidal
model driven by those components. The latter
method then judges whether each frame of the obtained
melody is reliable (i.e., little influenced by accompaniment
sound) or not by using two Gaussian mixture models for
vocal and non-vocal frames. This enables singer identification
using only the reliable vocal portions of musical pieces.
Experimental results with forty popular-music songs by
ten singers showed that our method was able to reduce
the influences of accompaniment sounds and achieved an
accuracy of 95%, while the accuracy for a conventional
method was 53%.
Proceedings of the 6th International Conference on
Music Information Retrieval
(ISMIR 2005),
pp.329--336, September 2005.
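The reliable frame selection stage can be sketched briefly. The following
is a minimal illustration under assumed names; the use of scikit-learn's
GaussianMixture is also an assumption (the paper predates that library):

    from sklearn.mixture import GaussianMixture

    def make_frame_selector(vocal_train, nonvocal_train, n_components=16):
        """Train two GMMs on spectral features of vocal and non-vocal
        frames (hypothetical training arrays) and return a function
        that keeps only the frames reliable as singing voice."""
        gmm_v = GaussianMixture(n_components).fit(vocal_train)
        gmm_n = GaussianMixture(n_components).fit(nonvocal_train)

        def select(features, margin=0.0):
            # A frame is reliable when its vocal log-likelihood beats
            # the non-vocal one by at least `margin`; singer
            # identification then uses these frames alone.
            score = (gmm_v.score_samples(features)
                     - gmm_n.score_samples(features))
            return features[score > margin]

        return select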
ism: Improvisation Supporting Systems with Melody Correction
and Key Vibration
Tetsuro Kitahara,
Katsuhisa Ishida,
and
Masayuki Takeda
Abstract
This paper describes improvisation support for musicians
who do not have sufficient improvisational playing experience. The goal
of our study is to enable such players to learn the skills necessary for improvisation
and to enjoy it. In achieving this goal, we have two objectives:
enhancing their skill for instantaneous melody creation and supporting
their practice for acquiring this skill. For the first objective, we developed
a system that automatically corrects musically inappropriate notes
in the melodies of users' improvisations. For the second objective, we
developed a system that points out musically inappropriate notes by vibrating
corresponding keys. The main issue in developing these systems
is how to detect musically inappropriate notes. We propose a method for
detecting them based on the N-gram model. Experimental results show
that this N-gram-based method improves the accuracy of detecting musically
inappropriate notes and that our systems are effective in supporting
unskilled musicians' improvisation.
Entertainment Computing:
Proceedings of the 4th International Conference on Entertainment
Computing (ICEC 2005),
Lecture Notes in Computer Science 3711, F. Kishino, Y. Kitamura, H. Kato and N. Nagata (Eds.), pp.315--327, September 2005.
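The N-gram detection step shared by the two systems above can be
illustrated compactly. The sketch below is a simplified trigram version
under assumed names: melodies are sequences of note numbers, and a played
note is flagged as musically inappropriate when its smoothed trigram
probability falls below a threshold.

    from collections import Counter

    class TrigramMelodyModel:
        def __init__(self, melodies, alpha=0.01, vocab_size=128):
            # Count trigrams and their bigram contexts over a corpus
            # of melodies (each a list of MIDI note numbers).
            self.tri, self.bi = Counter(), Counter()
            self.alpha, self.V = alpha, vocab_size
            for m in melodies:
                for a, b, c in zip(m, m[1:], m[2:]):
                    self.tri[(a, b, c)] += 1
                    self.bi[(a, b)] += 1

        def prob(self, a, b, c):
            # Additive smoothing gives unseen trigrams nonzero mass.
            return ((self.tri[(a, b, c)] + self.alpha)
                    / (self.bi[(a, b)] + self.alpha * self.V))

        def inappropriate_notes(self, melody, threshold=1e-3):
            # Indices of notes with suspiciously low trigram
            # probability; ism corrects them, ismv vibrates the keys.
            return [i + 2
                    for i, (a, b, c) in enumerate(
                        zip(melody, melody[1:], melody[2:]))
                    if self.prob(a, b, c) < threshold]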
An Improvisation Supporting System based on N-gram Judgment
of the Musical Appropriateness of Melodies
Katsuhisa Ishida,
Tetsuro Kitahara,
and
Masayuki Takeda
Abstract
This paper describes performance support for players who have not yet
mastered improvisation. Our ultimate goal is to enable such players to
improvise on ordinary musical instruments. To achieve this goal, we support
them with two approaches: assisting the ability to create melodies
instantaneously and providing an environment for practicing improvisation.
For the former, we developed ism, a performance support system that
automatically corrects inappropriate notes in a melody; it detects
musically unnatural parts of a played melody in real time and converts
them into appropriate notes, thereby making instantaneous melody creation
easier. For the latter, we built ismv, a learning support system that
points out inappropriate notes through key vibration. The central issue in
realizing such support systems is how to detect inappropriate notes. We
propose a method that models melodies with an N-gram and judges only the
notes with low probability values to be inappropriate. Experimental
results showed that the proposed method improves the accuracy of detecting
inappropriate parts of melodies and that ism and ismv are effective in
supporting the improvisation of players who have not mastered it.
IPSJ Journal,
Vol.46, No.7, pp.1549--1559, July 2005.
FY2004
Automatic Chord Transcription with Concurrent Recognition of
Chord Symbols and Boundaries
Takuya Yoshioka,
Tetsuro Kitahara,
Kazunori Komatani,
Tetsuya Ogata,
and
Hiroshi G. Okuno
Abstract
This paper describes a method that recognizes musical
chords from real-world audio signals in compact-disc
recordings. The automatic recognition of musical chords
is necessary for music information retrieval (MIR) systems,
since the chord sequences of musical pieces capture
the characteristics of their accompaniments. None
of the previous methods can accurately recognize musical
chords from complex audio signals that contain vocal
and drum sounds. The main problem is that the chord-boundary-detection
and chord-symbol-identification processes
are inseparable because of their mutual dependency.
In order to solve this mutual dependency problem,
our method generates hypotheses about tuples of chord
symbols and chord boundaries, and outputs the most plausible
one as the recognition result. The certainty of a hypothesis
is evaluated based on three cues: acoustic features,
chord progression patterns, and bass sounds. Experimental
results show that our method successfully recognized
chords in seven popular music songs; the average
accuracy of the results was around 77%.
Proceedings of the 5th International Conference on
Music Information Retrieval
(ISMIR 2004),
pp.100--105, October 2004.
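The hypothesis search described above can be outlined as follows. This is
an illustrative skeleton, not the authors' implementation: each hypothesis
pairs a chord-boundary segmentation with a chord-symbol sequence, and its
certainty combines the three cues named in the abstract (the cue functions
and weights are placeholders).

    from dataclasses import dataclass

    @dataclass
    class Hypothesis:
        boundaries: list  # segment boundaries as frame indices, start/end included
        symbols: list     # one chord symbol per segment, e.g. "C", "Am7"

    def certainty(h, acoustic, progression, bass, w=(1.0, 1.0, 1.0)):
        """Evaluate a hypothesis from three cues:
        acoustic(seg, sym):  fit between segment features and symbol
        progression(syms):   plausibility of the chord sequence
        bass(seg, sym):      agreement of bass notes with the chord"""
        score = w[1] * progression(h.symbols)
        for seg, sym in zip(zip(h.boundaries, h.boundaries[1:]), h.symbols):
            score += w[0] * acoustic(seg, sym) + w[2] * bass(seg, sym)
        return score

    # The recognizer generates many (boundaries, symbols) hypotheses
    # and outputs the one with the highest certainty.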
ism: Improvisation Supporting System based on Melody Correction
Katsuhisa Ishida,
Tetsuro Kitahara,
and
Masayuki Takeda
Abstract
In this paper, we describe a novel improvisation supporting
system based on correcting musically unnatural melodies.
Since improvisation is a musical performance style that
involves creating melodies while playing, it is not easy even
for people who can play musical instruments. However,
previous studies have not dealt with improvisation support
for people who can play musical instruments but cannot
improvise. In this study, to support such players' improvisation,
we propose a novel improvisation supporting system
called ism, which corrects musically unnatural melodies automatically.
The main issue in realizing this system is how
to detect notes to be corrected (i.e., musically unnatural or
inappropriate). We propose a method for detecting notes to
be corrected based on the N-gram model. This method first
calculates N-gram probabilities of played notes, and then
judges the notes with low N-gram probabilities as those to be corrected.
Experimental results show that the N-gram-based melody
correction and the proposed system are useful for supporting
improvisation.
Proceedings of the International Conference on New Interfaces
for Musical Expression (NIME 04),
pp.177--180, June 2004.
Category-level Identification of Non-registered Musical Instrument Sounds
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper describes a method that identifies sounds
of non-registered musical instruments (i.e., musical instruments
that are not contained in the training data) at a category
level. Although the problem of how to deal with
non-registered musical instruments is essential in musical
instrument identification, it has not been addressed in previous
studies. Our method solves this problem by distinguishing
between registered and non-registered instruments
and identifying the category name of the non-registered instruments.
When a given sound is registered, its instrument
name (e.g., violin) is identified. Even if it is not registered, its
category name (e.g., strings) can be identified. The important
issue in achieving such identification is to adopt a musical
instrument hierarchy reflecting the acoustical similarity. We
present a method for acquiring such a hierarchy from a musical
instrument sound database. Experimental results show
that around 77% of non-registered instrument sounds, on
average, were correctly identified at the category level.
Proceedings of the 2004 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2004),
Vol.IV, pp.253--256, May 2004.
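The decision flow above can be summarized in a short sketch. The hierarchy
below and the confidence test are illustrative assumptions, not the
paper's exact procedure: a sound is scored against every registered
instrument model, and when no model is confident enough, the method backs
off to the category of the best-matching instrument in the hierarchy
acquired from the sound database.

    # Category hierarchy acquired from acoustic similarity
    # (an illustrative subset).
    HIERARCHY = {"violin": "strings", "cello": "strings",
                 "flute": "woodwinds", "clarinet": "woodwinds"}

    def identify(x, models, threshold=-50.0):
        """models: {instrument_name: model exposing log_likelihood(x)}.
        Returns an instrument name for registered sounds and a
        category name for sounds judged non-registered."""
        scores = {name: m.log_likelihood(x) for name, m in models.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            return best            # registered: instrument level
        return HIERARCHY[best]     # non-registered: category level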
Comparing Features for Forming Music Streams
in Automatic Music Transcription
Yohei Sakuraba,
Tetsuro Kitahara,
and
Hiroshi G. Okuno
Abstract
In forming temporal sequences of notes played by the same instrument
(referred to as music streams), the timbre of musical instruments
may be a predominant feature. In polyphonic music, the
performance of timber extraction based on power-related features
deteriorates, because such features are blurred when two or more
frequency components are superimposed in the same frequency.
To cope with this problem, we integrated timbre similarity and
direction proximity with success, but left the use of other features as
future work. In this paper, we investigate four features (timbre
similarity, direction proximity, pitch transition, and pitch relation
consistency) to clarify the precedence among them in music stream
formation. Experimental results with quartet music show that direction
proximity is the most dominant feature, and pitch transition
is the second. In addition, the performance of music stream
formation was improved from 63.3% by only timbre similarity to
84.9% by integrating four features.
Proceedings of the 2004 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2004),
Vol.IV, pp.273--276, May 2004.
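The feature integration reported above can be illustrated with a small
sketch. Everything here is a placeholder rather than the paper's
formulation: a note is appended to the music stream that maximizes a
weighted sum of the four cues, and the cue functions themselves are
assumed to be supplied by the caller.

    def stream_affinity(note, stream, cues, w=(0.2, 0.4, 0.3, 0.1)):
        """cues: four functions (timbre_similarity, direction_proximity,
        pitch_transition, pitch_relation_consistency), each returning a
        similarity in [0, 1]. The weights are illustrative; the paper
        found direction proximity most dominant, pitch transition second."""
        timbre, direction, transition, relation = cues
        prev = stream[-1]
        return (w[0] * timbre(note, prev) + w[1] * direction(note, prev)
                + w[2] * transition(note, prev) + w[3] * relation(note, stream))

    def assign(note, streams, cues):
        # Append the note to the music stream with the highest affinity.
        best = max(streams, key=lambda s: stream_affinity(note, s, cues))
        best.append(note)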
FY2003
Acquisition of a Hierarchical Representation of Musical Instruments
Reflecting Acoustic Similarity, and Category-level Identification of
Unknown Instruments Based on It
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper describes category-level identification of unknown musical
instruments (instruments not contained in the training data) based on a
hierarchical representation of instruments obtained from acoustic features.
Although the problem of how to handle unknown instruments is unavoidable in
musical instrument identification, it has not been addressed in previous
studies. In this study, we propose recognizing unknown instruments at the
category level. We first describe a method for automatically acquiring a
hierarchical representation of instruments suited to category-level
recognition of unknown instruments, and then recognize unknown instruments
at the category level using the hierarchical representation obtained with
this method. Furthermore, by introducing a process that judges whether an
instrument sound is known or unknown (i.e., whether or not it is contained
in the training data), we recognize known instruments at the
instrument-name level and unknown instruments at the category level.
Experimental results showed that about 77% of unknown instrument sounds,
on average, were recognized correctly at the category level.
IPSJ Journal,
Vol.45, No.3, pp.680--689, March 2004.
Musical Instrument Identification Focusing on Pitch-dependent Timbre Variations:
A Classification Method based on an F0-dependent Multivariate Normal Distribution
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
This paper proposes a musical instrument identification method that takes
into account timbre variations caused by pitch. Although it has long been
widely known that the timbre of a musical instrument varies with pitch, no
identification method that properly handles this variation had been
studied. To handle pitch-dependent timbre variations properly, we propose
a multivariate normal distribution whose mean varies as a function of the
fundamental frequency. Assuming that instrument sound data follow this
distribution in the timbre space (the feature space of instrument sounds),
we formulate a discriminant function for this distribution from the Bayes
decision rule. Implementation and experiments showed that, compared with a
multivariate normal distribution that does not consider pitch-dependent
timbre variations, the proposed method eliminated 16.48% of the
misclassifications at the individual-instrument level and 20.67% at the
category level.
IPSJ Journal,
Vol.44, No.10, pp.2448--2458, October 2003.
Musical Instrument Identification based on
F0-dependent Multivariate Normal Distribution
Tetsuro Kitahara,
Masataka Goto,
and
Hiroshi G. Okuno
Abstract
The pitch dependency of timbres has not been fully exploited in musical instrument identification. In this paper, we present a method
using an F0-dependent multivariate normal distribution whose mean is represented by a function of fundamental frequency (F0). This F0-dependent mean function represents the pitch dependency of each feature, while the F0-normalized covariance represents the non-pitch dependency. Musical instrument sounds are first analyzed by the F0-dependent multivariate normal distribution, and then identified by using the discriminant function based on the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments by 10-fold cross validation showed that the proposed method improved the recognition rate at the individual-instrument level from 75.73% to 79.73%, and the recognition rate at the category level from 88.20% to 90.65%.
Proceedings of the 2003 IEEE International Conference
on Acoustics, Speech, and Signal Processing
(ICASSP 2003),
Vol.V, pp.421--424, April 2003.
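The model above can be stated in one formula. With notation chosen here
for illustration (the paper's own symbols may differ), the F0-dependent
multivariate normal density of a d-dimensional feature vector x for
instrument \omega at fundamental frequency f is

    p(x \mid \omega, f) = (2\pi)^{-d/2} |\Sigma_\omega|^{-1/2}
        \exp\!\Big( -\tfrac{1}{2} (x - \mu_\omega(f))^\top
              \Sigma_\omega^{-1} (x - \mu_\omega(f)) \Big),

where \mu_\omega(f) is the F0-dependent mean function and \Sigma_\omega
the F0-normalized covariance. The Bayes decision rule then selects

    \hat{\omega} = \arg\max_\omega \big[ \log p(x \mid \omega, f) + \log p(\omega) \big].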