The same content is written in both English and Japanese below.
An instrogram is a spectrogram-like graphical representation of instrument existence probabilities, which are obtained with our musical instrument recognition technique. Conventional frameworks for musical instrument recognition in polyphonic music had to estimate the onset time and fundamental frequency (F0) of each note, so recognition accuracy suffered greatly from errors in onset detection and F0 estimation. Our technique avoids these estimations: for each target instrument, it calculates the temporal trajectory of the instrument existence probability at every possible F0. The instrument existence probability is calculated by multiplying a nonspecific instrument existence probability, calculated using PreFEst, by a conditional instrument existence probability, calculated using hidden Markov models.
Instrogramは,我々の新たな楽器音認識技術によって得られる,楽器存在確率のスペクトログラムライクな視覚表現です.従来の多重奏に対する楽器音認識の枠組みでは,各単音(一音符に相当する一単位の音)の発音時刻や基本周波数(F0)を推定しなければならないため,これらの推定エラーからの悪影響が大きいという欠点がありました.それに対し,我々の技術では,これらの推定をせずに,あらゆるF0に対して楽器存在確率の時系列を求めます.楽器存在確率は,不特定楽器存在確率と条件付き楽器存在確率の積で表され,前者はPreFEstで,後者は隠れマルコフモデルで求めることができます.
Our paper on the instrogram technique, submitted to IPSJ Journal, has been accepted. The paper will also be published in IPSJ Digital Courier, an online journal whose papers are freely available. Once it is published, I will add a link to the paper on this page.
IPSJ Journal(情報処理学会論文誌)に投稿した,我々のInstrogramに関する論文の採録が決定しました.この論文はオンラインジャーナルIPSJ Digital Courierにも掲載される予定です.掲載され次第,このページに論文へのリンクを掲載する予定です.
Our paper on the instrogram technique submitted to the 8th IEEE International Symposium on Multimedia (ISM 2006) has been accepted.
The 8th IEEE International Symposium on Multimedia (ISM 2006)に投稿した,我々のInstrogramに関する論文の採録が決定しました.
The instrogram is a spectrogram-like graphical representation of a musical audio signal that makes it easy to find which instruments are used in the signal. One image exists for each target instrument. In each image, the horizontal axis represents time and the vertical axis represents F0, and the color intensity of each point (t, f) shows the probability that the target instrument is sounding at time t with an F0 of f. Please see the figure entitled "Auld Lang Syne (FL-VN-PF)." It shows example instrograms obtained by analyzing an audio signal of "Auld Lang Syne" played on the piano, violin, and flute. The target instruments of the analysis were piano, violin, clarinet, and flute. From the instrograms, we can see that the piece is played on flute, violin, and piano, and that no clarinet is played.
Instrogramは,どのような楽器による演奏かを容易に分かるようにした,スペクトログラムに似た音楽の視覚表現です.対象楽器の各々に1枚ずつ画像が存在し,各画像がその楽器が存在する確率を表します.各画像は,横軸が時刻,縦軸がF0を表し,各点(t, f)の色の強さによって,時刻tにおいて周波数fをF0とする当該楽器音が存在する確率を表します.下の"Auld Lang Syne (FL-VN-PF)"というタイトルの図を見てください.これは,ピアノ,バイオリン,フルートによる「蛍の光」の音響信号に対してInstrogramを求めたものです.分析の対象楽器は,ピアノ,バイオリン,クラリネット,フルートとしました.この図から,この楽曲がフルート,バイオリン,ピアノによる演奏であることを知ることができます.
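The layout described above (one time-by-F0 image per target instrument) can be sketched as a three-dimensional array. The following is a minimal illustration with toy values, not output of the actual analysis. The array sizes, the helper `active_instruments`, and the threshold of 0.5 are all assumptions made for this sketch; on this page the instrograms are read visually rather than thresholded.

```python
import numpy as np

# Target instruments of the analysis, one image per instrument.
instruments = ["piano", "violin", "clarinet", "flute"]
n_frames, n_f0 = 100, 48  # illustrative sizes only (assumption)

# instrogram[i, t, f]: probability that instrument i sounds at frame t
# with the F0 that corresponds to bin f. Values here are toy data.
instrogram = np.zeros((len(instruments), n_frames, n_f0))
instrogram[0, :, 10] = 0.8       # a sustained piano note
instrogram[1, 20:80, 25] = 0.7   # a violin note
instrogram[3, 40:, 40] = 0.9     # a flute note; clarinet stays at zero

def active_instruments(instrogram, names, threshold=0.5):
    """Return the names whose image has any point above the threshold.

    Hypothetical helper: the threshold is an assumption for illustration.
    """
    peak = instrogram.max(axis=(1, 2))  # brightest point of each image
    return [name for name, p in zip(names, peak) if p > threshold]

print(active_instruments(instrogram, instruments))
# ['piano', 'violin', 'flute']
```

With this layout, `instrogram[i]` is exactly the image that would be drawn for instrument `i`.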
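How to calculate an instrogram
Let {w1, ..., wm} be the set of target instruments.
For every target instrument wi, we have to calculate the probability p(wi; t, f) that a sound of the instrument wi with an F0 of f exists at time t. We call this the instrument existence probability (IEP). Under some assumptions (details are omitted here), the IEP can be calculated as the product of two probabilities: the nonspecific instrument existence probability (NIEP), which is the probability that a sound of some target instrument with an F0 of f exists at time t, and the conditional instrument existence probability (CIEP), which is the probability that such a sound, if it exists, is played on wi.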
対象楽器を {w1, ..., wm} とすると, ここで解くべき問題は,各wiに対するp(wi; t, f)の計算です.これは楽器存在確率と呼ばれ,時刻tにおいて,fをF0とする楽器wiの音が存在する確率を表します. いくつかの仮定(詳細は省略)をおくことで,楽器存在確率は,次のように2つの確率の積として計算することができます.
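Written as a formula, the product could take the following form. The symbol $X$, standing for "some target instrument, whichever it is", is notation assumed here for illustration; the text on this page only names the two factors.

```latex
p(w_i; t, f)
  = \underbrace{p(X; t, f)}_{\text{nonspecific (PreFEst)}}
    \; \underbrace{p(w_i \mid X; t, f)}_{\text{conditional (HMMs)}},
\qquad X = w_1 \cup \dots \cup w_m
```

The first factor does not depend on the instrument, so it is computed once per point (t, f) and shared by all m images.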
The following figure gives an overview of the algorithm for calculating an instrogram. Given an audio signal, the spectrogram is calculated first. The current implementation uses the short-time Fourier transform (STFT) with an 8192-point Hamming window and a 10 ms shift (441 points at a 44.1 kHz sampling rate). Next, the nonspecific and conditional instrument existence probabilities (NIEPs and CIEPs) are calculated. The NIEPs are calculated by applying PreFEst to the power spectrum of each frame (timewise processing). PreFEst models the spectrum of a signal containing multiple sounds as a weighted mixture of harmonic-structure tone models at each frame. The CIEPs, on the other hand, are calculated by analyzing the temporal trajectory of the harmonic structure for every possible F0 (pitchwise processing). Each trajectory is analyzed with a framework similar to speech recognition, based on left-to-right hidden Markov models (HMMs). Finally, the NIEPs and CIEPs are multiplied to obtain the IEPs.
楽器存在確率の計算アルゴリズムの概要を下図に示します.まず,入力音響信号に対してスペクトログラムを作成します.現在の実装では,シフト幅10ms,窓幅8192点,ハミング窓の短時間フーリエ変換を用います.その後,不特定楽器存在確率と条件付き楽器存在確率を計算します.不特定楽器存在確率は,フレーム毎のパワースペクトルにPreFEstを適用して計算します.一方,条件付き楽器存在確率は,あらゆる周波数に対して,その周波数をF0とする調波構造の時系列を抽出し,これを隠れマルコフモデル(HMM)でモデル化して計算します.その後,不特定楽器存在確率と条件付き楽器存在確率の積をとります.
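The first and last steps of this pipeline are simple enough to sketch with NumPy. The code below is a minimal illustration under stated assumptions, not the actual implementation: the function names `power_spectrogram` and `combine` are hypothetical, the framing details (no centering, zero-padding of short inputs) are choices made for this sketch, and the PreFEst and HMM stages are not implemented. `combine` simply expects their outputs as arrays indexed by frame and F0 bin.

```python
import numpy as np

SAMPLE_RATE = 44100   # Hz, as stated above
WINDOW_SIZE = 8192    # Hamming window length in samples
HOP_SIZE = 441        # 10 ms shift at 44.1 kHz

def power_spectrogram(signal, window_size=WINDOW_SIZE, hop_size=HOP_SIZE):
    """Return power spectra with shape (n_frames, window_size // 2 + 1)."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < window_size:
        # Zero-pad short inputs so that at least one full frame exists.
        signal = np.pad(signal, (0, window_size - len(signal)))
    n_frames = 1 + (len(signal) - window_size) // hop_size
    window = np.hamming(window_size)
    frames = np.stack([
        signal[k * hop_size : k * hop_size + window_size] * window
        for k in range(n_frames)
    ])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def combine(niep, ciep):
    """Multiply NIEP[t, f] into CIEP[i, t, f] to obtain IEP[i, t, f]."""
    return niep[np.newaxis, :, :] * ciep

# Usage: one second of a 440 Hz sine wave.
t = np.arange(SAMPLE_RATE) / SAMPLE_RATE
spec = power_spectrogram(np.sin(2 * np.pi * 440.0 * t))
print(spec.shape)  # (82, 4097)
```

Broadcasting the NIEP array over the instrument axis reflects the fact that the nonspecific probability is shared by all target instruments, while the CIEP differs per instrument.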
(Figure: overview of the algorithm for calculating an instrogram)
For detailed information, please see the PDF files of our papers and posters, which are linked at the bottom of this page.
詳細については,我々の論文やポスターのPDFファイル(リンクがこのページの一番下にあります)をご覧ください.
The following figure is an example of a full-size (non-simplified) instrogram, the same one that is shown in the paper. If the figure in the paper is too small or unclear, please refer to this one. Click the figure to view it at its original size.
(Figure: Auld Lang Syne (FL-VN-PF))
The following figures show the instrograms for 10 recordings of real performances taken from the RWC Music Database (Classical and Jazz). You can also listen to the excerpt used in the experiment.