Abstracts of Our Papers

FY2006

Instrogram: Probabilistic Representation of Instrument Existence for Polyphonic Music

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

IPSJ Journal, Vol.48, No.1, January 2007.

Instrument Identification in Polyphonic Music: Feature Weighting to Minimize Influence of Sound Overlaps

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Abstract This paper provides a new solution to the problem of feature variations caused by the overlapping of sounds in instrument identification in polyphonic music. When multiple instruments simultaneously play, partials (harmonic components) of their sounds overlap and interfere, which makes the acoustic features different from those of monophonic sounds. To cope with this, we weight features based on how much they are affected by overlapping. First, we quantitatively evaluate the influence of overlapping on each feature as the ratio of the within-class variance to the between-class variance in the distribution of training data obtained from polyphonic sounds. Then, we generate feature axes using a weighted mixture that minimizes the influence via linear discriminant analysis. In addition, we improve instrument identification using musical context. Experimental results showed that the recognition rates using both feature weighting and musical context were 84.1% for duo, 77.6% for trio, and 72.3% for quartet; those without using either were 53.4, 49.6, and 46.5%, respectively.

EURASIP Journal on Applied Signal Processing, Vol.2007, No.51979, pp.1--15, 2007.
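
As a rough illustration of the feature-evaluation step in the abstract above, the following sketch computes the ratio of within-class variance to between-class variance for a single feature. It is not the authors' implementation: the function name `variance_ratio`, the toy values, and the instrument labels are assumptions made for this example, and the subsequent linear discriminant analysis step is not shown.

```python
from statistics import fmean


def variance_ratio(samples, labels):
    """Within-class / between-class variance for one feature (hypothetical helper)."""
    overall_mean = fmean(samples)
    # Group the feature values by instrument label.
    by_class = {}
    for value, label in zip(samples, labels):
        by_class.setdefault(label, []).append(value)
    n = len(samples)
    # Scatter of each value around the mean of its own class.
    within = sum(
        (value - fmean(values)) ** 2
        for values in by_class.values()
        for value in values
    ) / n
    # Scatter of the class means around the overall mean.
    between = sum(
        len(values) * (fmean(values) - overall_mean) ** 2
        for values in by_class.values()
    ) / n
    # A zero between-class variance means the feature cannot separate classes.
    return float("inf") if between == 0 else within / between


labels = ["piano", "piano", "violin", "violin"]
robust_feature = [1.0, 1.2, 5.0, 5.2]   # classes stay well separated
fragile_feature = [1.0, 5.0, 1.2, 5.2]  # classes are smeared together
ratio_robust = variance_ratio(robust_feature, labels)
ratio_fragile = variance_ratio(fragile_feature, labels)
```

A small ratio indicates a feature whose class separation survives, while a large ratio indicates a feature that is heavily disturbed; in the paper this evaluation is made on training data taken from polyphonic sounds.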

Instrogram: A New Musical Instrument Recognition Technique without Using Onset Detection nor F0 Estimation

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Abstract This paper describes a new technique for recognizing musical instruments in polyphonic music. Because the conventional framework for musical instrument recognition in polyphonic music had to estimate the onset time and fundamental frequency (F0) of each note, instrument recognition suffered severely from errors in onset detection and F0 estimation. Unlike such a note-based processing framework, our technique calculates the temporal trajectory of instrument existence probabilities for every possible F0, and the results are visualized with a spectrogram-like graphical representation called an instrogram. The instrument existence probability is defined as the product of a nonspecific instrument existence probability calculated using PreFEst and a conditional instrument existence probability calculated using the hidden Markov model. Experimental results show that the obtained instrograms reflect the actual instrumentations and facilitate instrument recognition.

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol.V, pp.229--232, May 2006.
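
The product that defines the instrument existence probability in the abstract above can be sketched as follows. This is only a minimal illustration: the two input grids are made-up numbers standing in for the outputs of PreFEst and of the hidden Markov models, and the function name `instrument_existence` and the nested-list layout (time frames by F0 bins) are assumptions for this example.

```python
def instrument_existence(nonspecific, conditional):
    """Multiply the two probabilities for every time frame and F0 bin.

    nonspecific[t][f]       : probability that some instrument sounds at (t, f)
    conditional[name][t][f] : probability that the instrument is `name`,
                              given that a sound exists at (t, f)
    Returns a dict mapping each instrument name to its own [t][f] grid.
    """
    return {
        name: [
            [p_exist * p_name for p_exist, p_name in zip(row_exist, row_name)]
            for row_exist, row_name in zip(nonspecific, grid)
        ]
        for name, grid in conditional.items()
    }


# Toy input: one time frame, two F0 bins (values are illustrative only).
nonspecific = [[0.8, 0.1]]
conditional = {
    "piano": [[0.75, 0.5]],
    "violin": [[0.25, 0.5]],
}
instrogram = instrument_existence(nonspecific, conditional)
```

Each grid in `instrogram` could then be drawn like a spectrogram, one image per instrument.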

F0 Estimation Method for Singing Voice in Polyphonic Audio Signal based on Statistical Vocal Model and Viterbi Search

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Proceedings of the 2006 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2006), Vol.V, pp.253--256, May 2006.

FY2005

Pitch-dependent Identification of Musical Instrument Sounds

Tetsuro Kitahara, Masataka Goto, and Hiroshi G. Okuno

Abstract This paper describes a musical instrument identification method that takes into consideration the pitch dependency of timbres of musical instruments. The difficulty in musical instrument identification resides in the pitch dependency of musical instrument sounds, that is, acoustic features of most musical instruments vary according to the pitch (fundamental frequency, F0). To cope with this difficulty, we propose an F0-dependent multivariate normal distribution, where each element of the mean vector is represented by a function of F0. Our method first extracts 129 features (e.g., the spectral centroid, the gradient of the straight line approximating the power envelope) from a musical instrument sound and then reduces the dimensionality of the feature space to 18 dimensions. In the 18-dimensional feature space, it calculates an F0-dependent mean function and an F0-normalized covariance, and finally applies the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments show that the proposed method improved the recognition rate from 75.73% to 79.73%.

Applied Intelligence, Vol.23, No.3, pp.267--275, December 2005.
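
The idea of an F0-dependent mean function, as described in the abstract above, can be illustrated with a deliberately reduced sketch. It uses a single feature instead of an 18-dimensional vector, a straight line fitted by least squares as the mean function, and equal prior probabilities for all instruments; all three are assumptions for this example, as are the function names and the toy data. The form of the mean function used in the paper is not stated in the abstract.

```python
import math


def fit_f0_dependent_gaussian(f0s, values):
    """Fit mean(f0) = intercept + slope * f0 and the residual variance."""
    n = len(f0s)
    mean_f0 = sum(f0s) / n
    mean_value = sum(values) / n
    slope = sum(
        (f - mean_f0) * (v - mean_value) for f, v in zip(f0s, values)
    ) / sum((f - mean_f0) ** 2 for f in f0s)
    intercept = mean_value - slope * mean_f0
    # Variance after removing the pitch-dependent mean ("F0-normalized").
    variance = sum(
        (v - (intercept + slope * f)) ** 2 for f, v in zip(f0s, values)
    ) / n
    return intercept, slope, variance


def log_likelihood(model, f0, value):
    """Log density of `value` under the Gaussian whose mean depends on f0."""
    intercept, slope, variance = model
    mean = intercept + slope * f0
    return -0.5 * (math.log(2 * math.pi * variance) + (value - mean) ** 2 / variance)


def classify(models, f0, value):
    """Bayes decision rule with equal priors: pick the most likely instrument."""
    return max(models, key=lambda name: log_likelihood(models[name], f0, value))


# Toy training data: one feature value per tone, at four F0s in Hz.
f0s = [200.0, 400.0, 600.0, 800.0]
models = {
    "flute": fit_f0_dependent_gaussian(f0s, [1.0, 2.1, 2.9, 4.0]),
    "cello": fit_f0_dependent_gaussian(f0s, [3.0, 2.6, 2.1, 1.5]),
}
low_note = classify(models, 400.0, 2.5)
high_note = classify(models, 600.0, 2.5)
```

With this toy data the same feature value (2.5) is assigned to different instruments at 400 Hz and at 600 Hz, which is the kind of pitch dependency a single pitch-independent mean could not capture.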

Instrument Identification in Polyphonic Music: Feature Weighting with Mixed Sounds, Pitch-dependent Timbre Modeling, and Use of Musical Context

Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Abstract This paper addresses the problem of identifying musical instruments in polyphonic music. Musical instrument identification (MII) is an important task in music information retrieval because MII results make it possible to automatically retrieve certain types of music (e.g., piano sonata, string quartet). Only a few studies, however, have dealt with MII in polyphonic music. In MII in polyphonic music, there are three issues: feature variations caused by sound mixtures, the pitch dependency of timbres, and the use of musical context. For the first issue, templates of feature vectors representing timbres are extracted from not only isolated sounds but also sound mixtures. Because some features are not robust in the mixtures, features are weighted according to their robustness by using linear discriminant analysis. For the second issue, we use an F0-dependent multivariate normal distribution, which approximates the pitch dependency as a function of fundamental frequency. For the third issue, when the instrument of each note is identified, the a priori probability of the note is calculated from the a posteriori probabilities of temporally neighboring notes. Experimental results showed that recognition rates were improved from 60.8% to 85.8% for trio music and from 65.5% to 91.1% for duo music.

Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), pp.558--563, September 2005.

Singer Identification based on Accompaniment Sound Reduction and Reliable Frame Selection

Hiromasa Fujihara, Tetsuro Kitahara, Masataka Goto, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Abstract This paper describes a method for automatic singer identification from polyphonic musical audio signals including sounds of various instruments. Because singing voices play an important role in musical pieces with a vocal part, the identification of singer names is useful for music information retrieval systems. The main problem in automatically identifying singers is the negative influence of accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former method makes it possible to identify the singer of a singing voice after reducing accompaniment sounds. It first extracts harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by those components. The latter method then judges whether each frame of the obtained melody is reliable (i.e., little influenced by accompaniment sound) or not by using two Gaussian mixture models for vocal and non-vocal frames. It enables singer identification using only reliable vocal portions of musical pieces. Experimental results with forty popular-music songs by ten singers showed that our method was able to reduce the influence of accompaniment sounds and achieved an accuracy of 95%, while the accuracy for a conventional method was 53%.

Proceedings of the 6th International Conference on Music Information Retrieval (ISMIR 2005), pp.329--336, September 2005.
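
One plausible reading of the reliable frame selection step in the abstract above is a likelihood comparison between the vocal and non-vocal Gaussian mixture models. The sketch below follows that reading under several assumptions: each frame is reduced to a single scalar feature, the mixture parameters are invented for the example, and a frame counts as reliable when its log-likelihood ratio exceeds a threshold of zero. The actual features and decision rule of the paper are not given in the abstract.

```python
import math


def gmm_log_likelihood(x, components):
    """Log-likelihood of scalar x under a 1-D Gaussian mixture.

    components is a list of (weight, mean, variance) tuples.
    """
    density = sum(
        weight * math.exp(-0.5 * (x - mean) ** 2 / variance)
        / math.sqrt(2 * math.pi * variance)
        for weight, mean, variance in components
    )
    return math.log(density)


def select_reliable_frames(frames, vocal_gmm, nonvocal_gmm, threshold=0.0):
    """Return indices of frames that the vocal model explains better."""
    return [
        index
        for index, x in enumerate(frames)
        if gmm_log_likelihood(x, vocal_gmm) - gmm_log_likelihood(x, nonvocal_gmm)
        > threshold
    ]


# Hypothetical mixture parameters and per-frame feature values.
vocal_gmm = [(0.5, 1.0, 0.25), (0.5, 2.0, 0.25)]
nonvocal_gmm = [(1.0, -1.0, 0.5)]
frames = [1.2, -0.9, 1.8, -1.5, 0.9]
reliable = select_reliable_frames(frames, vocal_gmm, nonvocal_gmm)
```

Only the frames listed in `reliable` would then be passed on to the singer identification stage.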

ism: Improvisation Supporting Systems with Melody Correction and Key Vibration

Tetsuro Kitahara, Katsuhisa Ishida, and Masayuki Takeda

Abstract This paper describes improvisation support for musicians who do not have sufficient improvisational playing experience. The goal of our study is to enable such players to learn the skills necessary for improvisation and to enjoy it. In achieving this goal, we have two objectives: enhancing their skill for instantaneous melody creation and supporting their practice for acquiring this skill. For the first objective, we developed a system that automatically corrects musically inappropriate notes in the melodies of users' improvisations. For the second objective, we developed a system that points out musically inappropriate notes by vibrating corresponding keys. The main issue in developing these systems is how to detect musically inappropriate notes. We propose a method for detecting them based on the N-gram model. Experimental results show that this N-gram-based method improves the accuracy of detecting musically inappropriate notes and our systems are effective in supporting unskilled musicians' improvisation.

Entertainment Computing: Proceedings of the 4th International Conference on Entertainment Computing (ICEC 2005), Lecture Notes in Computer Science 3711, F. Kishino, Y. Kitamura, H. Kato and N. Nagata (Eds.), pp.315--327, September 2005.

FY2004

Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries

Takuya Yoshioka, Tetsuro Kitahara, Kazunori Komatani, Tetsuya Ogata, and Hiroshi G. Okuno

Abstract This paper describes a method that recognizes musical chords from real-world audio signals in compact-disc recordings. The automatic recognition of musical chords is necessary for music information retrieval (MIR) systems, since the chord sequences of musical pieces capture the characteristics of their accompaniments. None of the previous methods can accurately recognize musical chords from complex audio signals that contain vocal and drum sounds. The main problem is that the chord-boundary-detection and chord-symbol-identification processes are inseparable because of their mutual dependency. In order to solve this mutual dependency problem, our method generates hypotheses about tuples of chord symbols and chord boundaries, and outputs the most plausible one as the recognition result. The certainty of a hypothesis is evaluated based on three cues: acoustic features, chord progression patterns, and bass sounds. Experimental results show that our method successfully recognized chords in seven popular music songs; the average accuracy of the results was around 77%.

Proceedings of the 5th International Conference on Music Information Retrieval (ISMIR 2004), pp.100--105, October 2004.

ism: Improvisation Supporting System based on Melody Correction

Katsuhisa Ishida, Tetsuro Kitahara, and Masayuki Takeda

Abstract In this paper, we describe a novel improvisation supporting system based on correcting musically unnatural melodies. Since improvisation is a musical performance style that involves creating melodies while playing, it is not easy even for people who can play musical instruments. However, previous studies have not dealt with improvisation support for people who can play musical instruments but cannot improvise. In this study, to support such players' improvisation, we propose a novel improvisation supporting system called ism, which corrects musically unnatural melodies automatically. The main issue in realizing this system is how to detect notes to be corrected (i.e., musically unnatural or inappropriate). We propose a method for detecting notes to be corrected based on the N-gram model. This method first calculates N-gram probabilities of played notes, and then judges notes with low N-gram probabilities to be corrected. Experimental results show that the N-gram-based melody correction and the proposed system are useful for supporting improvisation.

Proceedings of the International Conference on New Interfaces for Musical Expression (NIME 04), pp.177--180, June 2004.
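
The detection step in the abstract above (calculate N-gram probabilities of the played notes, then flag the notes with low probability) can be sketched with a bigram model. The choice of N = 2, the unsmoothed counts, the threshold of 0.1, the note-name representation, and the tiny training corpus are all assumptions for this example and are not taken from the paper.

```python
from collections import Counter


def train_bigram(melodies):
    """Count note pairs and their left contexts in a corpus of melodies."""
    pair_counts = Counter()
    context_counts = Counter()
    for melody in melodies:
        for previous, current in zip(melody, melody[1:]):
            pair_counts[(previous, current)] += 1
            context_counts[previous] += 1
    return pair_counts, context_counts


def bigram_probability(model, previous, current):
    """P(current | previous); 0.0 when the context was never seen (no smoothing)."""
    pair_counts, context_counts = model
    if context_counts[previous] == 0:
        return 0.0
    return pair_counts[(previous, current)] / context_counts[previous]


def notes_to_correct(model, melody, threshold=0.1):
    """Indices of notes whose bigram probability falls below the threshold."""
    return [
        index
        for index in range(1, len(melody))
        if bigram_probability(model, melody[index - 1], melody[index]) < threshold
    ]


# Hypothetical training melodies and a played phrase containing an F#.
corpus = [
    ["C", "D", "E", "F", "G"],
    ["C", "E", "G", "E", "C"],
    ["G", "F", "E", "D", "C"],
]
model = train_bigram(corpus)
played = ["C", "D", "E", "F#", "G"]
flagged = notes_to_correct(model, played)
```

Because the counts are unsmoothed, the note following the unseen F# is flagged as well; a practical system would need smoothing or a larger corpus to avoid this side effect.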

Category-level Identification of Non-registered Musical Instrument Sounds

Tetsuro Kitahara, Masataka Goto, and Hiroshi G. Okuno

Abstract This paper describes a method that identifies sounds of non-registered musical instruments (i.e., musical instruments that are not contained in the training data) at a category level. Although the problem of how to deal with non-registered musical instruments is essential in musical instrument identification, it has not been dealt with in previous studies. Our method solves this problem by distinguishing between registered and non-registered instruments and identifying the category name of the non-registered instruments. When a given sound is registered, its instrument name, e.g. violin, is identified. Even if it is not registered, its category name, e.g. strings, can be identified. The important issue in achieving such identification is to adopt a musical instrument hierarchy reflecting the acoustical similarity. We present a method for acquiring such a hierarchy from a musical instrument sound database. Experimental results show that around 77% of non-registered instrument sounds, on average, were correctly identified at the category level.

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol.IV, pp.253--256, May 2004.

Comparing Features for Forming Music Streams in Automatic Music Transcription

Yohei Sakuraba, Tetsuro Kitahara, and Hiroshi G. Okuno

Abstract In forming temporal sequences of notes played by the same instrument (referred to as music streams), the timbre of musical instruments may be a predominant feature. In polyphonic music, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we integrated timbre similarity and direction proximity with success, but left using other features as future work. In this paper, we investigate four features (timbre similarity, direction proximity, pitch transition, and pitch relation consistency) to clarify the precedence among them in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature, and pitch transition is the second most dominant. In addition, the performance of music stream formation was improved from 63.3% by only timbre similarity to 84.9% by integrating the four features.

Proceedings of the 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2004), Vol.IV, pp.273--376, May 2004.

FY2003

Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution

Tetsuro Kitahara, Masataka Goto, and Hiroshi G. Okuno

Abstract The pitch dependency of timbres has not been fully exploited in musical instrument identification. In this paper, we present a method using an F0-dependent multivariate normal distribution whose mean is represented by a function of fundamental frequency (F0). This F0-dependent mean function represents the pitch dependency of each feature, while the F0-normalized covariance represents the non-pitch dependency. Musical instrument sounds are first analyzed by the F0-dependent multivariate normal distribution, and then identified by using the discriminant function based on the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments by 10-fold cross validation showed that the proposed method improved the recognition rate at the individual-instrument level from 75.73% to 79.73%, and the recognition rate at the category level from 88.20% to 90.65%.

Proceedings of the 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2003), Vol.V, pp.421--424, April 2003.