Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Tetsuro Kitahara

See also our Japanese papers if you can understand Japanese.
FY2005

Instrument Identification in Polyphonic Music: Feature Weighting with Mixed Sounds, Pitch-dependent Timbre Modeling, and Use of Musical Context

Tetsuro Kitahara* Masataka Goto** Kazunori Komatani* Tetsuya Ogata* Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
** National Institute of Advanced Industrial Science and Technology (AIST)

Abstract This paper addresses the problem of identifying musical instruments in polyphonic music. Musical instrument identification (MII) is an important task in music information retrieval because MII results make it possible to automatically retrieve certain types of music (e.g., piano sonatas, string quartets). Only a few studies, however, have dealt with MII in polyphonic music. MII in polyphonic music involves three issues: feature variations caused by sound mixtures, the pitch dependency of timbres, and the use of musical context. For the first issue, templates of feature vectors representing timbres are extracted not only from isolated sounds but also from sound mixtures. Because some features are not robust in the mixtures, features are weighted according to their robustness by using linear discriminant analysis. For the second issue, we use an F0-dependent multivariate normal distribution, which approximates the pitch dependency as a function of fundamental frequency. For the third issue, when the instrument of each note is identified, the a priori probability of the note is calculated from the a posteriori probabilities of temporally neighboring notes. Experimental results showed that recognition rates were improved from 60.8% to 85.8% for trio music and from 65.5% to 91.1% for duo music.
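
As a rough illustration of the first technique, the sketch below uses linear discriminant analysis to learn a feature weighting in which dimensions that vary strongly within an instrument class (e.g., because mixtures corrupt them) contribute less. The data shapes, instrument count, and the use of scikit-learn are assumptions for the example, not details taken from the paper.

```python
# Hypothetical sketch: weighting timbre features by their robustness
# in mixtures via linear discriminant analysis (LDA).
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)

# Toy training set: 200 notes x 40 features, labelled with one of
# 4 instruments. In the paper, the training templates come from both
# isolated sounds and artificially mixed sounds.
X_train = rng.normal(size=(200, 40))
y_train = rng.integers(0, 4, size=200)

# LDA maximizes between-instrument scatter relative to within-
# instrument scatter, so features that vary wildly inside a class
# (i.e., are fragile in mixtures) are effectively down-weighted.
lda = LinearDiscriminantAnalysis(n_components=3)
Z_train = lda.fit_transform(X_train, y_train)

# An unknown note is projected into the same weighted space before
# being matched against the per-instrument templates.
z_test = lda.transform(rng.normal(size=(1, 40)))
print(z_test.shape)  # (1, 3)
```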

Proc. of the 6th Int'l Conf. on Music Information Retrieval (ISMIR 2005), pp.558--563, September 2005.
[Download PDF]


ism: Improvisation Supporting Systems with Melody Correction and Key Vibration

Tetsuro Kitahara* Katsuhisa Ishida** Masayuki Takeda**
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
** Dept. of Information Sciences, Tokyo University of Science

Abstract This paper describes improvisation support for musicians who do not have sufficient improvisational playing experience. The goal of our study is to enable such players to learn the skills necessary for improvisation and to enjoy it. In achieving this goal, we have two objectives: enhancing their skill of instantaneous melody creation and supporting their practice for acquiring this skill. For the first objective, we developed a system that automatically corrects musically inappropriate notes in the melodies of users' improvisations. For the second objective, we developed a system that points out musically inappropriate notes by vibrating the corresponding keys. The main issue in developing these systems is how to detect musically inappropriate notes. We propose a method for detecting them based on the N-gram model. Experimental results show that this N-gram-based method improves the accuracy of detecting musically inappropriate notes and that our systems are effective in supporting unskilled musicians' improvisation.
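
As a rough, hypothetical illustration of the N-gram-based detection, the sketch below trains a smoothed trigram model over pitch classes and flags notes whose conditional probability falls under a threshold. The toy corpus, pitch-class encoding, smoothing scheme, and threshold are all assumptions for the example, not the paper's settings.

```python
# Hypothetical sketch: flagging musically inappropriate notes with a
# smoothed trigram model over pitch classes (0 = C, ..., 11 = B).
from collections import Counter

def train_trigram(melodies):
    """Count trigram occurrences and their two-note contexts."""
    tri, bi = Counter(), Counter()
    for m in melodies:
        for a, b, c in zip(m, m[1:], m[2:]):
            tri[(a, b, c)] += 1
            bi[(a, b)] += 1
    return tri, bi

def trigram_prob(tri, bi, a, b, c, vocab=12, alpha=1.0):
    """P(c | a, b) with add-alpha smoothing, so unseen continuations
    get a small but nonzero probability."""
    return (tri[(a, b, c)] + alpha) / (bi[(a, b)] + alpha * vocab)

def flag_notes(tri, bi, melody, threshold=0.1):
    """Return indices of notes whose trigram probability is low,
    i.e., candidates for correction."""
    return [i + 2
            for i, (a, b, c) in enumerate(zip(melody, melody[1:], melody[2:]))
            if trigram_prob(tri, bi, a, b, c) < threshold]

# Toy corpus of in-scale pitch-class sequences.
corpus = [[0, 2, 4, 5, 7, 9, 11, 0], [0, 4, 7, 4, 0, 7, 4, 0]]
tri, bi = train_trigram(corpus)
print(flag_notes(tri, bi, [0, 2, 4, 6, 7]))  # [3, 4]: the out-of-scale F# and its successor
```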

Entertainment Computing, Lecture Notes in Computer Science 3711, Proceedings of the 4th International Conference on Entertainment Computing (ICEC 2005), pp.315--327, Springer, September 2005.
[Download PDF]


Singer Identification based on Accompaniment Sound Reduction and Reliable Frame Selection

Hiromasa Fujihara* Tetsuro Kitahara* Masataka Goto** Kazunori Komatani* Tetsuya Ogata* Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
** National Institute of Advanced Industrial Science and Technology (AIST)

Abstract This paper describes a method for automatic singer identification from polyphonic musical audio signals including sounds of various instruments. Because singing voices play an important role in musical pieces with a vocal part, the identification of singer names is useful for music information retrieval systems. The main problem in automatically identifying singers is the negative influence of accompaniment sounds. To solve this problem, we developed two methods, accompaniment sound reduction and reliable frame selection. The former makes it possible to identify the singer of a singing voice after reducing accompaniment sounds. It first extracts harmonic components of the predominant melody from sound mixtures and then resynthesizes the melody by using a sinusoidal model driven by those components. The latter then judges whether each frame of the obtained melody is reliable (i.e., little influenced by accompaniment sounds) by using two Gaussian mixture models for vocal and non-vocal frames. It enables singer identification using only the reliable vocal portions of musical pieces. Experimental results with forty popular-music songs by ten singers showed that our method was able to reduce the influence of accompaniment sounds and achieved an accuracy of 95%, whereas the accuracy of a conventional method was 53%.
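
The reliable-frame-selection step lends itself to a compact sketch: two Gaussian mixture models, one for vocal and one for non-vocal frames, vote on each frame of the resynthesized melody. The feature dimensionality, mixture sizes, and training data below are illustrative assumptions.

```python
# Hypothetical sketch: reliable frame selection with two GMMs.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Toy training frames: vocal frames and non-vocal (accompaniment-
# dominated) frames, each a 20-dimensional feature vector.
vocal_frames = rng.normal(loc=1.0, size=(500, 20))
nonvocal_frames = rng.normal(loc=-1.0, size=(500, 20))

gmm_vocal = GaussianMixture(n_components=4, random_state=0).fit(vocal_frames)
gmm_nonvocal = GaussianMixture(n_components=4, random_state=0).fit(nonvocal_frames)

# A frame is kept only if the vocal GMM explains it better than the
# non-vocal GMM; only the kept frames feed the singer classifier.
frames = rng.normal(loc=0.8, size=(100, 20))
reliable = gmm_vocal.score_samples(frames) > gmm_nonvocal.score_samples(frames)
print(f"{reliable.sum()} of {len(frames)} frames judged reliable")
```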

Proc. of the 6th Int'l Conf. on Music Information Retrieval (ISMIR 2005), pp.329--336, September 2005.
[Download PDF]


FY2004

Category-level Identification of Non-registered Musical Instrument Sounds

Tetsuro Kitahara* Masataka Goto** Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
** National Institute of Advanced Industrial Science and Technology (AIST)

Abstract This paper describes a method that identifies sounds of non-registered musical instruments (i.e., musical instruments that are not contained in the training data) at a category level. Although the problem of how to deal with non-registered musical instruments is essential in musical instrument identification, it has not been dealt with in previous studies. Our method solves this problem by distinguishing between registered and non-registered instruments and identifying the category name of the non-registered instruments. When a given sound is registered, its instrument name, e.g., violin, is identified. Even if it is not registered, its category name, e.g., strings, can be identified. The important issue in achieving such identification is to adopt a musical instrument hierarchy that reflects acoustical similarity. We present a method for acquiring such a hierarchy from a musical instrument sound database. Experimental results show that around 77% of non-registered instrument sounds, on average, were correctly identified at the category level.
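
A minimal sketch of the category-level back-off, assuming per-instrument classifier scores and a hand-written two-category hierarchy; note that the paper acquires its hierarchy from acoustic similarity in a sound database rather than fixing it by hand, and the threshold here is invented for the example.

```python
# Hypothetical sketch: back off from instrument level to category level.
HIERARCHY = {
    "strings": ["violin", "viola", "cello"],
    "brass": ["trumpet", "trombone"],
}

def identify(scores, threshold=0.6):
    """Return an instrument name if the winner is confident enough,
    otherwise back off to the category containing the winner."""
    best = max(scores, key=scores.get)
    if scores[best] >= threshold:
        return best                      # registered-instrument answer
    for category, members in HIERARCHY.items():
        if best in members:
            return category              # non-registered: category level
    return "unknown"

# A sound resembling strings but matching no single instrument well:
print(identify({"violin": 0.4, "viola": 0.35, "cello": 0.1,
                "trumpet": 0.1, "trombone": 0.05}))  # -> "strings"
```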

Proc. of the 2004 IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP '04), Vol.IV, pp.253--256, May 2004.
[Download PDF]


Comparing Features for Forming Music Streams in Automatic Music Transcription

Yohei Sakuraba* Tetsuro Kitahara* Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Abstract In forming temporal sequences of notes played by the same instrument (referred to as music streams), the timbre of musical instruments may be a predominant feature. In polyphonic music, the performance of timbre extraction based on power-related features deteriorates, because such features are blurred when two or more frequency components are superimposed at the same frequency. To cope with this problem, we previously integrated timbre similarity and direction proximity with success, leaving the use of other features as future work. In this paper, we investigate four features, timbre similarity, direction proximity, pitch transition, and pitch relation consistency, to clarify the precedence among them in music stream formation. Experimental results with quartet music show that direction proximity is the most dominant feature, with pitch transition the second. In addition, the performance of music stream formation was improved from 63.3% using only timbre similarity to 84.9% by integrating the four features.
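
As a rough sketch of how such cues can be integrated, the code below scores a note against candidate streams with a weighted sum of three of the four features (pitch relation consistency is omitted because it compares across streams). The cue functions and weights are crude stand-ins, though the weight ordering mirrors the finding that direction proximity dominates.

```python
# Hypothetical sketch: assigning a new note to the music stream that
# best continues it, using a weighted combination of cues in [0, 1].
def timbre_similarity(stream, note):
    # Stand-in: closeness of a 1-D "timbre" feature.
    return 1.0 / (1.0 + abs(stream["timbre"] - note["timbre"]))

def direction_proximity(stream, note):
    # Stand-in: closeness of estimated source direction in degrees.
    return 1.0 / (1.0 + abs(stream["direction"] - note["direction"]) / 10.0)

def pitch_transition(stream, note):
    # Stand-in: small pitch intervals are more plausible continuations.
    return 1.0 / (1.0 + abs(stream["last_pitch"] - note["pitch"]) / 12.0)

def stream_score(stream, note, w=(0.25, 0.5, 0.25)):
    cues = (timbre_similarity(stream, note),
            direction_proximity(stream, note),
            pitch_transition(stream, note))
    return sum(wi * ci for wi, ci in zip(w, cues))

# Assign a note to the better-scoring of two candidate streams.
streams = [{"timbre": 0.2, "direction": -30.0, "last_pitch": 67},
           {"timbre": 0.8, "direction": 40.0, "last_pitch": 55}]
note = {"timbre": 0.3, "direction": -25.0, "pitch": 69}
best = max(streams, key=lambda s: stream_score(s, note))
print(streams.index(best))  # 0
```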

Proc. of the 2004 IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP '04), Vol.IV, pp.273--276, May 2004.
[Download PDF]


ism: Improvisation Supporting System based on Melody Correction

Katsuhisa Ishida* Tetsuro Kitahara** Masayuki Takeda***
* Dept. of Information Sciences, Graduate School of Sciences and Technology, Tokyo University of Science
** Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
*** Dept. of Information Sciences, Faculty of Sciences and Technology, Tokyo University of Science

Abstract In this paper, we describe a novel improvisation supporting system based on correcting musically unnatural melodies. Because improvisation is a musical performance style that involves creating melodies while playing, it is not easy even for people who can play musical instruments. However, previous studies have not dealt with improvisation support for people who can play musical instruments but cannot improvise. In this study, to support such players' improvisation, we propose a novel improvisation supporting system called ism, which corrects musically unnatural melodies automatically. The main issue in realizing this system is how to detect the notes to be corrected (i.e., musically unnatural or inappropriate notes). We propose a method for detecting such notes based on the N-gram model. This method first calculates N-gram probabilities of played notes and then judges notes with low N-gram probabilities as notes to be corrected. Experimental results show that the N-gram-based melody correction and the proposed system are useful for supporting improvisation.
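
Complementing the detection sketch given for the ICEC 2005 paper above, the hypothetical sketch below shows the correction step: each flagged note is replaced by the most probable alternative given its (already corrected) predecessors. The toy probability model, vocabulary, and threshold are assumptions for the example.

```python
# Hypothetical sketch: correcting low-probability notes in a melody.
def correct_melody(melody, prob, vocab=range(12), threshold=0.1):
    """Replace each low-probability note with the most probable
    alternative given its two predecessors."""
    out = list(melody[:2])
    for _, _, c in zip(melody, melody[1:], melody[2:]):
        a, b = out[-2], out[-1]   # condition on already-corrected notes
        if prob(a, b, c) < threshold:
            c = max(vocab, key=lambda v: prob(a, b, v))
        out.append(c)
    return out

# Toy probability: notes in the C-major scale are likely, others not.
# A trained N-gram model (as sketched earlier) would go here instead.
SCALE = {0, 2, 4, 5, 7, 9, 11}
def prob(a, b, c):
    return 0.3 if c in SCALE else 0.01

print(correct_melody([0, 2, 4, 6, 7], prob))
# [0, 2, 4, 0, 7]: the out-of-scale F# is replaced by a scale tone
# (the toy model ignores context, so it simply picks the first one).
```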

Proc. of the Int'l Conf. on New Interfaces for Musical Expression (NIME 04), pp.177--180, June 2004.
[Download PDF]


Automatic Chord Transcription with Concurrent Recognition of Chord Symbols and Boundaries

Takuya Yoshioka* Tetsuro Kitahara* Kazunori Komatani* Tetsuya Ogata* Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University

Abstract This paper describes a method that recognizes musical chords from real-world audio signals in compact-disc recordings. The automatic recognition of musical chords is necessary for music information retrieval (MIR) systems, since the chord sequences of musical pieces capture the characteristics of their accompaniments. None of the previous methods can accurately recognize musical chords from complex audio signals that contain vocal and drum sounds. The main problem is that the chord-boundary-detection and chord-symbol-identification processes are inseparable because of their mutual dependency. In order to solve this mutual dependency problem, our method generates hypotheses about tuples of chord symbols and chord boundaries, and outputs the most plausible one as the recognition result. The certainty of a hypothesis is evaluated on the basis of three cues: acoustic features, chord progression patterns, and bass sounds. Experimental results show that our method successfully recognized chords in seven popular music songs; the average accuracy of the results was around 77%.
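
A minimal sketch of the hypothesis-search idea under strong simplifications: enumerate (boundary, chord-symbol) tuples over one bar and keep the highest-scoring hypothesis. The three scoring functions are invented stand-ins for the paper's acoustic-feature, chord-progression, and bass-sound evaluations.

```python
# Hypothetical sketch: jointly choosing a chord boundary and the
# chord symbols on either side of it, so neither decision is made
# in isolation.
from itertools import product

BEATS = 4                       # analyze one bar of four beat segments
SYMBOLS = ["C", "F", "G"]       # toy chord vocabulary

def acoustic(seg, sym):  return 1.0 if sym == seg["best_fit"] else 0.3
def progression(prev, sym):  return 1.0 if (prev, sym) in {("C", "F"), ("F", "G"), ("G", "C")} else 0.5
def bass(seg, sym):  return 1.0 if seg["bass_note"] == sym else 0.6

def hypothesis_score(segments, boundary, s1, s2, prev="C"):
    """Score one hypothesis: chord s1 before the boundary, s2 after."""
    score = 1.0
    for i, seg in enumerate(segments):
        sym = s1 if i < boundary else s2
        score *= acoustic(seg, sym) * bass(seg, sym)
    return score * progression(prev, s1) * progression(s1, s2)

segments = [{"best_fit": "C", "bass_note": "C"},
            {"best_fit": "C", "bass_note": "C"},
            {"best_fit": "F", "bass_note": "F"},
            {"best_fit": "F", "bass_note": "F"}]

best = max(product(range(1, BEATS), SYMBOLS, SYMBOLS),
           key=lambda h: hypothesis_score(segments, *h))
print(best)  # (2, 'C', 'F'): boundary after beat 2, C then F
```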

Proc. of the 5th Int'l Conf. on Music Information Retrieval (ISMIR 2004), pp.100--105, October 2004.
[Download PDF]


FY2003

Musical Instrument Identification based on F0-dependent Multivariate Normal Distribution

Tetsuro Kitahara* Masataka Goto** Hiroshi G. Okuno*
* Dept. of Intelligence Science and Technology, Graduate School of Informatics, Kyoto University
** "Information and Human Activity", PRESTO JST / National Institute of Advanced Industrial Science and Technology

Abstract The pitch dependency of timbres has not been fully exploited in musical instrument identification. In this paper, we present a method using an F0-dependent multivariate normal distribution whose mean is represented by a function of fundamental frequency (F0). This F0-dependent mean function represents the pitch dependency of each feature, while the F0-normalized covariance represents the non-pitch dependency. Musical instrument sounds are first analyzed by the F0-dependent multivariate normal distribution and then identified by using the discriminant function based on the Bayes decision rule. Experimental results of identifying 6,247 solo tones of 19 musical instruments by 10-fold cross validation showed that the proposed method improved the recognition rate at the individual-instrument level from 75.73% to 79.73%, and the recognition rate at the category level from 88.20% to 90.65%.
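
A minimal sketch of an F0-dependent normal model, assuming each feature's mean is approximated by a polynomial in log F0 (the functional form, polynomial order, and toy data are assumptions); the covariance is estimated from the pitch-normalized residuals, matching the abstract's split into an F0-dependent mean and an F0-normalized covariance, and the discriminant follows the usual Gaussian Bayes rule.

```python
# Hypothetical sketch: an F0-dependent multivariate normal model.
import numpy as np

rng = np.random.default_rng(0)

def fit_f0_dependent(X, f0, order=3):
    """Fit a per-dimension polynomial mean mu(F0) and the covariance
    of the residuals after removing the pitch-dependent mean."""
    logf = np.log(f0)
    coeffs = [np.polyfit(logf, X[:, d], order) for d in range(X.shape[1])]
    resid = X - mean_at(coeffs, f0)
    return coeffs, np.cov(resid, rowvar=False)

def mean_at(coeffs, f0):
    """Evaluate the F0-dependent mean function at the given F0(s)."""
    logf = np.atleast_1d(np.log(f0))
    return np.stack([np.polyval(c, logf) for c in coeffs], axis=-1)

def log_discriminant(x, f0, coeffs, cov, prior=1.0):
    """Gaussian log-likelihood with the F0-dependent mean plus log
    prior; the instrument with the largest value wins."""
    d = x - mean_at(coeffs, f0)[0]
    return (-0.5 * d @ np.linalg.solve(cov, d)
            - 0.5 * np.linalg.slogdet(cov)[1] + np.log(prior))

# Toy data: 300 tones, 5 features whose means drift with pitch.
f0 = rng.uniform(100, 1000, size=300)
X = np.outer(np.log(f0), np.ones(5)) + rng.normal(scale=0.3, size=(300, 5))
coeffs, cov = fit_f0_dependent(X, f0)
print(log_discriminant(X[0], f0[0], coeffs, cov))
```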

Proc. of the 2003 IEEE Int'l Conf. on Acoustics, Speech, and Signal Processing (ICASSP '03), Vol.V, pp.421--424, April 2003.
[Download PDF]

