Abstract
We present novel probabilistic models that can be used to estimate multiple fundamental frequencies (F0s) from polyphonic audio signals. These models are nonparametric Bayesian extensions of nonnegative matrix factorization (NMF) based on the source-filter paradigm. In them, an amplitude or power spectrogram is decomposed as the product of two kinds of spectral atoms (sources and filters) and the time-varying gains of source-filter pairs. In this study we model musical instruments as autoregressive systems that combine two types of sources, periodic signals (comb-shaped densities) and white noise (a flat density), with all-pole filters that represent resonance characteristics. One of the main problems with such composite autoregressive models (CARMs) is that the numbers of sources and filters must be specified in advance. To solve this problem, we propose nonparametric Bayesian models based on gamma processes, together with efficient variational and multiplicative learning algorithms. These infinite CARMs (iCARMs) can discover appropriate numbers of sources and filters in a data-driven manner. We report experimental results of multipitch analysis on the MAPS piano database.
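To make the source-filter atoms above concrete, the sketch below builds a comb-shaped periodic source, a flat white-noise source, and the power response of an all-pole filter, then multiplies source and filter bin by bin. It is a minimal NumPy sketch under stated assumptions: the Gaussian peak shape, the harmonic count, the bin layout, and the helper names (`comb_source`, `noise_source`, `all_pole_response`, `source_filter_atom`) are hypothetical illustrations, not the exact parameterization used in iCARMs.

```python
import numpy as np


def comb_source(f0_hz, n_bins, sr, n_harmonics=20, width_hz=10.0):
    """Periodic source: a comb-shaped density with Gaussian peaks at
    integer multiples of f0 (peak shape and width are assumptions)."""
    freqs = np.linspace(0.0, sr / 2, n_bins)  # bin center frequencies in Hz
    spec = np.zeros(n_bins)
    for h in range(1, n_harmonics + 1):
        if h * f0_hz >= sr / 2:  # stop at the Nyquist frequency
            break
        spec += np.exp(-0.5 * ((freqs - h * f0_hz) / width_hz) ** 2)
    return spec / spec.sum()  # normalize to a density over bins


def noise_source(n_bins):
    """White-noise source: a flat density over all bins."""
    return np.full(n_bins, 1.0 / n_bins)


def all_pole_response(a, n_bins):
    """Power response 1 / |A(e^{jw})|^2 of an all-pole filter with
    A(z) = 1 + a[0] z^-1 + ... + a[P-1] z^-P, evaluated on [0, pi]."""
    w = np.linspace(0.0, np.pi, n_bins)
    lags = np.arange(1, len(a) + 1)
    A = 1.0 + np.exp(-1j * np.outer(w, lags)) @ np.asarray(a, dtype=float)
    return 1.0 / np.abs(A) ** 2


def source_filter_atom(source, filter_response):
    """Composite spectral atom: the source density shaped by the filter."""
    return source * filter_response
```

For example, an AR(2) filter with coefficients `[-2 * r * np.cos(theta), r ** 2]` has poles at `r * exp(±1j * theta)` and therefore a resonance near `theta`; multiplying it with `comb_source(220.0, 513, 16000)` emphasizes the partials that fall near that resonance, while multiplying it with `noise_source(513)` yields a noise-excited resonance.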
Problems
Standard NMF approximates an amplitude or power spectrogram (a nonnegative matrix) as the product of two nonnegative matrices: a compact set of spectral bases and the corresponding set of time-varying gains. Standard NMF, however, has three fundamental limitations:
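The standard factorization described above can be sketched with the classic multiplicative updates of Lee and Seung for the generalized Kullback-Leibler (KL) divergence. This is a minimal NumPy sketch of plain NMF only, assuming KL as the cost function; it is not the iCARM learning algorithm, and `nmf_kl` and `kl_div` are hypothetical helper names.

```python
import numpy as np


def kl_div(X, V, eps=1e-12):
    """Generalized KL divergence D(X | V) between nonnegative matrices."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))


def nmf_kl(X, n_bases, n_iter=200, eps=1e-12, seed=0):
    """Factorize a nonnegative X (freq x time) as W @ H with multiplicative
    updates that do not increase the generalized KL divergence."""
    rng = np.random.default_rng(seed)
    F, T = X.shape
    W = rng.random((F, n_bases)) + eps  # spectral bases (columns)
    H = rng.random((n_bases, T)) + eps  # time-varying gains (rows)
    for _ in range(n_iter):
        V = W @ H + eps
        # H <- H * (W^T (X / V)) / (W^T 1)
        H *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
        V = W @ H + eps
        # W <- W * ((X / V) H^T) / (1 H^T)
        W *= ((X / V) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H
```

Note that `n_bases` must be fixed before factorization; choosing such model sizes in advance is exactly the kind of decision that the abstract identifies as a problem for CARMs.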
Solutions
We propose infinite composite autoregressive models (iCARMs) by fusing the following techniques into a unified Bayesian framework:
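To illustrate why a gamma-process prior lets the data decide how many sources and filters are used, the sketch below draws weights from a common finite (truncated) approximation: with a large truncation level `K`, each weight is drawn from a gamma distribution with shape `alpha / K`, so only a few weights are large and the rest are negligible. This is an assumption-laden illustration of the prior only, not the iCARM variational or multiplicative learning algorithm; `alpha`, `beta`, `K`, and the helper names are hypothetical.

```python
import numpy as np


def truncated_gamma_process_weights(alpha, beta, K, seed=0):
    """Draw K weights theta_k ~ Gamma(shape=alpha/K, rate=beta), a finite
    approximation to a gamma process with concentration alpha (assumption)."""
    rng = np.random.default_rng(seed)
    # NumPy parameterizes the gamma distribution by scale = 1 / rate.
    return rng.gamma(shape=alpha / K, scale=1.0 / beta, size=K)


def n_effective(weights, mass=0.99):
    """Smallest number of components, largest first, that together
    carry the given fraction of the total weight."""
    w = np.sort(weights)[::-1]
    cum = np.cumsum(w) / w.sum()
    # Clip to len(w) in case rounding keeps cum[-1] slightly below `mass`.
    return min(int(np.searchsorted(cum, mass)) + 1, len(w))
```

Even with `K = 1000` available components, `n_effective(truncated_gamma_process_weights(5.0, 1.0, 1000))` is typically a few dozen at most. In models of this kind, components whose weights remain negligible after learning can be treated as unused, which is how an effective number of components emerges from the data.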
Demos
We present some audio examples of source-filter decomposition of popular music in a completely unsupervised setting (iCARMs have no sensitive hyperparameters that need careful tuning). This may be the most challenging task in the field of sound source separation. Note that the volume of each MP3 file was maximized independently. In many songs, the most significant filter (filter 0) corresponds to the bass guitar and the second filter (filter 1) to the vocal part. Percussive sounds and the sounds of other instruments are captured only partially by several filters. Although these results are far from satisfactory, we believe that extending iCARMs is a very promising direction for research on sound source separation.
RWC-MDB-P-2001 No.1
Original
Filter 0 (bass guitar)
Filter 1 (vocal)
Filter 2 (lead guitar)
Filter 3 (snare drum)
Filter 4 (vocal)
Filter 5
Filter 6
Filter 7
Filter 8
Filter 9
RWC-MDB-P-2001 No.5
Original
Filter 0 (bass guitar)
Filter 1 (vocal)
Filter 2 (vocal?)
Filter 3 (snare drum)
Filter 4
Filter 5
Filter 6
Filter 7
Filter 8
Filter 9
These popular songs were taken from the RWC Music Database (RWC-MDB-P-2001). Researchers may use the database in research publications and presentations without copyright restrictions.
Acknowledgments
This research used the MAPS database and the RWC Music Database. We thank everyone who contributed to building these databases.