Infinite Composite Autoregressive Models


Abstract

We present novel probabilistic models that can be used to estimate multiple fundamental frequencies (F0s) from polyphonic audio signals. These models are nonparametric Bayesian extensions of nonnegative matrix factorization (NMF) based on the source-filter paradigm, and in them an amplitude or power spectrogram is decomposed as the product of two kinds of spectral atoms (sources and filters) and time-varying gains of source-filter pairs. In this study we model musical instruments as autoregressive systems that combine two types of sources—periodic signals (comb-shaped densities) and white noise (flat density)—with all-pole filters representing resonance characteristics. One of the main problems with such composite autoregressive models (CARMs) is that the numbers of sources and filters should be given in advance. To solve this problem, we propose nonparametric Bayesian models based on gamma processes and efficient variational and multiplicative learning algorithms. These infinite CARMs (iCARMs) can discover appropriate numbers of sources and filters in a data-driven manner. We report the experimental results of multipitch analysis on the MAPS piano database.

Problems

The standard NMF approximates an amplitude or power spectrogram (nonnegative matrix) as the product of two nonnegative matrices, one of which is a compact set of spectral bases and the other of which is a set of the corresponding time-varying gains. The standard NMF, however, has three fundamental limitations:

  1. The spectral bases are time-invariant, and only their gains vary over time. A large number of independent bases are needed to fully represent the timbral variations of instrument spectra (e.g., envelopes) even if these spectra share the same fundamental frequency (F0). Such an unconstrained increase of model complexity is not desirable.
  2. A post-processing step for estimating the F0s from individual bases is required because the spectral bases are not parameterized by F0s. If the shapes of spectral bases are unconstrained, the resulting bases often deviate from natural harmonic spectra. This makes F0 estimation difficult, and we must additionally judge whether each basis has an F0 at all.
  3. The number of bases (model complexity) should be carefully specified in advance because it has a strong impact on the decomposition results. Note that this limitation is closely related to the first. A naive solution is to exhaustively test all possible complexities and find an optimal value, but such model selection is often computationally impractical.
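For readers unfamiliar with the baseline, the standard NMF described above can be sketched as follows. This is a minimal illustration using the well-known multiplicative updates for the generalized KL divergence, not the learning algorithm of iCARMs; the function names `nmf_kl` and `kl_divergence` and the synthetic input are assumptions made for this example. Note that `n_bases` has to be fixed by hand, which is exactly the third limitation.

```python
import numpy as np

def nmf_kl(X, n_bases, n_iter=200, seed=0, eps=1e-12):
    """Approximate a nonnegative matrix X (freq x time) as W @ H.

    W holds time-invariant spectral bases, H holds their time-varying gains.
    Uses the standard multiplicative updates for the generalized KL divergence.
    """
    rng = np.random.default_rng(seed)
    n_freq, n_time = X.shape
    W = rng.random((n_freq, n_bases)) + eps
    H = rng.random((n_bases, n_time)) + eps
    for _ in range(n_iter):
        V = W @ H + eps                          # current reconstruction
        W *= ((X / V) @ H.T) / (H.sum(axis=1) + eps)
        V = W @ H + eps
        H *= (W.T @ (X / V)) / (W.sum(axis=0)[:, None] + eps)
    return W, H

def kl_divergence(X, V, eps=1e-12):
    """Generalized KL divergence between X and its reconstruction V."""
    return float(np.sum(X * np.log((X + eps) / (V + eps)) - X + V))

# Hypothetical toy "spectrogram" built from 3 bases (not real audio).
rng = np.random.default_rng(1)
X = rng.random((20, 3)) @ rng.random((3, 30))
W, H = nmf_kl(X, n_bases=3)
print("KL divergence after fitting:", kl_divergence(X, W @ H))
```

Choosing a different `n_bases` changes the decomposition substantially, and nothing in this formulation ties a basis to an F0.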

Solutions

We propose infinite composite autoregressive models (iCARMs) by fusing the following techniques into a unified Bayesian framework:

  1. Source-filter factorization: We further factorize the spectral bases as the combinatorial products of sources and all-pole filters. This factorization enables us to represent a wide variety of instrumental sounds in terms of two separate aspects (timbre and F0) with reasonable complexity.
  2. Harmonicity modeling: We represent each source as a comb-shaped function that uses an F0 parameter for representing equally spaced harmonic partials of the same weight. In addition, we can directly optimize the values of the F0s jointly with decomposition.
  3. Bayesian nonparametrics: We build nonparametric Bayesian models that can automatically adjust the numbers of sources and filters needed to factorize a given spectrogram. We perform sparse learning by introducing infinite-dimensional priors in such a way that only limited numbers of sources and filters are actually activated.
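The source-filter factorization and harmonicity modeling above can be illustrated with a small forward model. This is only a sketch under stated assumptions: the Gaussian peak shape and `width_hz`, the example AR coefficients, the random gains, and all function names are hypothetical choices for illustration, and no inference (variational or multiplicative) or gamma process prior is included.

```python
import numpy as np

def comb_source(freqs_hz, f0_hz, width_hz=10.0):
    """Comb-shaped source: equal-weight peaks at integer multiples of f0.

    Gaussian peaks of width `width_hz` are an assumption of this sketch.
    """
    n_partials = int(freqs_hz[-1] // f0_hz)
    harmonics = f0_hz * np.arange(1, n_partials + 1)
    diff = freqs_hz[:, None] - harmonics[None, :]
    return np.exp(-0.5 * (diff / width_hz) ** 2).sum(axis=1)

def noise_source(freqs_hz):
    """White-noise source: flat density over frequency."""
    return np.ones_like(freqs_hz)

def all_pole_filter(freqs_hz, sample_rate_hz, ar_coeffs):
    """Power response 1 / |A(e^{jw})|^2 with A(z) = 1 + sum_p a_p z^{-p}."""
    omega = 2.0 * np.pi * freqs_hz / sample_rate_hz
    orders = np.arange(1, len(ar_coeffs) + 1)
    A = 1.0 + np.exp(-1j * omega[:, None] * orders[None, :]) @ np.asarray(ar_coeffs)
    return 1.0 / np.abs(A) ** 2

def carm_spectrogram(sources, filters, gains):
    """Sum over source-filter pairs: X[f,t] = sum_ij W_i[f] * A_j[f] * H_ij[t]."""
    return np.einsum("if,jf,ijt->ft", sources, filters, gains)

sr = 16000.0
freqs = np.linspace(0.0, sr / 2, 513)
sources = np.stack([comb_source(freqs, 220.0),    # periodic source, F0 = 220 Hz
                    comb_source(freqs, 330.0),    # periodic source, F0 = 330 Hz
                    noise_source(freqs)])         # white-noise source
filters = np.stack([all_pole_filter(freqs, sr, [-0.9]),
                    all_pole_filter(freqs, sr, [-1.2, 0.8])])
gains = np.random.default_rng(0).random((3, 2, 50))  # one gain track per pair
X = carm_spectrogram(sources, filters, gains)
print(X.shape)
```

With I sources and J filters, the model covers I x J spectral bases while storing only I + J atoms, and each periodic source carries its F0 as an explicit parameter.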

Demos

We present some audio examples of source-filter decomposition for popular music in a completely unsupervised setting (iCARMs have no sensitive hyperparameters that need careful tuning). This might be the most challenging task in the field of sound source separation. Note that the volume of each MP3 is maximized independently. In many songs, the most significant filter (filter 0) corresponds to the bass guitar, and the second filter (filter 1) to the vocal part. Percussive sounds and other instrument sounds are only partially captured by several filters. Although these results are far from satisfactory, we believe that extending iCARMs is a very promising direction of research on sound source separation.

RWC-MDB-P-2001 No.1

  Original
  Filter 0 (bass guitar)
  Filter 1 (vocal)
  Filter 2 (lead guitar)
  Filter 3 (snare drum)
  Filter 4 (vocal)
  Filter 5
  Filter 6
  Filter 7
  Filter 8
  Filter 9

RWC-MDB-P-2001 No.5

  Original
  Filter 0 (bass guitar)
  Filter 1 (vocal)
  Filter 2 (vocal?)
  Filter 3 (snare drum)
  Filter 4
  Filter 5
  Filter 6
  Filter 7
  Filter 8
  Filter 9

These popular songs were sampled from the RWC Music Database (RWC-MDB-P-2001). Researchers can use the database for research publications and presentations without copyright restrictions.

Reference

  1. Kazuyoshi Yoshii and Masataka Goto. Infinite Composite Autoregressive Models for Music Signal Analysis. 13th International Society for Music Information Retrieval Conference (ISMIR), to appear, October 2012.

Acknowledgments

This research was carried out using the MAPS Database and the RWC Music Database. We thank everyone who contributed to creating these databases.


Author : Kazuyoshi Yoshii (AIST)
mail to k.yoshii(at)aist.go.jp