DAFx samples


Abstract

Our method synthesizes the sounds of musical instruments while taking the pitch dependency of timbre into account. When the pitch is manipulated, timbral features are obtained from pitch-dependent feature functions, each of which represents one timbral feature as a function of pitch. We defined three timbral features (1. the relative amplitudes of harmonic components, 2. temporal envelopes, and 3. inharmonic components) based on the spectral shape that corresponds to established auditory differences in timbre. Given a monophonic sound (called a seed), we therefore need to analyze these three features. To do so, we used an integrated model consisting of a harmonic model and an inharmonic model. The parameters of the harmonic model represent features 1 and 2 above, and the inharmonic model directly represents feature 3. The pitch-dependent feature functions for each musical instrument are approximated by cubic polynomials using the least squares method. Experimental results for 32 musical instruments using 10-fold cross validation showed that our method, which considers pitch dependency, reduced the spectrum distance to 64.70 % and the MFCC distance to 32.31 % on average between the synthesized sounds and the sounds of real musical instruments.
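As a rough illustration of the cubic least-squares approximation mentioned in the abstract, the following sketch fits one timbral feature as a function of pitch. It is only a minimal example under stated assumptions: the pitch axis is expressed as MIDI note numbers, the feature values are synthetic, and the names `fit_feature_function`, `true_coeffs`, and `feature_fn` are hypothetical. It is not the implementation used in the paper.

```python
import numpy as np

def fit_feature_function(pitches, values, degree=3):
    """Least-squares polynomial fit of one timbral feature versus pitch.

    Pitches are rescaled around note 60 before fitting; this rescaling is an
    assumption made here only to keep the fit numerically well conditioned.
    """
    x = (np.asarray(pitches, dtype=float) - 60.0) / 12.0
    coeffs = np.polyfit(x, np.asarray(values, dtype=float), degree)

    def feature(pitch):
        # Evaluate the fitted polynomial at the (rescaled) query pitch.
        return np.polyval(coeffs, (np.asarray(pitch, dtype=float) - 60.0) / 12.0)

    return feature

# Synthetic example data (assumption): a feature that truly follows a cubic
# in pitch, sampled at ten MIDI note numbers.
pitches = np.array([48, 52, 55, 60, 64, 67, 72, 76, 79, 84], dtype=float)
true_coeffs = [1e-5, -2e-3, 0.1, -1.0]
values = np.polyval(true_coeffs, pitches)

feature_fn = fit_feature_function(pitches, values)
predicted = float(feature_fn(69.0))  # feature value at a pitch not in the data
```

With real analysis data, the same call would be repeated for each feature parameter of each instrument, giving one fitted function per parameter.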

Motivation

Our ultimate goal is to develop an equalizer that can replace the timbre of a specific musical instrument part in a musical piece with the user's favorite timbre. For example, if a rock lineup consisting of an electric guitar, electric bass, and drums could be changed to violin, contrabass, and timpani, respectively, users could enjoy a classical remix of the music. Moreover, by replacing the guitar sounds of a recorded audio signal with sounds separated from a compact disc (CD) of favorite guitar sounds tuned by a famous guitarist such as Yngwie J. Malmsteen, users could enjoy audio signals that sound as if they were performed by that guitarist. To achieve this, we must tackle the following problems:

Our goal requires separating the polyphonic audio signals into monophonic musical instrument sounds. Itoyama's equalizer [1] solved the former, and there have also been several reports of sound source separation. We tackled the latter.

Functions

Our analysis-and-synthesis method has two functions:

Samples

We prepared samples of musical instrument sounds synthesized by our method, a baseline method, and a phase vocoder. The baseline method is simply a version of our method that does not consider the pitch dependency of timbre. The phase vocoder used in this demonstration is implemented as a component of MARSYAS [2]. Moreover, we prepared some real sounds of musical instruments as references.

We prepared synthesized sounds whose pitch has been manipulated. You can confirm that, for every method except ours, the larger the pitch shift in halftones, the larger the difference between the synthesized sounds and the real sounds. Our method takes the pitch dependency of timbre into account.

Table: results of pitch manipulation
Columns: manipulated halftones (-12, -9, -6, -3, ±0, +3, +6, +9, +12) and seeds (440 Hz)
Rows:
Trumpet: Ours, Baseline, Phasevocoder, Real (reference)
Clarinet: Ours, Baseline, Phasevocoder, Real (reference)

* We did not use the references as learning data to approximate pitch-dependent feature functions in our method.
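The halftone offsets used for the pitch-manipulation samples can be related to target fundamental frequencies with a short calculation. The sketch below assumes twelve-tone equal temperament, which the page does not state explicitly, and takes 440 Hz as the seed pitch as given for these samples. The helper name `shifted_frequency` is hypothetical.

```python
SEED_HZ = 440.0  # seed pitch given for the pitch-manipulation samples

def shifted_frequency(halftones, seed_hz=SEED_HZ):
    """Target fundamental after a shift of `halftones` semitones.

    Assumes twelve-tone equal temperament: each halftone multiplies the
    frequency by 2 ** (1 / 12).
    """
    return seed_hz * 2.0 ** (halftones / 12.0)

# The nine offsets listed for the pitch-manipulation samples.
offsets = [-12, -9, -6, -3, 0, 3, 6, 9, 12]
targets = {n: shifted_frequency(n) for n in offsets}
```

Under this assumption the samples span one octave below to one octave above the seed, that is, 220 Hz to 880 Hz.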


We prepared synthesized sounds whose duration has been manipulated. You can confirm that our method does not distort the temporal characteristics of musical instrument sounds in the attack and decay segments.

Table: results of duration manipulation
Columns: manipulated duration ratios (1/8, 1/4, 1/2, 3/4, 1, 3/2, 2, 3, 4) and seeds (reference)
Rows:
Piano: Ours, Baseline, Phasevocoder
Oboe: Ours, Baseline, Phasevocoder
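One simple way to picture duration manipulation that leaves the attack and decay untouched is to resample only the sustained part of an amplitude envelope. The sketch below is an illustration of that idea, not a description of how our method is implemented: the three-segment split, the frame counts, and the function name `stretch_sustain` are all assumptions made for the example.

```python
import numpy as np

def stretch_sustain(envelope, attack_len, decay_len, ratio):
    """Change the duration of an envelope by resampling only its sustain.

    The first `attack_len` frames and the last `decay_len` frames are copied
    unchanged. If the requested length is shorter than attack plus decay,
    the sustain is dropped and the result is attack + decay frames long.
    """
    env = np.asarray(envelope, dtype=float)
    split = len(env) - decay_len
    attack, sustain, decay = env[:attack_len], env[attack_len:split], env[split:]

    new_total = int(round(len(env) * ratio))
    new_sustain_len = max(new_total - attack_len - decay_len, 0)
    if new_sustain_len == 0 or len(sustain) == 0:
        resampled = np.empty(0)
    else:
        # Linear interpolation of the sustain onto the new number of frames.
        old_pos = np.linspace(0.0, 1.0, len(sustain))
        new_pos = np.linspace(0.0, 1.0, new_sustain_len)
        resampled = np.interp(new_pos, old_pos, sustain)
    return np.concatenate([attack, resampled, decay])

# Hypothetical 100-frame envelope: 10-frame attack, 80-frame sustain, 10-frame decay.
envelope = np.concatenate([
    np.linspace(0.0, 1.0, 10),
    np.full(80, 0.8),
    np.linspace(0.8, 0.0, 10),
])
doubled = stretch_sustain(envelope, attack_len=10, decay_len=10, ratio=2.0)
```

In this toy example the doubled envelope has 200 frames, and its first and last 10 frames are identical to those of the seed envelope.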

If the duration is simply stretched to twice its original length, the period of the vibrato track also becomes twice as long. We constructed a vibrato model that preserves the period of the vibrato track under duration manipulation. We also prepared sounds synthesized from vibrato sounds by using this vibrato model. You can confirm that our method preserves the period of the vibrato track.

Table: results of duration manipulation (vibrato sounds)
Columns: manipulated duration ratios (1, 3/2, 2, 3, 4) and seeds (reference)
Rows:
Electric guitar (vibrato): Ours, Baseline, Phasevocoder
Violin (vibrato): Ours, Baseline, Phasevocoder
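The effect described for vibrato sounds can be reproduced with a toy pitch track. The sketch below assumes a purely sinusoidal vibrato and a frame rate of 100 frames per second, neither of which is taken from the page, and the names `vibrato_track`, `naive_stretch`, and `dominant_rate` are hypothetical. It compares naively stretching a pitch track with regenerating the track from the same vibrato parameters over the longer duration.

```python
import numpy as np

FRAME_RATE = 100  # pitch-track frames per second (assumption)

def vibrato_track(duration_s, base_hz, depth_hz, rate_hz):
    """Sinusoidal vibrato model: base pitch plus a sine of given depth and rate."""
    t = np.arange(int(round(duration_s * FRAME_RATE))) / FRAME_RATE
    return base_hz + depth_hz * np.sin(2.0 * np.pi * rate_hz * t)

def naive_stretch(track, ratio):
    """Stretch a pitch track by linear interpolation; this also slows the vibrato."""
    n_new = int(round(len(track) * ratio))
    new_idx = np.linspace(0, len(track) - 1, n_new)
    return np.interp(new_idx, np.arange(len(track)), track)

def dominant_rate(track):
    """Vibrato rate in Hz, estimated as the strongest non-DC spectral peak."""
    spectrum = np.abs(np.fft.rfft(track - np.mean(track)))
    freqs = np.fft.rfftfreq(len(track), d=1.0 / FRAME_RATE)
    return float(freqs[np.argmax(spectrum)])

# Seed: 2 s at 440 Hz with a 6 Hz vibrato of +/-5 Hz (illustrative values).
seed = vibrato_track(2.0, base_hz=440.0, depth_hz=5.0, rate_hz=6.0)

naive = naive_stretch(seed, 2.0)                 # vibrato period doubles
preserved = vibrato_track(4.0, 440.0, 5.0, 6.0)  # same rate, longer duration

naive_rate = dominant_rate(naive)
preserved_rate = dominant_rate(preserved)
```

Both stretched tracks have the same length, but only the regenerated one keeps the 6 Hz vibrato rate of the seed; the naively stretched one falls to roughly half that rate.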

References
Acknowledgments

This research used the RWC Music Database, and we thank everyone who created it. The research was partially supported by the Ministry of Education, Culture, Sports, Science and Technology (MEXT), the Global Center of Excellence Program (GCOE), and the CrestMuse Project of the Japan Science and Technology Agency (JST).


Author : Takehiro Abe (Kyoto University)
mail to abe[at]kuis.kyoto-u.ac.jp
