Akira MAEZAWA (前澤 陽)
Ph.D. Candidate, Graduate School of Informatics at Kyoto University
Also affiliated with the R&D Division at Yamaha Corporation
Imagine that you, an amateur musician, are playing a piece of music written for many instruments. It would be nice to have a play-along track (a karaoke track), but all you have is your favorite recording of the piece. You want to "subtract" the part you are going to play from that recording, so that you get a play-along track that sounds like your favorite recording.
Audio part mixture alignment is a task designed for such a case: it aligns multiple musical audio signals, each of which plays a subset of the parts in a common music score.
Say I am aligning an audio signal of a violin and piano duo with an audio signal of the violin part alone. Typical audio alignment methods do not work well in this case, because they more or less assume that the two audio signals contain exactly the same set of notes. Audio part mixture alignment, on the other hand, looks for notes that are played in common by two or more audio signals, and so is capable of "listening" for the violin part, which is the part the alignment should be based on.
This work was presented at ICASSP 2014.
The violin is played by pressing a finger on one of four strings and drawing the bow over that string. One of the important artistic aspects of violin performance, then, is the fingering -- that is, how one presses the fingers on the strings.
One usually chooses a sequence of fingerings that is ergonomic, yet artistic. Leopold Mozart eloquently phrases this:
Elegance, Necessity, and Convenience -- Leopold Mozart on three criteria of violin fingering
An artistic fingering is one that trades off ergonomics against the artistic choice induced by the timbre unique to each of the four strings ("Air in G" is an example where one chooses to play an entire piece on one string (the G) in order to create a warm timbre).
My goal is to have the computer "listen" to a violin recording and find the fingering sequence that "re-creates" the timbral properties of the recording. I assume that the music score of the violin part is known ahead of time and has been aligned to the recording using an audio-to-score alignment method. I also assume that only the choice of string has an effect on the audible timbre, so I first determine the sequence of strings on which the notes are bowed. Then, I find an ergonomic fingering that satisfies the bowed string sequence.
This task is difficult because I have to "separate" the timbral effect of the bowed string from the timbral effect caused by the unique acoustic properties of the violin body (an Amati and a Stradivarius sound very different). This is tackled by normalizing the timbral features across the entire piece. The next difficulty, then, is to find the bowed string sequence. This is attacked with a simple classifier and a few rule-based correction mechanisms. Finally, I need to find the fingering given the bowed string sequence. To do this, I design a cost function for the fingering of two consecutive notes, and minimize the sum of these costs over the entire piece.
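To make the last step concrete, here is a minimal sketch of how a pairwise fingering cost can be minimized over a whole note sequence with dynamic programming. It is an illustration only, not the method from the paper: the cost function, the whole-tone finger spacing, the pinky penalty, and the names `candidates`, `transition_cost`, and `best_fingering` are all assumptions made up for this example. In particular, the crude spacing model handles semitone steps poorly, and an open string simply resets the remembered hand position.

```python
# Toy model: pitches are MIDI note numbers, strings are given per note.
OPEN_STRING_MIDI = {"G": 55, "D": 62, "A": 69, "E": 76}
PINKY_PENALTY = 0.5  # hypothetical extra cost for using the fourth finger


def candidates(pitch, string):
    """Return (finger, hand_position) options for one note on a given string."""
    offset = pitch - OPEN_STRING_MIDI[string]
    if offset < 0:
        raise ValueError("pitch lies below the open string")
    if offset == 0:
        return [(0, None)]  # open string: no finger, no hand position
    # Assumption: adjacent fingers sit two semitones apart; the hand
    # position is where the first finger would land on the string.
    options = [(f, offset - 2 * (f - 1)) for f in range(1, 5)]
    return [(f, pos) for f, pos in options if pos >= 1]


def transition_cost(a, b):
    """Cost of moving from fingering a to fingering b (shift distance)."""
    (_, pos_a), (finger_b, pos_b) = a, b
    penalty = PINKY_PENALTY if finger_b == 4 else 0.0
    if pos_a is None or pos_b is None:
        return penalty  # an open string costs no shift in this toy model
    return abs(pos_b - pos_a) + penalty


def best_fingering(pitches, strings):
    """Minimize the summed pairwise cost; return (fingers, total_cost)."""
    if not pitches:
        return [], 0.0
    layers = [candidates(p, s) for p, s in zip(pitches, strings)]
    cost = [PINKY_PENALTY if f == 4 else 0.0 for f, _ in layers[0]]
    back = []
    for prev, cur in zip(layers, layers[1:]):
        new_cost, pointers = [], []
        for b in cur:
            options = [cost[i] + transition_cost(a, b) for i, a in enumerate(prev)]
            best = min(range(len(options)), key=options.__getitem__)
            new_cost.append(options[best])
            pointers.append(best)
        cost = new_cost
        back.append(pointers)
    # Trace the cheapest path back from the last note to the first.
    idx = min(range(len(cost)), key=cost.__getitem__)
    total = cost[idx]
    path = [idx]
    for pointers in reversed(back):
        idx = pointers[idx]
        path.append(idx)
    path.reverse()
    return [layers[t][i][0] for t, i in enumerate(path)], total


fingers, total = best_fingering([55, 57, 59], ["G", "G", "G"])
print(fingers, total)
```

Because the bowed string of every note is fixed beforehand, each note has at most four candidate fingerings, so the search is linear in the number of notes. A realistic cost would also have to account for string crossings, finger stretches, and the glissando and vibrato cues discussed below.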
This method was used to recover the fingering of one of the oldest recordings -- a cylinder recording of the Romanze by Joachim (a legendary 19th-century violinist), played by Joachim himself. We found that the inferred fingering is probably not the true fingering, presumably because the high-frequency content, a valuable cue for the machine to identify the bowed string, is completely lost. We also found that the pitch trajectory sometimes gives valuable cues. For example, Joachim gratuitously uses the glissando (sliding from one note to another); an amateur violinist like myself can tell from the pattern of glissandi the relative difference in bowed string between two notes. Moreover, hints like the fact that open strings cannot be played with vibrato can be a giveaway.
The transcription of the Romanze, along with a detailed technical explanation of my method, can be found (for free!) in the Fall 2012 issue of the Computer Music Journal.
As a corporate researcher, I have developed technology for recognizing chords and beats from an audio signal...
as well as performing audio-score alignment in realtime (score following technology)