Akira MAEZAWA (前澤 陽)

Ph.D. Candidate, Graduate School of Informatics at Kyoto University

Also affiliated with the R&D Division at Yamaha Corporation

E-mail:
Twitter: @zawazaw

Research Interests

  1. Application of Bayesian Inference to Musical Audio Synchronization Tasks
    I am interested in synchronizing multiple audio signals, each of which is an acoustic realization of a common underlying symbolic representation. This is an important task for two main reasons.
    1. In the informed source separation task, the "information" provided by the user may not be in sync with the target audio signal, so it is crucial to synchronize the information and the target audio.
    2. Finding a synchronization is, phrased differently, finding how two audio signals differ. By finding how two performers differ in playing the same piece of music, we can create applications that allow the user to, say, find the interpretation that suits his/her taste, or map certain aspects of interpretation from one performance to another.
    Here are a few problems pertaining to synchronization that I have worked on.

    Audio Part Mixture Alignment

    Imagine you, an amateur musician, are playing a piece of music written for many instruments. It would be nice to have a play-along (karaoke) track, but all you have is your favorite recording of the piece. You would like to "subtract" the part you play from that recording, so that you have a play-along track that sounds like your favorite recording.

    Audio part mixture alignment is a task designed for such a case: it aligns multiple musical audio signals, each of which plays a subset of a common music score.

    Say I am aligning an audio signal of a violin+piano duo with an audio signal of the violin part alone. Typical audio alignment does not work well in this case, because it more or less assumes that the two audio signals contain exactly the same set of notes. Audio part mixture alignment, on the other hand, looks for the notes that are played in common by two or more audio signals, and so is capable of "listening" for the violin part, the material on which the alignment should actually be based.

    This work was presented at ICASSP 2014.
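    Alignment methods of this kind typically build on dynamic time warping (DTW) over per-frame features such as chroma vectors. Below is a minimal sketch of the plain DTW baseline on toy features (illustrative only; it does not implement the part-mixture extension):

```python
import numpy as np

def dtw_align(X, Y):
    """Align two feature sequences (frames x dims) with classic DTW.

    Returns the optimal warping path as a list of (i, j) frame pairs.
    """
    n, m = len(X), len(Y)
    # Pairwise cosine distances between frames.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    Yn = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    dist = 1.0 - Xn @ Yn.T
    # Accumulated cost with the usual three steps (diagonal, down, right).
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            D[i, j] = dist[i - 1, j - 1] + min(
                D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])
    # Backtrack from the end to recover the warping path.
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Toy "chroma" sequences: Y is X with one frame repeated (a slower rendition).
X = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Y = np.array([[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
print(dtw_align(X, Y))
```

    Note how the repeated frame in Y is absorbed by a horizontal step in the path; this one-to-many mapping is exactly what breaks down when the two signals do not contain the same notes.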

    Audio-Score Alignment

    I have developed an offline audio-to-score alignment method that jointly infers the timbre, dynamics, tempo trajectory, and note onset timing deviations. This work was presented at the Acoustical Society of Japan 2011 Spring Convention; an extended version is planned for publication in the Computer Music Journal.
  2. Application of Bayesian Inference to Informed Source Separation
  3. Other Techniques and Applications
    • Application of Nonparametric Bayes to reverberation suppression
      I have applied nonparametric Bayesian inference to dereverberation. This work was presented at the Acoustical Society of Japan 2012 Spring Convention.
    • Query-by-Conducting
    • Violin Fingering Inference from Audio Signal

      The violin is played by pressing a finger on one of four strings and drawing the bow over it. One of the important artistic aspects of violin performance, then, is the fingering -- that is, how one presses the fingers on the strings.

      One usually chooses a fingering sequence that is ergonomic yet artistic. Leopold Mozart eloquently phrased this as the three criteria of violin fingering: elegance, necessity, and convenience. An artistic fingering trades off ergonomics against the artistic choices induced by the timbre unique to each of the four strings ("Air on the G String" is an example where one chooses to play an entire piece on one string, the G, in order to create a warm timbre).

      My goal is to have the computer "listen" to a violin recording and find the fingering sequence that "re-creates" the timbral properties of the recording. I assume that the music score of the violin part is known in advance and is aligned using an audio-to-score alignment method. I also assume that only the choice of strings has an audible effect on the timbre, and first determine the sequence of strings on which each note is bowed. Then, I find an ergonomic fingering that satisfies the bowed-string sequence.

      This task is difficult because I have to "separate" the timbral effect caused by the unique acoustic properties of the violin body (an Amati and a Stradivarius sound very different) from that of the bowed string. This is tackled by normalizing the timbral features across the entire song. The next difficulty is to find the bowed-string sequence; this is attacked with a simple classifier followed by a few rule-based correction mechanisms. Finally, I need to find the fingering given the bowed-string sequence. To do this, I design a cost function over the fingering of two consecutive notes, and minimize the total cost over the entire song.
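      The final minimization is a shortest-path problem that can be solved with Viterbi-style dynamic programming. A hedged sketch, where the candidate hand positions and the cost functions are made-up stand-ins for the ones actually used:

```python
# Sketch: choose hand positions (a stand-in for full fingering) that realize a
# note sequence with minimum total "effort", via Viterbi-style dynamic
# programming. The candidates and costs below are illustrative assumptions,
# not the cost function from the paper.

def min_cost_fingering(candidates, shift_cost, state_cost):
    """candidates[i] = list of feasible states (hand positions) for note i.

    Returns (total cost, best state sequence)."""
    # best[s] = minimal cost of any sequence ending in state s at current note.
    best = {s: state_cost(s) for s in candidates[0]}
    back = [{} for _ in candidates]
    for i in range(1, len(candidates)):
        new_best = {}
        for s in candidates[i]:
            # Cheapest predecessor for state s, plus the transition cost.
            prev, c = min(
                ((p, best[p] + shift_cost(p, s)) for p in candidates[i - 1]),
                key=lambda t: t[1],
            )
            new_best[s] = c + state_cost(s)
            back[i][s] = prev
        best = new_best
    # Backtrack from the cheapest final state.
    last = min(best, key=best.get)
    seq = [last]
    for i in range(len(candidates) - 1, 0, -1):
        seq.append(back[i][seq[-1]])
    return best[last], seq[::-1]

# Toy example: three notes, each playable in a few hand positions (integers).
candidates = [[1, 3], [1, 2, 5], [3, 5]]
cost, seq = min_cost_fingering(
    candidates,
    shift_cost=lambda a, b: abs(a - b),  # penalize position shifts
    state_cost=lambda s: 0.1 * s,        # mild penalty for high positions
)
print(cost, seq)
```

      The same machinery applies whether the states are hand positions, (string, finger) pairs, or richer fingering descriptors; only the state set and cost terms change.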

      This method was used to recover the fingering of one of the oldest recordings -- a cylinder recording of the Romanze by Joachim (a legendary 19th-century violinist), played by Joachim himself. We found that the inferred fingering is probably not the true fingering, presumably because the high-pitch content, a valuable cue for the machine to identify the bowed string, is completely lost. We also found that the pitch trajectory sometimes gives valuable cues. For example, Joachim liberally uses the glissando (sliding from one note to another); even an amateur violinist like myself can infer, from the pattern of glissandi, the relative difference in bowed string between two notes. Moreover, hints like the fact that open strings cannot be played with vibrato can be a giveaway.

      The transcription of the Romanze, along with a detailed technical explanation of my method, can be found (for free!) in the Fall 2012 issue of the Computer Music Journal.

  4. Corporate R&D

    As a corporate researcher, I have developed technology for recognizing chords and beats from an audio signal, as well as for performing audio-to-score alignment in real time (score-following technology).
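    Chord recognition from audio is often bootstrapped from chroma features; as a toy illustration of one common baseline (my own sketch, not Yamaha's production algorithm), a frame's chroma vector can be matched against binary triad templates:

```python
import numpy as np

# Pitch-class names for labeling chord roots.
NOTES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def chord_templates():
    """Binary pitch-class templates for major and minor triads on all roots."""
    templates = {}
    for root in range(12):
        for name, intervals in (("maj", (0, 4, 7)), ("min", (0, 3, 7))):
            t = np.zeros(12)
            t[[(root + i) % 12 for i in intervals]] = 1.0
            templates[f"{NOTES[root]}:{name}"] = t / np.linalg.norm(t)
    return templates

def recognize_chord(chroma):
    """Return the chord label whose template best correlates with the chroma."""
    chroma = chroma / (np.linalg.norm(chroma) + 1e-12)
    return max(chord_templates().items(), key=lambda kv: kv[1] @ chroma)[0]

# A frame with strong energy on C, E, and G should be labeled C major.
chroma = np.zeros(12)
chroma[[0, 4, 7]] = [1.0, 0.8, 0.9]
print(recognize_chord(chroma))
```

    Real systems additionally smooth the per-frame decisions over time (e.g. with an HMM) and couple them with the beat grid, which this sketch omits.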

Publications