Seminar No. 9
Dissimilarity of Sparsely Encoded Sound Effects and Musical Expectation Models
Speaker:
Hendrik Purwins, Senior Researcher at the Music Technology Group,
Universitat Pompeu Fabra, Barcelona, Spain
Contact:
http://www.mtg.upf.es/people/hpurwins
Date: 11/01/10
Abstract:
A generic sparse sound representation scheme (Annies et al.) is
presented that supports classification. We decompose sounds into
gammatone functions that mimic the frequency resolution and time
course of excitation of the basilar membrane. We select the
time-frequency components with maximal correlation with the sound
signal, yielding a point pattern as a scalable sparse
representation. We define the dissimilarity of two sounds by
introducing a point correspondence that assigns each point of one
sound to exactly one point of the other sound and summing the
distances over all corresponding pairs of points. We use the Potential Support
Vector Machine as a kernel-based classifier for the general case
of non-metrical dissimilarities. The method outperforms
particular sets of low-level features and timbre descriptors and
performs in the same range as averaged MFCCs.
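
The point-correspondence dissimilarity described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes both point patterns have the same number of points, that each point is a (time, frequency) pair on comparable scales, that distances are Euclidean, and that the correspondence chosen is the one minimizing the summed distance. The function name and the example values are hypothetical.

```python
from itertools import permutations
from math import dist


def point_pattern_dissimilarity(a, b):
    """Smallest summed Euclidean distance over all one-to-one pairings.

    Assumption: both patterns contain the same number of points.
    Brute force over permutations, so only suitable for tiny patterns.
    """
    if len(a) != len(b):
        raise ValueError("this sketch assumes equally sized point patterns")
    return min(
        sum(dist(p, q) for p, q in zip(a, perm))
        for perm in permutations(b)
    )


# Hypothetical (time, frequency) points for two short sounds.
sound_a = [(0.0, 100.0), (1.0, 200.0)]
sound_b = [(1.0, 200.0), (0.0, 103.0)]
d = point_pattern_dissimilarity(sound_a, sound_b)
```

The brute-force search grows factorially with the number of points; for patterns of realistic size, an assignment solver such as the Hungarian algorithm would be the usual choice.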

Adaptability helps transfer knowledge to new situations,
users, and music styles. In a computer-assisted musical performance,
the human performer creates new ideas (introducing new motifs, rhythms,
or harmonies, or using new playing techniques). The computer companion
should identify the novelty and respond directly online, without
waiting until the end. We will discuss four approaches with
simple proof-of-concept audio examples. Hazan et al. model
musical expectation of how drum patterns continue. Musical
features such as MFCCs, beats, and onsets are calculated. Using
k-means clustering, this input is discretized into a sequence of
percussive sound categories and inter-onset intervals. In the
bootstrap phase, the number of clusters is estimated based on the
Akaike model selection criterion. The clusters are updated
online. An N-gram technique then collects occurrence statistics
of drum sequences to calculate the belief about
how the sequence may continue. Marxer et al. use an incremental
clustering process that adds clusters on the fly, together with a
Boltzmann machine that predicts the continuation of the
sequence. Purwins et al. use polynomial extrapolation to
determine surprise based on loudness changes in music.
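
The N-gram expectation step mentioned above for drum sequences can be sketched as follows. This is a minimal illustration under stated assumptions, not the model of Hazan et al.: the class name `NGramExpectation`, the order `n=3`, and the symbol labels are hypothetical, the input is assumed to be already discretized into category labels, and no smoothing or back-off is applied.

```python
from collections import Counter, defaultdict


class NGramExpectation:
    """Online N-gram counts over a stream of discrete symbols."""

    def __init__(self, n=3):
        self.n = n
        self.counts = defaultdict(Counter)  # context tuple -> next-symbol counts
        self.history = []

    def _context(self):
        # The last n-1 symbols seen so far (empty tuple for a unigram model).
        return tuple(self.history[-(self.n - 1):]) if self.n > 1 else ()

    def observe(self, symbol):
        """Update the statistics with one newly heard symbol."""
        context = self._context()
        if len(context) == self.n - 1:  # skip until enough history exists
            self.counts[context][symbol] += 1
        self.history.append(symbol)

    def belief(self):
        """Relative frequencies of the symbols seen after the current context."""
        seen = self.counts.get(self._context())
        if not seen:
            return {}
        total = sum(seen.values())
        return {symbol: count / total for symbol, count in seen.items()}


# Hypothetical stream of percussive sound categories.
model = NGramExpectation(n=3)
for symbol in ["kick", "hat", "snare", "hat"] * 2 + ["kick", "hat"]:
    model.observe(symbol)
expected = model.belief()
```

An empty dictionary from `belief()` means the current context has not been observed before, which a companion system could treat as a sign of novelty.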