Seminar No. 9


Dissimilarity of Sparsely Encoded Sound Effects and Musical Expectation Models


Speaker: Hendrik Purwins, Senior Researcher at the Music Technology Group, Universitat Pompeu Fabra, Barcelona, Spain

Contact: http://www.mtg.upf.es/people/hpurwins

Date: 11/01/10

Abstract:
A generic sparse sound representation scheme (Annies et al.) is presented that supports classification. We decompose sounds into gammatone functions that mimic the frequency resolution and the time course of excitation of the basilar membrane. We select the time-frequency components that correlate maximally with the sound signal, yielding a point pattern as a scalable sparse representation. We define a dissimilarity between two sounds by introducing a point correspondence that assigns each point of one sound to exactly one point of the other sound, and by summing the distances over all pairs of points. We use the Potential Support Vector Machine as a kernel-based classifier for the general case of non-metric dissimilarities. The method outperforms particular sets of low-level features and timbre descriptors and performs comparably to averaged MFCCs.
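The point-correspondence dissimilarity described above can be illustrated with a minimal sketch. It rests on assumptions that the abstract does not state: both point patterns have the same number of points, distances are Euclidean, and the correspondence is the one-to-one assignment with the smallest total distance. The function name `dissimilarity` and the example points are hypothetical, and the brute-force search is only practical for very small patterns.

```python
from itertools import permutations
from math import dist


def dissimilarity(points_a, points_b):
    """Smallest summed Euclidean distance over all one-to-one correspondences.

    Assumption (not from the abstract): both patterns have equal size and
    the correspondence is chosen to minimise the total distance.
    """
    if len(points_a) != len(points_b):
        raise ValueError("this sketch assumes equally sized point patterns")
    best = float("inf")
    # Try every way of pairing each point of A with exactly one point of B.
    for perm in permutations(range(len(points_b))):
        total = sum(dist(p, points_b[j]) for p, j in zip(points_a, perm))
        best = min(best, total)
    return best


# Hypothetical time-frequency points: (time in s, frequency in kHz).
sound_a = [(0.0, 1.0), (0.5, 2.0), (1.0, 0.5)]
sound_b = [(0.1, 1.0), (0.5, 2.5), (1.0, 0.5)]

print(dissimilarity(sound_a, sound_a))
print(dissimilarity(sound_a, sound_b))
```

For larger point patterns, a polynomial-time assignment solver would be the natural replacement for the exhaustive search; the summed distance is the same quantity in either case.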
Adaptability helps transfer knowledge to new situations, users, and music styles. In a computer-assisted musical performance, the human performer creates new ideas (introducing new motifs, rhythms, and harmonies, or using new playing techniques). The computer companion should identify the novelty and respond online rather than waiting until the end. We will discuss four approaches with simple proof-of-concept audio examples. Hazan et al. model the musical expectation of how drum patterns continue. Musical features such as MFCCs, beats, and onsets are calculated. Using k-means clustering, this input is discretized into a sequence of percussive sound categories and of inter-onset intervals. In the bootstrap phase, the number of clusters is estimated using the Akaike model selection criterion. The clusters are updated online. An N-gram technique then collects statistics on the occurrences of drum sequences in order to calculate the belief about how the sequence may continue. Marxer et al. use an incremental clustering process that adds clusters on the fly, together with a Boltzmann machine that predicts the continuation of the sequence. Purwins et al. use polynomial extrapolation to determine surprise based on loudness changes in music.
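The N-gram step mentioned above can be sketched as follows. This is a minimal illustration, not the implementation of Hazan et al.: the clustering stage is skipped and replaced by hand-written category labels, and the names `train_ngram`, `belief`, and the drum pattern are hypothetical.

```python
from collections import Counter, defaultdict


def train_ngram(sequence, n=3):
    """Count which symbol follows each context of n - 1 symbols."""
    counts = defaultdict(Counter)
    for i in range(len(sequence) - n + 1):
        context = tuple(sequence[i:i + n - 1])
        counts[context][sequence[i + n - 1]] += 1
    return counts


def belief(counts, context):
    """Relative frequency of each continuation seen after the context."""
    following = counts.get(tuple(context))
    if not following:
        return {}  # context never observed: no belief in this sketch
    total = sum(following.values())
    return {symbol: c / total for symbol, c in following.items()}


# Hypothetical discretized drum sequence (the labels stand in for clusters).
pattern = ["kick", "hat", "snare", "hat"] * 4 + ["kick", "hat", "kick", "hat"]
model = train_ngram(pattern, n=3)

print(belief(model, ["kick", "hat"]))
```

In an online setting the counts would be updated as each new event arrives, and a continuation with low belief could be treated as a candidate for novelty.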