Barcelona 2006

Summer School in Sound and Music Computing

Universitat Pompeu Fabra
Barcelona, Spain
July 24-28, 2006

This Summer School is organized by the S2S² project and the Music Technology Group of the Universitat Pompeu Fabra in Barcelona, with the goal of promoting interdisciplinary education and research in the field of Sound and Music Computing. The School is aimed at graduate students working on their Master's or PhD thesis, but it is open to anyone carrying out research in this field.

This is the second Summer School organized by S2S²; last year's School took place in Genova.



  • Roberto Bresin (Royal Institute of Technology, Stockholm)
  • Nicola Bernardini (Conservatory of Padova)
  • Antonio Camurri (University of Genova)
  • Alain de Cheveigné (Ecole Normale Supérieure, Paris)
  • Henkjan Honing (University of Amsterdam)
  • Marc Leman (University of Ghent)
  • Xavier Serra (Pompeu Fabra University, Barcelona)
  • Giovanni De Poli (University of Padova)
  • Davide Rocchesso (University of Verona)
  • Vesa Valimaki (Helsinki University of Technology)
  • Bill Verplank (Stanford University)
  • Gerhard Widmer (Johannes Kepler University Linz)

Invited Experts

  • Jyri Huopaniemi (Nokia Research Center, Helsinki)

  • Leigh Landy (Music, Technology and Innovation Research Centre, De Montfort University, Leicester)

  • Fabien Levy (Columbia University, New York)

  • Pierre-Louis Xech (Microsoft Research, Cambridge)

Academic Program

  • 6 hours of lectures by Bill Verplank on Interface Design and 6 hours of lectures by Henkjan Honing on Music Cognition.
  • 9 hours of presentations by the participating students and discussions on their research work.
  • 20 hours of presentations and discussions related to the S2S² Sound and Music Computing Roadmap.

The lectures are designed to be of interest to any graduate student or researcher in the field of Sound and Music Computing. The topics chosen for this year are Interface Design and Music Cognition, two areas relevant to our research fields that have their own methodologies and research strategies. The lectures will present these methodologies and their application to music-related problems.

All participating students will give short presentations on their current research. Emphasis will be placed on methodological and contextual issues: each presentation should highlight the methodological approach chosen and the scientific, technological and industrial context of the research. The discussions will give the students feedback that should be useful for the continuation of their research.

The main topic of the summer school will be the Roadmap on Sound and Music Computing that is being written as part of the S2S² project. There will be special lectures by invited experts and discussions on two major parts of the Roadmap: the industrial and the cultural contexts of the field. In particular, the focus will be on academic research, its relationship with industrial exploitation and its use in contemporary music production. The resulting discussions will contribute to the Roadmap.

Program schedule:

Monday 24th
Tuesday 25th
Wednesday 26th
Thursday 27th
Friday 28th
Music Cognition
Henkjan Honing

Interface Design
Bill Verplank

Music Cognition
Henkjan Honing

Industrial context for Sound and Music Computing

Moderator: Xavier Serra

Coffee break


Interface Design
Bill Verplank

Music Cognition
Henkjan Honing

Interface Design
Bill Verplank

Scientific context of research
Moderator: Alain de Cheveigné

Presentation by students

Social context of research
Moderator: Nicola Bernardini

Presentation by students

Industrial context of research
Moderator: Vesa Valimaki

Presentation by students

Critical evaluation and discussion about the summer school
Moderator: Roberto Bresin
Coffee break





Workshop: Towards a shared and modular curriculum on SMC
Moderator: Giovanni De Poli

Visit to the Music Technology Group, Universitat Pompeu Fabra


PhD defense







Student presentations (15 minutes each):

Scientific context: Monday 24th of July

  • "Evolving populations of computational models and applications to expressive music performance" - Amaury Hazan
    In the context of expressive performance modeling, we aim to induce expressive performance models using a performance database which was extracted from a set of acoustical recordings. We propose a new approach called Evolutionary Population of Generative Models (EPGM) based on Evolutionary Computation (EC). We present a first instantiation of EPGM based on Strongly Typed Genetic Programming (STGP), in which the evolved programs are constrained to have the structure of Regression Trees. We show this approach is more flexible than well-established machine learning approaches because (i) it evolves a population of models which may produce different predictions, (ii) it enables the use of custom data types at different levels (primitives inputs and outputs, prediction type), (iii) it enables the use of elaborate and possibly domain-specific accuracy measurements. We illustrate this latter point by presenting a fitness function based on melodic similarity which was fit to human judgement based on a listening experiment. We finally show this approach can be applied to high level transformations (e.g. mood) and present some future EPGM extensions.
  • "Depth perception" - Delphine Devallez
    The present research work deals with depth perception, and with how to render sound sources spatially separated in distance and give a sense of perspective. Over the past decades, the majority of research on spatial sound reproduction has concentrated on directional localization, resulting in increasingly sophisticated virtual audio display systems capable of providing very accurate information about the direction of a virtual sound source. However, it is clear that full 3-dimensional rendering also requires an understanding of how to reproduce sound source distance. For a few years now, a number of researchers in psychology, neuroscience and computer engineering have shown interest in this third dimension, which could further enlarge the bandwidth of interaction in multimodal display and lead to newly designed interfaces. Moreover, since display technology is already able to produce visual depth, it seems natural to enrich the sounds of objects and events with information about their relative distance to the user. From a technological point of view, the auditory-visual interactions resulting from this multimodal presentation of information should then be taken into account and further investigated scientifically, since they are still poorly understood, in particular with regard to depth perception.
  • "Gesture based instrument synthesis" - Alfonso Pérez
    Synthesis of traditional musical instruments has been an active research area, and there exist successful implementations of instruments with a low degree of control, such as non-sustained excitation instruments. But for instruments with sustained excitation, such as bowed strings or wind instruments, where the interaction between instrument and performer is continuous, the quality of existing models is far from realistic. In general, musical instrument synthesis techniques try to model the instrument but forget about the interaction between performer and instrument, which is musically much more relevant than the instrument itself. This interaction covers expressivity, the intentional nuances and gestures made by the performer, but also what we call naturalness, that is, the non-intentional gestures made by the performer due to the physical constraints of the instrument, the playing technique, etc. These non-intentional gestures give a specific flavor to the sound of the performance and make it sound natural and realistic. We can roughly classify the existing synthesis techniques into two categories: physical models, which focus on the physical phenomena of sound production, and spectral models, which focus on sound perception. With physical models, naturalness and expressivity can hardly be reached without controlling a huge number of parameters, which requires the instrument itself as well as a mastery comparable to that of a traditional performer. Spectral models, in turn, lack performer interaction and articulation, that is, gestures. The aim of this work is to improve the quality of instrument sound synthesis, specifically for the violin family. We propose a hybrid of spectral and physical models that takes advantage of the characteristics of both approaches, focusing on the gestures of the performer with the objective of providing naturalness in the synthesis.
  • "Expressive gesture and music: analysis of emotional behavior in music performances" - Ginevra Castellano
    I present some examples of the analysis of music performances, aimed at investigating the role of expressive gesture in music, with a special focus on the recognition of emotions. I performed an experiment in which two musicians, a pianist and a cello player, played an excerpt from the Sonata no. 4 op. 102/1 for piano and cello by L. van Beethoven in different emotional conditions. I show how to extract expressive movement features from music performance and present preliminary results from the analysis of such data. The experiment has been carried out in collaboration with GERG. Feature extraction is performed in real-time by the new EyesWeb 4 open platform (available at
  • "Mapping from perception to sound generation" - Sylvain Legroux
  • "The role of audiofeedback to improve motor performance of subjects" - Giovanna Varni

Social & Cultural context: Tuesday 25th of July

  • "Musical interfaces accessible to novices" - James McDermott
    Our research focusses on one sub-task of musical composition, that of setting synthesizer parameters. We use interactive Evolutionary Computation (iEC) to aid inexperienced users in controlling synthesizers: it allows an iterative design process in which the user's main task is judging results, rather than constructing solutions. We discuss potential advances in iEC, including a new interface component and a method of supplementing it with non-interactive EC. We also present results on non-interactive EC performance. We discuss the possibilities of applying the same approach to other sub-tasks of composition; and finally we imagine the implications of using the iEC approach to remove the constraints of skill and prior knowledge from the composition process, so that it becomes purely a matter of taste.
  • "Voice analysis for singing education" - Oscar Mayor
    Current research on tools for singing education consists mainly of real-time tools with visual feedback that give information about the tuning and tempo of the singing performance and about voice quality characteristics, referring to the timbre and formants of the singer’s voice. These tools mainly use real-time visualization of the pitch curve against time and of the short-term spectrum or spectrogram, giving instantaneous visual feedback to the performer. In this talk a system for evaluating singing performances is presented, in which the singing performance is analyzed using a MIDI score as reference and a visual expressive transcription of the performance is given as a result. The expression transcription consists of the notes in the MIDI score aligned to the user performance, with each note segmented into sub-regions (attack, sustain, release, transition, vibrato). Each region is labeled with the kind of expression detected by the system, following a set of heuristic rules based on analysis descriptors. The expression labels assigned to each sub-region are based on a previous expression categorization done manually on a large set of singing performances in order to distinguish between common resources used by singers in pop-rock music. Some analysis descriptors can also be visualized simultaneously by the performer, providing rich visual feedback on the performance.
  • "Visual feedback in learning to perform music" - Alex Brandmeyer
    The use of visual feedback to aid musicians in improving their performances has recently been researched using different visual representations and analysis techniques. We recently conducted an experiment in which percussion students imitated different patterns recorded by a teacher, with and without the use of visual feedback. In the experiment we used a real drum kit with contact microphones attached to record data about the timing and dynamics of the performances. We provided two different forms of visual feedback, as well as a control condition with no visual feedback, to test the effects of visual feedback and of the type of visual representation on performance accuracy. The first form of feedback, analytic, used a scrolling display similar to a musical score, while the second, holistic, presented a changing shape drawn using probabilities generated by a real-time statistical analysis of the incoming notes. Qualitative feedback from the subjects indicated that they found the visual feedback useful. We are currently doing further analysis of the data collected to see if the visual feedback improved performance and, if so, in what ways.
  • "The rigid boundaries of musical genres" - Enric Guaus
    One of the most active areas in Music Information Retrieval is that of building automatic genre classification systems. Most of these systems can achieve good results (80% of correct decisions) when the number of genres to be classified is small (i.e. fewer than 10). They usually rely on timbre and rhythmic features that cover neither the whole range of musical facets nor the whole range of conceptual abstractness that seems to be used when humans perform this task. The aim of our work is to improve our knowledge about the importance of different musical facets and features in genre decisions. We present a series of listening experiments in which audio has been altered in order to preserve some properties of music (rhythm, timbre, harmonics…) while degrading others. The pilot experiment we report here used 42 excerpts of modified audio (representing 9 musical genres). Listeners, who had different musical backgrounds, had to identify the genre of each of the excerpts.
  • "Programming for the Masses - Computer Music Systems as Visual Programming Languages" - Guenter Geiger
  • "Intonation and expression: a study and model of choral intonation practices" - Johanna Devaney
    The modeling of choral intonation practices, much like those of non-fretted string ensembles, presents a unique challenge because at any given point in a piece a choir’s tuning cannot be consistently related to a single reference point; rather, a combination of horizontal and vertical musical factors forms the reference point for the tuning. The proposed methodology addresses the conflict through a combination of theoretical and technological approaches. In the theoretical approach, the vertical tendencies are addressed in relation to the harmonic series and theories of sensory consonance, while the horizontal tendencies are examined in terms of recent theories of tonal tension and attraction. The technological, or computational, approach uses statistical machine learning techniques to build a model of choral intonation practices from the microtonal pitch variations between recorded choral performances. The observed horizontal intonation practices may then be examined as expressive phenomena by taking the horizontal tendencies inferred from the tension models as a norm, and then viewing musically appropriate deviations from this norm as expressive phenomena. Thus horizontal intonation practices may be related not only to musical expectation but also to musical meaning or emotion, as it relates to performance.
  • "Object Design for Tangible Musical Interfaces" - Martin Kaltenbrunner
    This research focuses on the design of passive tactile features for tangible user interface components and their relation to arbitrarily assigned acoustic descriptions. Tactile dimensions such as surface structure, temperature, weight, global shape and size allow the classification of passive tangibles into generic object classes and specific object instances. Within the context of the reacTable, a modular electro-acoustic synthesizer with a tangible user interface, these tactile features can be used to encode the various synthesizer components in the haptic domain, allowing easy object identification with a simple grasp or hand enclosure. The acoustic properties of the synthesizer components will be defined with adjectives describing the perceptual quality of the resulting sound. The current design of the reacTable tangibles defines a series of acrylic objects in different geometric shapes with attached colour or symbol codes, which proved to be problematic in a dark concert environment as well as for visually impaired users. A user study shall clarify whether the assigned object descriptions and the hypothetical mappings between the tactile perception and sonic behaviour of a chosen synthesis component are valid, and should eventually lead to an improved design of the tangibles for the instrument.

Industrial context: Wednesday 26th of July

  • "Free software and music computing" - Pau Arumí
  • "Toys and video games" - John Arroyo
  • "Scratching and DJs" - Kjetil Falkenberg Hansen
  • "Leisure and voice control" - Jordi Janer
    The role of Sound and Music Computing in industry has evolved over the last decades across its three typical targets: studio equipment, musical instruments and home entertainment. While studio equipment and musical instruments have already massively incorporated SMC technologies, home entertainment systems will presumably be our main target for the next years. It is in this context that we can use the term "leisure", which can be applied to a convergence of home media centers and game consoles.

    This presentation addresses voice control as a way to transmit musical information to a musical system. The main application of voice control is instrument synthesizers, useful for instance in karaoke devices. Nevertheless, the research outcome can also be applied to controlling conducting or visualization systems. This research consists of two parts: voice gesture description and the definition of adequate mapping strategies. Studying instrument imitation, we can define a voice gesture as a sequence of consonant-vowel phonemes. Phonetic segmentation and classification into broad phonetic classes are being developed. In addition, slowly varying perceptual envelopes are added to the voice gesture. In summary, a voice gesture is described by context descriptors and continuous envelopes. Mapping these voice gestures to the instrument control will depend on the instrument and the technique employed. Here, instead of constraining voice description to MIDI messages, we propose a more adequate mapping for signal-driven synthesis that can be either knowledge-based or based on machine learning. The talk will conclude by looking at current commercial systems and potential use-cases of voice control in a leisure context.
  • "Music recommendation systems" - Marco Tiemann & Oscar Celma
  • "Audio melody extraction: the importance of high level features in music information retrieval applications" - Karin Dressler

Workshop: Social and cultural context for Sound and Music Computing

  • Morning:
    • S2S² presentation - Nicola Bernardini
    • "Social context for Sound and Music Computing" - Marc Leman
    • "Is a Science without Conscience a support for music?" - Fabien Levy
      I will first show how both composition and science are related to the more general problem of their representations, the first playing with signs to build new music-worlds (cf. the couple graphemology/grammatology in semiotics), and the latter being deeply united with its representations (cf. Derrida). Without a high conscience of the episteme implied by those representations, composing and doing musical sciences are "but the ruin of the soul", to parody Rabelais. To exemplify my position, I will then try to "deconstruct" different scientific models working on the controversial notion of "musical consonance" (historical musicology, acoustics and psychoacoustics).
    • "Investigating a Sound-based Paradigm and its Social Implications" - Leigh Landy
  • Afternoon:
    • Panel: Social and Cultural Context for Sound and Music Computing: Does technology drive music or vice versa?
      Nicola Bernardini, Marc Leman, Fabien Levy, Leigh Landy, Davide Rocchesso, Roberto Bresin

Workshop: Industrial context for Sound and Music Computing

  • "Initial ideas for the Industrial Context of the S2S² Roadmap" - Xavier Serra
  • "Sound, Music and Mobility - Key Challenges for Future Research" - Dr. Jyri Huopaniemi, Head of Strategic Research, Nokia Research Center
    In this presentation, I will give an overview of relevant research challenges for sound and music in future mobile devices. The background and history of mobile computing will be explained, and the presentation will be illustrated with current research examples. Key issues in technology, user experience and business outlook will be covered. Finally, I will recommend areas on which future research in sound and music should concentrate.
  • Panel: Industrial Context for Sound and Music Computing: Is technology transfer working?
    Xavier Serra, Pierre-Louis Xech, Jyri Huopaniemi, Vesa Valimaki, Antonio Camurri, Alain de Cheveigné


A maximum of 20 students will be admitted to the school. The candidates will be evaluated by the teachers. Each application should include the following documents in PDF format:

  • Curriculum vitae (max. 1 page)
  • Certified copy of academic degree
  • Summary of the research proposal (max. 2 pages)

Students have to send their applications to Xavier Serra before May 1st. Notification of acceptance will be given no later than May 15th.

For people not wishing to make research presentations during the school, a brief curriculum vitae is sufficient and the deadline for application is June 30th. These applications should be sent to Xavier Serra or Emilia Gómez.

Registration Fee

The regular registration fee is 300 €, and the registration fee for students is 200 €. Both fees also cover the costs for lunch and various evening social events.

There will be a few student scholarships that will cover the registration fee.

The deadline for registration is June 30th.

Traveling and Accommodation

Participants will have to arrange their own travel and accommodation. University dorms are available at a special rate. For additional information contact Cristina Garrido.

Social events

  • Banquet, Tuesday 25th: El Chiringuito de Escribá

  • Concerts at Metronom:

    • 25th of July at 21:00 "Deriva del Cristal Sonoro" (IUA-Phonos grant): by Carmen Platero and Cristián Sotomayor
      Installation - Performance

    • 26th of July at 21:00 ReacTable and Ensamble Crumble

    • 27th of July at 21:00 Concert around Harry Sparnaay, supervised by Harry Sparnaay and performed by his students at ESMUC:
      Irene Ferrer Feliu, flute
      Alejandro Castillo Vega, clarinet
      Victor de la Rosa, Daniel Arias Romeo, Gerard Sibila Roma, bass clarinet

  • Search for other events in Barcelona during the summer school


  • School: the school sessions and the coffee breaks will take place in the Auditorium of the França Building of Pompeu Fabra University.
    Passeig de Circumval·lació, 8. 08003 Barcelona (map)

  • Lunches: Navia restaurant, in front of the França building.
    Comerç 33. 08003 Barcelona (map)

  • Banquet: El Chiringuito de Escribá.
    Bogatell beach. (map)

  • Concerts: Metronom.
    C. Fusina 9 - 08003 Barcelona (map)