SMC Roadmap

Version 1.0 of the Roadmap: The S2S2 Consortium (2007). A Roadmap for Sound and Music Computing. Version 1.0. ISBN: 978-9-08-118961-3.

The first version of this Roadmap was the result of the S2S2 Coordination Action (a project funded by the European Commission during the 6th Framework Programme, within Future and Emerging Technologies). The current text is an outgrowth of that version.

Editors of current version: Nicola Bernardini, Xavier Serra, Marc Leman, Gerhard Widmer, Giovanni De Poli.

Main authors: Nicola Bernardini, Roberto Bresin, Antonio Camurri, Giovanni De Poli, Marc Leman, Davide Rocchesso, Xavier Serra, Vesa Välimäki, Gerhard Widmer.

Main contributors: Federico Avanzini, Emmanuel Bigand, Damien Cirotteau, Alain De Cheveigné, Cumhur Erkut, Werner Göbl, Fabien Gouyon, Philippe Lalitte, Henri Penttinen, Pietro Polotti, Daniel Pressnitzer, Rainer Typke, Gualtiero Volpe.

The current version is now open for comments from the whole SMC community.


Future Scenarios

The three scenarios presented here represent visions of life after a few attainable (though not necessarily easy) scientific/technological targets have been hit through the removal of roadblocks, the filling-in of gaps and the meeting of certain challenges as outlined in another section of the Roadmap.

The scenarios describe general environments and activities concerning our everyday life soundscapes, the professional perspective of musicians and general music appreciation. As such, there is no one-to-one correspondence between a particular scenario and any of the particular key issues identified later on in the Roadmap. Rather, they provide hints of how the world could be if and when Sound and Music Computing research achieves the multidisciplinarity and transversality proposed in this Roadmap.


Controllable Sound Environments

Sensors, actuators, microprocessors, and wireless connection facilities are increasingly being embedded into everyday objects. These can be augmented with sonic features that make the environment more enjoyable, social life more interesting, and personal life more relaxed and healthy.

Grandma and me

Hi, I'm a teenager - 15 years old - and I like to wake up late, at least on Sundays. But today I decided to spend a few hours with my grandmother, so I set my alarm clock bed for 9 a.m. First, the bed tries to wake me up gently, by vibrating and making purring noises. Even though I am not sleeping anymore, I like to wait until the bed gets more nervous, when it realises I am still lying on it, and starts making harsh rhythmic movements and squeaking. I love it!

I get up and go into the living room. It's a mess after last night's party. You know, mum and dad are on vacation, so ... Chairs are everywhere; chips and peanuts all over the floor. I play some of my favourite hip-hop music while I put things back in place and prepare for the day. Different MCs and DJs are embodied in different pieces of furniture. I know it sounds retro, but it is so cool to move DJ Grandmaster Flash while I am moving that old armchair around. While I move through the house, the music seamlessly follows me through the objects I pass. When I leave home, I put on my headset and keep riding the beat. The headset doesn't cut me off from the environment, though. When I catch sight of a strange bird singing, I look at it and put my hand to my ear. This gesture activates acoustic zooming, and I can appreciate the bird song in isolation. But then I am distracted by the sight of a friend of mine chatting to a girl. Instinctively, I steer my acoustic zoom towards them, but I realise she is wearing one of those new active jackets that can create an acoustic shield around you. I wonder what they are talking about.

I reach my grandma's house. She became almost deaf about five years ago, but she is really brave and decided to get an implant of powerful bionic ears. In recent years, she has become more and more worried about the bad things that could happen to her. That's why my dad bought her a new door. As we leave the house and she closes the door, she proudly explains that the complex sound of the lock tells her that the lights are switched off, the gas is turned off, the fish are fed, and the window in the living room has been left wide open. I'm sure she left it open on purpose, but we go back in and close it anyway.

We are going to the Fred Astaire club today. Grandma is wearing brand new Mike shoes. She feels much more confident about herself in these shoes, because they give her bionic ears some sonic feedback about equilibrium so she's not afraid of falling while she moves around. Before we go into the club she wants to check on her health, which is promptly sonified through her bionic ears. Everything sounds fine, so we go in just as the show's starting. There are a dozen over-eighties there, wearing Mike shoes, tap dancing, and clearly having a good time. Their subtle, gentle movements trigger a massive and diverse set of rhythms. Who knows - it may even be cooler than hip hop!

On the way back home, grandma tells me how different the town was when she was a child. There was even a working water mill. Fortunately, we can both enjoy its sound. Both my cheap headset and her bionic ear can induce selective silence and let the lost sounds of the town emerge from history. That makes her remember even more. It's fun being with grandma.

Musical Instruments for All

In 2020, many sound devices will have a general purpose computer in them and will include quite a number of real-time interaction capabilities, sensors and wireless communication. Basically, any sound-producing device will be able to behave like a personalised musical instrument. Music making will become pervasive and many new forms of communication with sound and music will become available. Music content will be inherently multimodal and music making available to everyone: music from all to all.

How I became a professional musician

A year ago I bought the new wearable mobile device from SMC Inc. With it I was able to listen to my favourite songs and interact with them in ways not possible with the previous generation of devices. Now I was able to change many aspects of the songs by gestural and vocal control. Some of my friends were really good at it, and I started to improve my skills by practising on my home multimedia system. This system includes Jeeves, a virtual musical assistant, which observes and analyses my body movement, my singing and my musical abilities in general. Jeeves teaches me how to express myself in the style of famous musicians, from The Beatles to cellist Yo-Yo Ma. After a period of training, I was ready to play and jam with other users over the Internet and get advice from more expert users or their virtual musical assistants. After a couple of months, I started to get a good reputation in the community. One day, a group of users asked me to join them in person at a discotheque. In that discotheque we were able to use our mobiles to plug into a music role in the overall show. People took all kinds of roles; some were projected as visual characters on the surrounding space and walls, some were projected into moving lighting, others, like me, were controlling the expressivity of the music being played. Since I was a beginner, I took a simple role as the controller of the timbral aspects of the drum. Another person took control of the drum sticks. We had to dance to coordinate the rhythm in this shared drum set with the other visual and musical roles. This experience felt much more physical and exciting than being at home with my multimedia system. In the discotheque I had the feeling of being part of a community and of real teamwork. The various haptic devices in my clothes heightened my aural and visual perception and interaction with friends.

I met these new friends many times; I practised a lot in discotheques and at home. I developed my own style and developed good skills in controlling virtual instruments, with Jeeves evolving with me and my community. I could control expression and lead my friends in improvisations and jam sessions. One day Jeeves asked me: "Could I please change my name to Madonna? I feel that my background knowledge has changed." I realised that was true and changed her name. Today I am a professional musician: an MJ (Music Jockey). Madonna and I prepare the shows on the authoring system at home. I design the structure, framework, roles, and musical material to be used. In my shows, I sometimes improvise with acoustic instrument players. I collect data on my sound device by monitoring their movements and performance choices. I can also monitor and analyse all the events in the show and the behaviour of everyone involved in it, including the audience. My virtual musical assistant Madonna wants to change her name again. Now she would like to be called Karajan.

Personalized Musical Devices

Current portable mp3 players - despite their simplicity - have already radically changed human music-listening behaviour. Now, the first web-based music information systems which provide contextual information about music simply by connecting already existing services (such as Wikipedia, CDDB, lastFM, etc.), without the utilisation of any musical expertise, are beginning to emerge. Based on current trends in SMC research, we predict that such systems are likely to further develop in the direction of multi-modal, interactive, open and adaptive systems that support both beginners and experts from different cultures in accessing music and music-related information.

My new music friend

I take my expert music companion with me anywhere, anytime, because I love music. The companion doesn't just play music. It gives me a lot of other information about the music - from 'practical' things such as transcriptions of instruments and harmonies, to animated visualisations of the structure of the music, contextual information such as style, historical and cultural relations, and the relationship of the piece to other, related pieces and styles.

My device is easy to use. I can talk to it, or I can shake it to show it the kinds of rhythms I like. It is aware of the music being played on radio stations and available in music databases world-wide, and it finds new music that I like in a particular situation. I can point it at music being played by a street band, and it will tell me what it is. It understands my intentions and learns my musical preferences. Sometimes it will surprise me, teaching me something new about music and my taste. And by the way, having had nano-sized loudspeakers (painlessly) implanted in my ears, I listen to my music without bothering with bulky headphones and earplugs.

My music companion also helps me out in social contexts. When I am desperately looking for a date, my companion alerts me there's a dance party around the corner for people with a similar interest in Brazilian music. When I get to it, my companion contacts the DJ system and sends it some of my favourite pieces (rare Brazilian stuff). The girl in the corner just goes "Wow".

My music companion is no longer an isolating device that runs playlists; it's a friend that enhances my musical abilities, reflects my personality and helps me to socialise.

Definition of the field

We propose the following definition for SMC:

Sound and Music Computing (SMC) research approaches the whole sound and music communication chain from a multidisciplinary point of view. By combining scientific, technological and artistic methodologies it aims at understanding, modelling and generating sound and music through computational approaches.

The central focus of the research field is sound and music. Sound is the resonance of objects and materials that we can listen to. Music is the intended organisation of sounds for particular uses in social and cultural contexts. The sound and music communication chain covers all aspects of the relationship between sonic energy and meaningful information, both from sound to sense (as in musical content extraction or perception), and from sense to sound (as in music composition or sound synthesis). This definition is generally considered to include all types of sounds and human communication processes except speech. Speech research has its own aims and methodologies and is outside the SMC field.

The other elements contained in the definition statement above can be briefly explained as follows:

The multidisciplinary point of view relates to the use of various research methodologies and disciplines from the natural and human sciences. SMC also includes various research goals and approaches that deal with cross-modality, such as the relationship between perception and action and the integration of different senses involved in human-machine interaction (hearing, vision, movement, haptics, etc.), both in individual and social contexts.

Scientific and technological methodologies refer to empirically-based and modelling-based approaches that draw upon advanced tools for measuring and processing information. Artistic methodologies refer to approaches that explore human experience and expression.

Understanding refers to our knowledge of the mechanisms that underlie how people deal with sound and music in terms of its content and experience.

Modelling refers to the representation of knowledge through algorithms and tools. The resulting models are used both in applications that aim at scientific understanding (e.g. simulation of perceptual processes) and also in applications that aim at practical understanding (e.g. in sound-aware objects, music information retrieval systems, music production companions).

Production refers to the creative use of algorithms and tools to develop new content in which sound and music are communicated, as in sound environments, interactive artistic works and sonic design.

Computational approaches refer to the core processing which allows the development of tools linking sonic energy with subjective experience. Computing is the shared practice that connects scientific understanding, the development of technological equipment and content-based creation.

Disciplines involved

The disciplines involved in SMC cover both human and natural sciences. Its core academic subjects relate to music (composition, performance, musicology), science and technology (physics, mathematics, engineering) and psychology (including psychoacoustics, experimental psychology and neurosciences).


As the name itself indicates, music provides the core area of investigation for SMC. It supplies both endless material (in scope and depth) for analytical investigation (Musicology) and the requirement to extend the expressive means of creation (Music Composition), while Music Performance is of interest on both sides (analytical and expressive).

Music Composition: This concerns all research that has a focus on musical content creation and sound design. It includes creating music as a score, as an interactive artistic event, as a sound installation, as a soundtrack, and as any form of organised sound event which communicates information.

Music Performance: Music performance is a complex activity involving physical, acoustic, physiological, psychological, social and artistic issues. At the same time, it is also a deeply human activity, relating to emotional as well as cognitive and artistic categories. The research in this field can be seen as ranging from studies aimed at understanding expressive performance to attempts at modelling aspects of performance in a formal, quantitative and predictive way.

Musicology: All research that deals with musical meaning formation, musical content description, and associated mediation technologies in particular sociocultural contexts. It comprises the study of how musical content can be described and how the subjective and sociocultural background of users plays a role in the production, distribution and consumption of music.

Science and Technology

Due to the interdisciplinary and problem-oriented character of SMC, many disciplines in the scientific and technological field are needed. Physics and mathematics deal with the physical and abstract sides of SMC respectively, while the engineering disciplines address the technological need to build systems and develop applications.

Physics: Acoustics is the science concerned with the production, control, transmission, reception and effects of sound as a physical phenomenon. The branch of acoustics of special interest to the SMC community is music acoustics. It includes the acoustics of musical signals (such as in expressive music performance), musical instruments and singing voices.

Mathematics: This area is concerned with the mathematical approaches to musical structures and processes, including mathematical investigations into music-theoretic or compositional issues as well as mathematically motivated analyses of musical works or performances.

Engineering: This includes all the research in computer science and engineering, signal processing and electronics that deals with sound and music information representation, processing and communication. It comprises multimedia information systems, artificial intelligence, audio signal processing, robotics, sensors and interface technology.

Psychology: This includes all research into music-related behaviour and brain processes, including the roles of perception, cognition, emotion and motor activities. Psychology is here understood to cover the whole domain from psychoacoustics to experimental psychology to neurosciences.

Areas of Application

SMC research can also be directed towards applications, which can play an important role in the definition of the field. Current areas of application include digital music instruments, music production, music information retrieval, digital music libraries, interactive multimedia systems, auditory interfaces and augmented action and perception (e.g. games, interactive artifacts, and appliances).

Digital music instruments: This application focuses on musical sound generation and processing devices. It encompasses simulation of traditional instruments, transformation of sound in recording studios or at live performances and musical interfaces for augmented or collaborative instruments.

Music production: This application domain focuses on technologies and tools for music composition. Applications range from music modelling and generation to tools for music post-production and audio editing.

Music information retrieval: This application domain focuses on retrieval technologies for music (both audio and symbolic data). Applications range from music audio-identification and broadcast monitoring to higher-level semantic descriptions and all associated tools for search and retrieval.

Digital music libraries: This application places particular emphasis on preservation, conservation and archiving and the integration of musical audio content and meta-data descriptions, with a focus on flexible access. Applications range from large distributed libraries to mobile access platforms.

Interactive multimedia systems: These are for use in everyday appliances and in artistic and entertainment applications. They aim to facilitate music-related human-machine interaction involving various modalities of action and perception (e.g. auditory, visual, olfactory, tactile, haptic, and all kinds of body movements) which can be captured through the use of audio/visual, kinematic and bioparametric (skin conduction, temperature) devices.

Auditory interfaces: These include all applications where non-verbal sound is employed in the communication channel between the user and the computing device. Auditory displays are used in applications and objects that require monitoring of some type of information. Sonification is used as a method for data display in a wide range of application domains where auditory inspection, analysis and summarisation can be more efficient than traditional visual display.
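
As a rough illustration of sonification by parameter mapping, the following Python sketch maps each data value to a pitch and renders it as a short sine tone. It is only a sketch under stated assumptions: the function names, the 220 to 880 Hz range, the sample rate and the tone length are hypothetical choices made for illustration and are not prescribed by the Roadmap.

```python
import math


def value_to_frequency(value, lo, hi, f_min=220.0, f_max=880.0):
    """Map a data value in [lo, hi] to a frequency in [f_min, f_max].

    The mapping is exponential in frequency, so equal steps in the data
    correspond to equal musical intervals. All ranges are assumptions.
    """
    if hi == lo:
        return f_min  # constant data: fall back to the lowest pitch
    t = (value - lo) / (hi - lo)
    t = min(1.0, max(0.0, t))  # clamp out-of-range values
    return f_min * (f_max / f_min) ** t


def sonify(data, sample_rate=8000, samples_per_value=1600):
    """Render each data value as a sine tone; return raw samples in [-1, 1]."""
    lo, hi = min(data), max(data)
    samples = []
    for value in data:
        freq = value_to_frequency(value, lo, hi)
        for i in range(samples_per_value):
            samples.append(math.sin(2.0 * math.pi * freq * i / sample_rate))
    return samples


# Hypothetical example: three temperature readings become three tones.
tones = sonify([12.0, 18.5, 25.0])
```

The returned list of samples could then be written to an audio file or sent to a sound device; rising data values are heard as rising pitch, which is the simplest form of auditory data display.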

Augmented action and perception: This refers to tools that increase the normal action and perception capabilities of humans. The system adds virtual information to a user’s sensory perception by merging real images, sounds, and haptic sensation with virtual ones. This has the effect of augmenting the user’s sense of presence, and of making possible a symbiosis between her view of the world and the computer interface. Possible applications are in the medical domain, manufacturing and repair, entertainment, annotation and visualisation, and robot teleoperation.




This part addresses the various contexts that determine how the research field of sound and music computing is embedded in a societal framework. Four of these may be identified, namely, the research context, the educational context, the industrial context and the socio-cultural context. The research context is about the state and trends of related scientific and technological developments and their influence on sound and music computing. The higher education context is about the education of future researchers in the field. The industrial context is about the impact on the industries and about the relevant trends in the information and communication technology (ICT) sector. Finally, the socio-cultural context is about the link to culture and the relevant social implications. These four contexts thus provide the background in which the state-of-the-art and challenges of sound and music research are situated.



This section aims to identify the major research trends within which sound and music computing is to be situated. The focus is on trends in ICT, the cognitive sciences and the humanities. Given the broad scope of sound and music computing, this section devotes special attention to the rise and importance of a multidisciplinary research space that is the motor for innovation in society. References to this general research space can be found in several reports from the European Commission and the National Science Foundation of the US (European Commission – New Instruments, 2004, European Commission – Research Area, 2007, National Science Foundation, 2004). Below, we aim at identifying the specific research trends that are deemed to be relevant to sound and music computing. Each trend will be summarised by a short statement that aims at identifying the core issue that is relevant to future challenges of the sound and music computing research field.

Research Trend 1: Rapid progress in ICT

In recent decades, progress in sound and music computing has been driven by revolutionary developments in information technology. The transitions from analogue to digital data processing and from wired to wireless mobile data-communication have been key components in this development. The relentless rate of annual to biennial doubling of storage capacity, bandwidth and data-crunching computing power has been unprecedented in history, leading to fundamental transformation of all aspects of the production-processing-distribution-consumption chain of sound and music content. In no other field has an entire processing chain been digitalised and made available on broadband networks and mobile devices on such a massive scale. In this development, technological progress has had a direct empowering influence both on scientific knowledge and on end applications, which in turn have impacted on the development of new technologies. In the context of this development, a number of consequences for research can be identified (Hengeveld, 2000, ITRS Consortium, 2007, NEM Consortium, 2007, Microsoft Research Cambridge, 2006).

Statement 1: The rate of increase in storage capacity, bandwidth and data-crunching computing power is leading to fundamental transformations in all aspects of the economic chain of music.
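
To give a sense of the scale implied by annual to biennial doubling, the following sketch computes the cumulative growth factor over a number of years. The function name and the ten-year horizon are illustrative assumptions only; the Roadmap itself gives no specific figures.

```python
def growth_factor(years, doubling_period_years):
    """Cumulative growth after `years` if capacity doubles every
    `doubling_period_years` (a simple exponential model, assumed here)."""
    return 2.0 ** (years / doubling_period_years)


# Hypothetical ten-year horizon at the two doubling rates mentioned above.
annual = growth_factor(10, 1)    # 1024.0
biennial = growth_factor(10, 2)  # 32.0
```

Under this simple model, a decade of doubling yields somewhere between a thirtyfold and a thousandfold increase, depending on whether the doubling is biennial or annual.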

First, the increasing capacity of data storage and transfer supports the accumulation of, and easy access to, ever-larger volumes of data. A resulting benefit is better access to knowledge, such as online access to vintage publications, supplementary data, new publication formats and so on. This accessibility is empowering to the scientist. At the same time, like the invention of printing, it has an effect on the embodiment of knowledge itself, shifting the centre of gravity of knowledge from brain, to book and onward to database.

A second effect is the shift towards data-intensive methodologies that involve gathering or compilation of large volumes of data. This allows a focus on data-intensive phenomena, phenomena that are either intrinsically complex, or else accessible only as patterns within multiple or complex observations. In the field of music studies, an enquiry might call for the processing of a large library of musical scores or a large database of audio data. A few years ago such a quest might have remained untouched for lack of access to the data, of room to store it in the computer, or of time to wait for the computer to give an answer. By the same token, a topic once respectable for its technical difficulty might suddenly become trivial. In ways such as these, information technology affects the focus of science.

A third consequence is the shift away from analytical and theoretical approaches towards a reliance on computer models and simulations. This approach, which can be observed in fields as diverse as pure mathematics (computational proofs), statistics (Monte Carlo methods, bootstrap), biology (DNA sequence alignment), linguistics and speech engineering (data-driven methods), has engendered a degree of unease and debate (Seiden, 2001). Does a proof that only a computer can follow really contribute to our understanding? Similar unease met the invention of algorithms, infinity, or proof by induction. In a similar vein, one can ask whether a drum machine can be regarded as a musician, or whether 'jazz improvisation' by a computer is really genuine improvisation.

A fourth consequence is the development of machine-embedded knowledge such as that gathered by machine-learning techniques. Arguably these techniques come closer to delivering the promises of intelligence than has the so-called Artificial Intelligence (AI) research itself. With them, intelligence is attained more by the clever use of tricks and devices in machines than by the artifice of man. At the confluence of statistical estimation techniques and neural network theory, machine learning harnesses the computer to compile and extract regularities from massive quantities of data. The knowledge thus obtained, usually impossible to describe to a human brain and useless without a computer, is nonetheless empowering for web search, spam filtering, or musical content indexing and retrieval. As models of brain processing, machine-learning techniques may eventually provide a bridge between information technology and neurosciences. Particularly relevant to music technology are new techniques of signal processing related to machine learning.

In summary, progress in information science and technology is fuelling a drive towards data- and computation-intensive approaches to knowledge acquisition and problem solving, particularly in domains relevant to sound and music computing. These have deep implications for the nature of scientific and technological knowledge and how it is brought to bear on our needs.

Statement 2: Information technology is profoundly reshaping the methodologies and scope of scientific inquiry and technological development.

Research Trend 2: Cognitive science: from musical mind to brain

Cognitive sciences (Wilson and Keil, 1999) focus on how humans interact with their environment, mostly from the viewpoint of perception and action. Developments in this research domain have had a huge impact on sound and music computing. In fact, studies on musical memory, learning and all activities related to music perception and action, such as extraction of high-level information from musical stimuli or gestural sound control, can be considered the basic constituents of sound and music computing applications.

The cognitive science of music (as practised in, for example, cognitive musicology, experimental music psychology or the neurosciences of music) has its focus on the semantic gap that exists between our daily meaningful experiences with sound and music on the one hand, and the encoded physical energy of sound and music on the other. When dealing with music, we call upon content and meaning, whereas the encoded physical energy is just a way of storing information in a technological device. How are the two connected? How can we access the encoded information by means of meaningful actions? Research in cognitive science aims at providing new insights into this semantic gap problem. Several different approaches to solving this problem can be distinguished.

A first research direction starts out from the premise that the human mind is embodied (Knoblich et al., 2006). Rather than trying to solve the semantic gap problem by looking at formal structures and higher-level or low-dimensional representational spaces, the relation between human meaning and encoded physical energy is here seen as being mediated by the human body. For example, if an ambiguous musical rhythm is presented, then it is assumed that the motor system of the human body engenders the anticipatory mechanisms (called emulation) that allow a disambiguated auditory perception of it. Action is here seen as a crucial component for auditory perception, with action and feedback mechanisms being considered at different processing levels, from feedback mechanisms in the auditory periphery (e.g. the role of outer hair cells in attenuation) to the role of intended actions in perception. The embodied viewpoint may revolutionise how we think about ICT development in that it calls for new technologies that mediate between the human mind and its musical environment, based on a multi-sensory approach to sound and music computing (Leman, 2007).

Statement 3: The embodied viewpoint calls for new technologies to mediate between the human mind and the environment.

A second research direction is concerned with the methodologies for acquiring knowledge about the semantic gap problem. In the last decade, these methodologies have been extended from behavioural to brain research. Knowledge about the brain is progressing rapidly and at multiple scales which include molecular, synaptic, cellular, cell assembly, and regional and functional anatomy as revealed by brain imaging. Today our tools include molecular biology techniques for probing the membrane and synaptic properties of neurons, physiological recording techniques to observe entire neuronal assemblies, non-invasive imaging techniques to probe activity within the human brain, computational tools to gather and process the resulting data, and theoretical tools to make sense of the complexity of what is observed. Some recent studies in neurophysiology include the use of awake preparations (often coupled with behavioural studies), multiple unit recordings, simultaneous invasive and non-invasive brain imaging techniques (to calibrate one with respect to the other), selective brain cooling, optical imaging and the coupling of one of these with genetic engineering or biochemical manipulations to probe specific stages in processing. Research in brain imaging includes the use of higher magnetic fields for structural and functional MRI (magnetic resonance imaging), increased numbers of channels in EEG (electroencephalography) or MEG (magnetoencephalography), simultaneous recording of fMRI and EEG, or EEG and MEG, and use of pre-surgical supradural or intracortical recordings from patients to obtain 'close up' snapshots of brain activity.

An important facilitating factor in these developments is progress in hardware and software techniques for handling and interpreting the massive data sets produced by brain imaging. In short, there is presently a rapid development of different technology-driven methodologies that provide new insights into how the brain is involved in the semantic gap problem.

Statement 4: New technology-driven methodologies are providing new insights into how the human brain processes sound and music.

A third major research effort, situated in theoretical neurosciences, is about the tight interaction between signal processing and machine learning techniques on the one hand, and models of neural processing on the other. A common goal is to find techniques that can harness the extreme complexity of relevant patterns in data (for example databases of environmental, speech or musical sounds) or the structures and mechanisms observed within the brain. The computer is used here as an aid to handle a degree of complexity that our brains cannot otherwise easily comprehend. One promising angle of enquiry is the use of data-driven methods to simulate processing mechanisms (natural or artificial) that are driven by the data patterns they are to process. This method can be used as an alternative or complement to more traditional engineering techniques.

The above developments often lead to rather wild speculation on the possible future benefits of neurosciences to computing. An example of such a hypothetical breakthrough might be the possibility of 'downloading' entire cognitive or perceptual processing mechanisms to software. This could result from a combination of progress in recording techniques, theoretical neurosciences and machine learning. Another hypothetical breakthrough (heralded by well-established cochlear implant technologies and recent experiments with animal models and impaired humans) could be the widespread development of brain-machine interfaces (BMI). This could result from a combination of progress in interface hardware (e.g. miniaturised electrode arrays), signal processing (to factor out 'noise') and machine learning (to translate between the different codes used by brain and machine). All this is likely to have a huge impact on the sound and music computing field. Examples are hearing aids (e.g. cochlear implants) that allow their users to listen to music at a high quality level, or an intracortical implant that would allow a quadriplegic to play the piano.

Statement 5: Cognitive sciences and neurosciences offer a rapidly expanding window on the human mind and brain, thereby providing new possibilities for solving the semantic gap problem.

Research Trend 3: From subjective experience to cultural content

Research in the humanities is focused on signification practices; that is, on how human beings make sense of their environment and give meaning to their lives. The humanities view this signification practice from a subjective and experiential point of view. Therefore, research of this kind includes anthropology, area studies, communications, cultural studies and media studies. The humanities not only provide insights into these aspects but also train people in the skills necessary for practitioners (e.g. in music playing, painting, film making). Traditionally, research methodologies in the humanities are based on analytic, descriptive, critical or even speculative and imitation approaches, although recent approaches also involve quantitative and empirical studies (e.g. Diamond, 1999; Foster, 1985; Tomasello, 1999). In the cultural and creative industries (KEA, 2006), the humanities can provide the content needed to develop a significant partnership between culture and technology.

Several research efforts in the humanities address this issue. A first approach has adopted the belief that subjective factors (related to gender, education and social and cultural background) play a central role in how people deal with technology. Humanities research may provide the necessary analysis of the role of subjective factors and the social and cultural contexts in which technological applications will function. Knowledge of these factors needs to be incorporated into music retrieval systems and interactive music systems.

Statement 6: Subjective factors play a central role in how people deal with technology in relation to sound and music.

A second research approach is concerned with what is sometimes called 'medialogy'; that is, an approach which combines technology and creativity to design new processes and tools for art, design and entertainment. It involves insight into the creative processes, thoughts and tools needed for media-productions and other arts to exist. Clearly, medialogy is at the crossroads of the human sciences, the creative arts and technology. As such, it is a central pillar of the creative industries.

A third research approach is concerned with the transformation of the cultural sector into the digital domain. This involves the digitalisation of a large part of our cultural heritage. From the humanities point of view, the preservation and archiving of cultural heritage poses huge challenges with respect to issues such as the authenticity of documents, flexible multi-language access and the provision of proper content descriptions of objects from multifarious cultures.

Statement 7: Technology, creative approaches to art, design and entertainment and the digitalisation of a large part of our cultural heritage stimulate each other.

A fourth key topic in the humanities concerns the role of the human body, embodiment, and corporeal skills in signification practices. Human skills, which often require intensive learning, have been studied and described for centuries from a humanistic point of view, often from entirely different cultural perspectives. Accordingly, the humanities provide a rich source of theories, concepts and traditions that are highly revealing and inspiring for new empirical studies and technological applications. An example is the Laban theory of effort (Laban and Lawrence, 1947), which provides a speculative theory but very valuable insight into choreography and expressive moving. This theory can be straightforwardly related to music perception, leading to the interesting approach of gesture-based music retrieval. Another example concerns the philosophical views on intentional behaviour of the human body and how this is currently being integrated into a neuroscientific approach to empathy and social cognition (Metzinger, 2003). The focus on the human body in artistic research is clearly connected with the empirical study of embodiment in cognitive science. In fact, it is thanks to the humanities (e.g. phenomenology, post-structuralism, post-modernism) that this topic has become a genuine research topic on the agenda of empirical sciences that deal with perception, action and the use of tools and technologies. Indeed, some aspects of embodiment, involving emotions and the gesture related to them, can be straightforwardly explored and used in technology-based artistic and cultural applications, even if our knowledge about these processes is limited.

In short, the humanities offer a very rich background from which the problem of the semantic gap can be addressed. Its focus on specific topics such as the human subject, embodiment and social and cultural interaction, along with its often descriptive analytic approach, is highly valuable from the perspective of content creation.

Statement 8: The humanities offer the cultural background and content for sound and music computing research.

Research Trend 4: The rise of multidisciplinary research

Scientific research is currently witnessing two opposing, though intimately related, approaches. On the one hand, it continues to differentiate into more and more specific and narrowly circumscribed sub-fields owing to the accelerating accumulation of ever more specific knowledge. At the same time, new multidisciplinary research fields are emerging within academia, for example in the life sciences, neurosciences and earth sciences. Understanding the complex phenomena facing mankind - from climate change to new epidemics to global economic and social developments - requires the integration of expertise from many fields. The growing importance of multidisciplinarity is being increasingly recognised in research funding agencies and educational organisations.

According to a report recently presented at the OECD Global Science Forum Workshop (National Institutes of Health [NIH], 2006) “[t]he increasing multidisciplinary nature of research [...] is an important overall trend in science policy. For example, during the past four years, the fraction of interdisciplinary research at the United States National Science Foundation has increased significantly”. The NIH Roadmap for Medical Research further states that “the traditional divisions [...] may in some instances impede the pace of scientific discovery”. In response to this, the NIH is establishing “a series of awards that make it easier for scientists to conduct interdisciplinary research”.

As early as the year 2000, the Natural Sciences and Engineering Research Council [NSERC] of Canada set up a special Advisory Group on Interdisciplinary Research (AGIR) with a mandate to study how interdisciplinary research could be better supported (NSERC, 2002). In 2003, the National Science Foundation [NSF] promoted a study on the convergence of technologies (NSF, 2003) which concluded that: “In the early decades of the 21st century, concentrated efforts can unify science based on the unity of nature, thereby advancing the combination of nanotechnology, biotechnology, information technology, and new technologies based in cognitive science”. Similarly, research funding institutions all over the world are beginning to recognise the need to give special attention to multidisciplinary research funding.

Of course, the fundamental importance of multidisciplinary research is also acknowledged by the European Commission. In the field of ICT, which is of direct relevance to sound and music computing, the “Future and Emerging Technologies” (FET) programme is explicitly targeted towards innovative, multidisciplinary work - in the chapter dedicated to FET, the ICT work programme of FP7 calls for “interdisciplinary explorations of new and alternative approaches towards future and emerging ICT-related technologies, aimed at a fundamental reconsideration of theoretical, methodological, technological and/or applicative paradigms in ICT”, one of the goals of FET being to “[help] new interdisciplinary research communities to establish themselves as bridgeheads for further competitive RTD” (ICT-FET Work Programme, 2007).

Sound and music computing is by definition a multidisciplinary field, ranging from the natural sciences like physics and acoustics through mathematics, statistics and computing, all the way to physiology, psychology and sociology. The global trend towards the recognition of multidisciplinarity should help sound and music computing establish itself more confidently as an encompassing discipline that studies a phenomenon of central relevance to humans in all its necessary breadth. In addition, the emergence of new multidisciplinary fields of research and application is producing new points of contact for sound and music computing.

A prime example of such contact is the current rise of the so-called creative industries (KEA, 2006). While the notion of creative industries refers to a sector of the economy, its current upsurge (including in terms of public awareness) also leads to new opportunities for creative multidisciplinary research at the intersection of art, design and technology. Sound and music computing can and will play an important role here. The case of the creative industries also highlights once more - if that were needed - the close ties between scientific research and the arts (see also the Industrial Context section). Artistic visions coupled with creative application ideas are likely to drive sound and music computing research in more ways than can currently be envisioned, resulting in entirely new environments, devices and cultural services.

Statement 9: Multidisciplinary research is increasingly seen as a necessity and an asset, and special programmes for fostering and funding it are being developed. Sound and music computing can take advantage of this and should actively seek alliances with other disciplines, including the arts.

To sum up, in this section we have identified some major trends related to the rapid progress in ICT, the development of cognitive science and the advent of brain science, the role of human sciences in addressing the human subject and its action-related contexts, and the multidisciplinary nature of scientific research. Sound and music research is at the cutting edge of these trends. It is driven by these general trends in research and it plays an active role in pushing the most advanced stages of each of these developments.


Diamond, J. (1999). Guns, germs, and steel. The fates of human societies. New York: Norton & Comp.

European Commission – Research Area (2007). European research area.

European Commission – New Instruments (2004). Evaluation of the effectiveness of the New Instruments of Framework Programme VI.

Foster, H. (Ed.) (1985). Postmodern culture. London and Sydney: Pluto Press.

Hengeveld, P., Best, J.-P., van Beumer, J., Hooff, B., van den Poot, H., and R. de Westerveld (Eds.) (2000). Research trends in information and communication technology: Uncovering the agendas for the information age. Telematica Instituut, Enschede, The Netherlands.

ICT-FET Work Programme (2007). Future and Emerging Technologies. European commission. Information society and media.

ITRS Consortium (2007). The international technology roadmap for semiconductors.

KEA European Affairs [KEA] (2006). The economy of culture in Europe. Technical report.

Laban, R., & Lawrence, F. C. (1947). Effort. London: Macdonald & Evans.

Leman, M. (2007). Embodied music cognition and mediation technology. Cambridge, MA: The MIT Press.

Metzinger, T. (2003). Being no one: The self-model theory of subjectivity. Cambridge, Mass.: MIT Press.

Microsoft Research Cambridge (2006). Towards 2020 Science.

National Institutes of Health [NIH] (2006). Interdisciplinary Research.

National Science Foundation (2004). National Science Foundation Strategic Plan FY 2003-2008.

Natural Sciences and Engineering Research Council [NSERC] (2002). First report of the advisory group on interdisciplinary research.

NEM Consortium (2007). Networked and electronic media - European technology platform: Strategic research agenda.

Seiden, S. (2001). Can a computer proof be elegant? Communications of the ACM, 32: 111-114.

Tomasello, M. (1999). The cultural origins of human cognition. Cambridge: Harvard University Press.

Wilson, R.A. and Keil, F. (Eds.) (1999). The MIT Encyclopedia of the cognitive sciences (MITECS). Cambridge, Mass.: The MIT Press.


The educational context of sound and music computing is quite complex, mainly due to its multidisciplinary nature and the consequent difficulty of fitting it into the traditional, discipline-oriented focus of most university-level studies. There are almost no specific undergraduate degrees in sound and music computing, and the possibilities for specialisation in this field are centred at the graduate level, where multidisciplinarity is more common.

In Europe, the major developments in education are due to the so-called Bologna Declaration, which aims at creating a common space for higher education in Europe. Below, we identify the trends that are most relevant to sound and music computing. This will be done at each of the three university education levels, namely, Bachelor, Master's and PhD.

Educational Trend 1: The new European higher education area

The EU drive towards the creation of an open European higher education area (EHEA) is both a reaction to and a reinforcement of the profound changes which have occurred in recent years: universities are educating larger numbers of students, from a wider range of backgrounds and with a wider range of skills on entry; students are more mobile, spending parts of their education in other countries. This drive, initiated with the Bologna Process (European Commission – The Bologna Process, 2007), is creating a framework that enables closer cooperation between higher education institutions in Europe, facilitates student and staff mobility and increases the attractiveness of European higher education in the world. In the following paragraphs, we discuss these trends and their significance for the design of new curricula in the field of sound and music computing.

Improving quality in the curricula is seen as one of the keys to greater recognition of qualifications across Europe, and this viewpoint is being taken by many universities as an opportunity to update and add more flexibility into their programmes (Reichert and Tauch, 2005). These changes will definitely be beneficial for multidisciplinary fields like sound and music computing; in fact many institutions explicitly praise the new freedom to design multidisciplinary Master's programmes, as well as programmes in emerging areas of science and knowledge. The wave of reform in European higher education seems to be going even further and deeper than the Bologna reforms themselves.

A second key ingredient in curricular reform is the link between higher education and employment. The Bologna Declaration particularly calls for undergraduate degrees to be relevant to the labour market. There is in general a growing push towards shorter study cycles, and many EU countries have already adopted the two-cycle qualification structure based on the Bachelor's and Master's distinction (Tauch, 2004). Employability is also seen as an important criterion in the design of new degrees in sound and music computing. The music/multimedia industry at large is in the middle of important changes and is trying to adapt to the new markets and exploring the potential of sound and music computing technologies (see the Industrial Context section). New curricula in sound and music computing have the opportunity to address these emerging labour markets.

A major recent change in higher education has been the increase in student mobility. A considerable part of overall mobility is supported through the EC's Erasmus/Socrates programme (European Commission – Socrates/Erasmus, 2007), established in 1987, which seeks to reinforce the European dimension of higher education by encouraging transnational cooperation between universities and boosting European mobility. The figures for mobility reflect a steady improvement, but remain below what the Commission considers necessary (European Commission - Education and Training 2010, 2007). Moreover, the EU still attracts less talent than its competitors (European Commission – Lisbon, 2007). Sound and music computing research in Europe has a successful track record involving excellence spread over several centres which have gained world leadership through complementarity and coordination supported by EC funding. This excellence has to be exported to the higher education domain, in order to attract students, scholars and researchers from other world regions.

Statement 1: The creation of a common space of higher education in Europe gives more possibilities for designing curricula in sound and music computing.

Educational Trend 2: Discipline oriented bachelor education

The tradition of bachelor (undergraduate) education is very much discipline oriented. A student has to choose a curriculum aimed at developing a number of specific competences in a particular discipline plus a few general academic and professional competences. However, there are curricula in Europe that are more multidisciplinary or that allow a student a wider choice of itineraries, thus permitting the design of 'custom made' curricula. With respect to research, the involvement of bachelor students in such activities as a normal part of their curriculum is still very exceptional. Given that there are many academic disciplines integral to sound and music computing research, the education given in all the bachelor degrees supporting these disciplines is of interest to any future sound and music computing researcher. Thus a student wanting to become a sound and music computing researcher might choose a bachelor degree related to musicology, physics, computer science, electrical engineering, psychology, music composition, etc. Within most of the undergraduate programmes that support these disciplines, there are specific courses that might be of very great relevance. But in most cases their relevance depends on the professor responsible for the course and the special focus given to it. Figures 1 and 2 provide some indicative data about content areas in courses and curricula. These data were gathered in a survey by the S2S2 consortium and will be updated and expanded in the future. The content areas for education in sound and music computing include systematic musicology, auditory and music perception-action, auditory and music cognition, music acoustics, audio signal processing, hardware and software, sound modelling, sound analysis and coding, music information processing, music performance, multimodal interfaces, sound design and auditory display, and application areas.

Statement 2: Numerous paths, embedded in different well-established undergraduate degrees, can be designed to approach a multidisciplinary field such as sound and music computing.

In the context of sound and music computing, the music conservatories are a special case of higher education institutions. Traditionally, they have a strong professional orientation and thus might not provide the necessary background for a musician wanting to follow a research career. This situation has been slowly changing, due both to the transformations taking place in the music profession and also, in Europe, to the inclusion of the conservatories in the EHEA (European Association of Conservatoires, 2005). Slowly, the conservatories are converging with the university system. It is now becoming quite common for a conservatory to offer a degree with a strong technological component. There are, for example, conservatory degrees in sound recording, tonmeister, sonology, music technology, electro-acoustic music, etc. Most of these degrees remain professionally oriented but are closely related to sound and music computing. Conservatories are also slowly incorporating the idea of research as one of their institutional aims and are designing curricula which are closer to the university model.

Statement 3: New conservatory degrees are a model for professionally oriented undergraduate curricula in sound and music computing.

Apart from the traditional university degrees and the case of the music conservatories, there are quite a number of multidisciplinary undergraduate programmes related to sound and music computing, especially in the US and Great Britain. In the Anglo-Saxon system, it is much easier for universities to establish multidisciplinary programmes or even to allow student-centred curricula with individual academic pathways. However, there is an ongoing discussion among academics and researchers about the type of undergraduate education that is best suited to the preparation for a research career in a multidisciplinary field like sound and music computing. Should it be a strongly discipline oriented undergraduate degree or a multidisciplinary programme?

The adoption of a common system of credits, such as the ECTS system, plus the existence of funding programmes like Erasmus to support mobility have had a big impact on undergraduate education too. They have led students to become familiar with other approaches to a given field and have given them the opportunity to take courses not offered in their home university. The Erasmus programme has also facilitated the creation of networks of universities with complementary undergraduate degrees in a given discipline, so that experiences among faculty members can be shared and the curricular opportunities for students are widened. Due to the variety of disciplines and methodological approaches involved in the sound and music computing field, it is not easy to find educational institutions with an extensive coverage of all of them. It is thus very useful for a bachelor student wanting to get a wider view of the field to take courses in different centres.

Statement 4: Bachelor (undergraduate) degrees with multidisciplinary contents encourage student mobility.

Educational Trend 3: Multidisciplinary studies at Master level

The objective of a Master programme is academic or professional. The academic Master serves as the bridge between undergraduate programmes, which are mainly based on courses, and PhD studies, which are mainly based on research. These Master degrees are generally developed by universities that take advantage of existing research strengths. Therefore, the Master programmes tend to reflect the research focus of university departments and faculty. Universities have a large degree of autonomy in setting up and modifying their Master programmes, much more so than at undergraduate level. These programmes can therefore be more easily adapted to universities' educational and research strategies. Research Masters used to be common in Great Britain but rare in continental Europe. But as part of the Bologna process, most European universities are now integrating PhD courses into Master's programmes and creating new Master degrees (Reichert and Tauch, 2005). Many programmes continue the traditional discipline oriented focus, thus offering a clear continuity from undergraduate studies, but they tend to have a greater degree of flexibility. The students have a greater choice of optional courses and, since the research thesis is a major part of the programme, they are able to work independently under the supervision of a tutor.

Statement 5: It is becoming easier for university faculties and research groups to introduce a student enrolled in a Master programme into any given research field.

In the last few years there has been a proliferation of multidisciplinary Master programmes. Many of the key current research issues require multidisciplinary approaches and researchers need to be trained appropriately. Multidisciplinary education requires collaboration between institutions and thus there is a clear trend toward promoting it. Collaborations between departments of the same university, between universities of the same country and between universities of different countries are becoming commonplace.

At the European level, the Erasmus Mundus (European Commission - Erasmus Mundus, 2007) is a co-operation and mobility programme which supports European top-quality Master courses and enhances the visibility and attractiveness of European higher education in third countries. It also provides EU-funded scholarships for third country nationals participating in these Master Courses, as well as scholarships for EU-nationals studying at partner universities throughout the world.

It is a challenge for music institutions to offer musicians, in addition to instrumental training and practice, a reflective environment that nourishes innovation and creativity paired with the extension of knowledge and artistic understanding (European Association of Conservatoires, 2005). It becomes equally interesting when attempts are made to bridge the gap between theoretical research and musical practice. A great effort is being made by the European conservatories to develop Master programmes and PhD studies and thus to incorporate research into their educational and institutional aims. It might take some time before this happens.

Statement 6: The multidisciplinary nature of sound and music computing research can find the right educational framework at the Master level.

Educational Trend 4: The professionalisation of PhD studies

Doctoral studies have traditionally been based on independent research undertaken by the doctoral candidate who draws upon the advice and guidance of a supervisor, supposedly on the model of a Master/apprentice relationship. This type of arrangement has long been the norm. For non-traditional fields like sound and music computing, it has had the advantage that a student is able to do a PhD just by finding an appropriate faculty member with sufficient knowledge of the chosen topic and a willingness to guide and support the student through the programme.

However, as a result of the changing context, PhD studies have recently come under scrutiny. Among the new challenges faced by universities in relation to doctoral education, it is worth mentioning the following desiderata (Sadlak, 2004):

  • to review the structure of training for researchers and integrate doctoral programmes into the Bologna Process;

  • to deal with increased competition, from outside and within Europe;

  • to increase and strengthen co-operation with businesses and to contribute more effectively to technological innovation;

  • to find a new balance between basic and applied research;

  • to enhance the employability of researchers by including in their training both core skills and wider employment-related skills.

PhD students doing multidisciplinary research are more diverse than their disciplinary counterparts. They may have any one of a wide range of subject backgrounds and may sometimes have followed more than one educational pathway. The background of students doing research in sound and music computing ranges from music to mathematics, from psychology to electrical engineering. What they have in common is the aim of bridging disciplines to develop new and multidisciplinary knowledge. There is general agreement (Metzger and Zare, 1999) that this type of multidisciplinary research should conform to the following:

  • Consistency with established knowledge in multiple separate disciplinary antecedents: how the work stands vis-à-vis what researchers know and find tenable in the disciplines involved.

  • Balance in weaving together perspectives: the extent to which the work hangs together as a generative and coherent whole.

  • Effectiveness in advancing understanding: the extent to which the integration of disciplinary perspectives advances the goals that have been set and the methods used.

Statement 7: The traditional model of a Master/apprentice relationship in PhD studies is evolving within a much more complex educational environment, especially for multidisciplinary fields like sound and music computing.

The need for more structured PhD studies in Europe and the relevance of such studies to the Bologna Process have been highlighted repeatedly in recent years. In particular, joint PhD programmes can be amongst the most attractive features of the EHEA. But for the time being, interested students are still confronted with a variety of national and institutional structures that are not easily comparable.

Statement 8: Joint sound and music computing PhD programmes at the EU level can be built by exploiting excellence spread over several centres with complementary competencies.

There is a clearly growing trend towards the professionalisation of PhD studies, involving the inclusion of coursework and training in transferable skills aimed at facilitating the flow of doctoral students into the wider job market. Students are becoming employed researchers within well-structured research groups and funded within well-focused research projects. This increases the pressure to secure funding for PhD programmes. Within this context, PhD students represent major academic and financial investments and contribute to much of the original research in universities. The role of supervisors seems key to the success or failure of multidisciplinary PhD projects (Fry et al., 2004). There is clear evidence that the disciplinary background, interest and motivation of the supervisor have much influence on research outcomes, both in terms of their quality and also whether PhD studies are completed on time (or at all).

However, the added value of a PhD for employment outside the areas of research in universities, research institutes and R&D functions in industry remains somewhat limited. Central and East European countries especially, as well as South European countries, experience a continuing lack of interest on the part of employers outside the academy in hiring PhDs. The situation is almost reversed in the US, where a significant and ever growing number of PhDs are attracted to private sector employment in which remuneration is higher than in the academy (Sadlak, 2004).

Statement 9: Multidisciplinary PhD programmes avoid a focus which is too narrow and provide a broad spectrum of knowledge that also qualifies their graduates for careers outside the academy.

To sum up, the above trend analysis shows that the European educational system is in full development at all levels from Bachelor to Master and PhD. Furthermore, there is a willingness to further integrate educational systems from art and science. These developments will have a huge impact on the sound and music research field.


European Association of Conservatoires (AEC) (2005). The Bologna declaration and music.

European Commission - Education and training 2010 (2007). Diverse systems, shared goals.

European Commission - Erasmus Mundus (2007).

European Commission – Lisbon (2007). Progress towards the Lisbon objectives in education and training, 2005.

European Commission - Socrates/Erasmus (2007). The European Community programme in the field of higher education.

European Commission - The Bologna Process (2007). Towards the European higher education Area.

Fry, G., Tress, B. and Tress, G. (2004). PhD students and integrative research. In Proc. Frontis workshop from landscape research to landscape planning: Aspects of integration, education and application, Wageningen, The Netherlands, June 2004.

Metzger, N. and R. Zare (1999). Science policy: Interdisciplinary research: From belief to reality. Science, 283: 642-643.

Reichert, S. and Tauch, C. (2005). Trends IV: European universities implementing Bologna.

Sadlak, J. (2004). Doctoral studies and qualifications in Europe and the United States: Status and prospects. Technical report, UNESCO-CEPES.

Tauch, C. (2004). Almost half-time in the Bologna Process. Where do we stand? European Journal of Education, 39(3): 275-288.



Sound and music computing has always been an applied research field quite close to the music industry, thus close to the industries that create, perform, promote and preserve music. These industries involve: composers; performers and ensembles; publishers, record producers, manufacturers, labels and distributors; managers and agents; instrument makers; and some others. But right now sound and music computing technologies have a much broader impact and are present in most of the industries that sit at the nexus of cultural, entertainment, leisure and fast moving consumer goods.

A recent study of the economic impact of the cultural and creative sector in Europe (KEA European Affairs [KEA], 2006) revealed that the annual turnover of the sector (€654 billion in 2003) is larger than that of the motor industry or even ICT manufacturers. This sector, of which the music industry forms a major part, contributed 2.6% of EU GDP in 2003, slightly more than the contribution of the chemicals, rubber and plastic products industries combined. The sector's growth in 1999-2003 was 12.3% higher than that of the general economy and in 2004, about 5.8 million people worked in it, equating to 3.1% of the total employed population in the EU25. In view of the European Council's Lisbon agreement of March 2000, that the EU by 2010 should become “the most competitive and dynamic knowledge-based economy in the world, capable of sustainable economic growth with more and better jobs and greater social cohesion”, reinforced coordination of activities and policies impacting on the cultural and creative sector within the EU should be given a high priority.

Given this context, it is clear that the industries that relate to sound and music computing are in the middle of important changes and most are trying to adapt to the new markets and exploring the potential of these technologies. From the writings and presentations of industry experts, we can identify seven major trends.

Industrial Trend 1: Towards a knowledge-based economy

Modern economies are increasingly based on the production, distribution and use of knowledge and information. Knowledge is now recognised as the driver of productivity and economic growth. From the OECD report (OECD, 2005) it is clear that this long-term trend towards a knowledge-based economy is continuing. Science, technology and innovation have become key contributors to economic growth in both advanced and developing economies. Investment in knowledge (comprising expenditure on R&D, software and higher education) in the OECD area reached around 5.2% of GDP in 2001, compared to around 6.9% for investment in machinery and equipment. The share of knowledge-based market services is continuing to rise and now accounts for over 20% of OECD value added. The share of high and medium-high technology manufacturing fell to about 7.5% of total OECD value added in 2002, compared to about 8.5% in 2000.

Statement 1: Music related activities are part of the new knowledge economy and they should take advantage of the continuing growth of this sector.

Industrial Trend 2: A global economy

Economies have expanded beyond national borders. Production in particular has been expanded by multinational corporations to many countries around the world. The global economy includes the globalisation of production, markets, finance, communications and the labour force.

From the OECD report (OECD, 2005) we learn that this is not a new phenomenon per se, but that it has become more pervasive, driven mainly by the use of information and communication technologies (ICT). In the knowledge economy, information circulates at the international level through trade in goods and services, direct investment and technology flows and the movement of people. According to the American National Science Board (2006) the globalisation of R&D, S&T, and S&E labour markets is continuing. Countries are seeking competitive advantage by building indigenous S&T infrastructures, attracting foreign investments and importing foreign talent. The location of S&E employment is becoming more internationally diverse and those who are employed in S&E have become more internationally mobile.

Statement 2: Both the production and consumption of music related goods are now globalised and international cooperation is more important than ever.

Industrial Trend 3: The development of the ICT sector

In the final decade of the twentieth century, the almost simultaneous arrival of mobile phones and the Internet not only changed the face of communications but also gave impetus to dramatic economic growth. We now speak of the Information and Communication Technologies (ICT) sector to refer to the agglomeration of the communications sector, including telecommunications providers and the information technology sector, which ranges from small software development firms to multinational hardware and software producers.

According to the i2010 report (i2010 - European Information Society, 2007), ICT accounts for a quarter of EU GDP growth and 40% of productivity growth. The digital convergence of the information society and media services, networks and devices is finally becoming an everyday reality: ICT will become smarter, smaller, safer, faster, always connected and easier to use, with content moving to three-dimensional multimedia formats. It has been pointed out (Saracco, 2002) that any economic indicator ties together progress and communications infrastructure, and that the dissemination and progress of culture go hand in hand with the possibility of interacting and sharing ideas, thus putting telecommunications at centre stage.

The American National Science Board (2006) reports that the number of industrial researchers has grown along with rapidly increasing industrial R&D expenditures. Across OECD member nations, employment of researchers by industry has grown at about twice the rate of total industrial employment. For the OECD as a whole, the full-time equivalent number of researchers more than doubled in the two decades from 1981 to 2001, from just below 1 million to almost 2.3 million. Over the same period, the number of researchers in the United States rose from 0.5 million to nearly 1.1 million.

According to the KEA report (KEA, 2006) the ICT sector is central to European growth and competitiveness and has been identified as a pillar of the European Lisbon Strategy. It accounts for 5.3% of EU GDP and 3.4% of total employment in Europe. In the period 2002-2003 it was responsible for more than a quarter both of productivity growth and of the total European R&D effort. Darlin (2006) predicts that flat-screen televisions will get bigger and that MP3 players and cell phones will get smaller. And almost everything will get cheaper. But the biggest trend expected is that these machines will communicate with one another.

According to the OECD (OECD – Digital Music, 2005), digital music and other digital content are also drivers of global technology markets, both to consumer electronics manufacturers and PC vendors. The increase in revenues from hardware in the PC and consumer electronics branch resulting from the availability of online music, authorised or not, is potentially greater than the revenues currently generated by paid music streaming or downloads.

Statement 3: The growth of the ICT sector and the innovations coming out of it will be the main driving forces for the music related industries.

Industrial Trend 4: The interdependence of the cultural & creative sector and ICT

The cultural and creative sector generates significant economic performance in other non-cultural sectors, thereby indirectly contributing to economic activity and development, and in particular in the ICT sector. Culture contributes directly to the economy by providing products for consumption, namely the cultural goods and services embodied in books, films, music sound recordings, concerts, etc. But the recent growth of the creative media, according to KEA (2006), is due to the growing diffusion and importance of the Internet. The impact of this development on media consumption has been huge in recent years and will be the major factor for this sector in the future. At the same time, creative content is a key driver of ICT uptake. The consultancy firm PriceWaterhouseCoopers estimates that spending on ICT-related content will account for 12% of the total increase in global entertainment and media spending until the year 2009 (see KEA, 2006). Accordingly, the development of new technology depends to a large extent on the attractiveness of content and the new networks are no exception. The development of mobile telephony and networks is based on the availability of attractive value-added services that will incorporate creative content, to which sound and music computing may contribute.

However, the KEA report also predicts that the roll out of broadband and the digitisation of production processes will require significant investment for the creative industries to adapt, as well as changes in their management practices. Some industries (notably music) have to go through aggressive cost restructuring programmes and are experiencing consolidation through mergers. Without a strong music, film, video, TV and game industry in Europe, the ICT sector will be the hostage of content providers established in Asia or North America.

Statement 4: Content is a major driver of ICT development.

Industrial Trend 5: New models of exploitation of content

The new ICT technologies have opened up new possibilities for the exploitation of music. Traditionally there have been two distribution channels for media content, namely, physical distribution and analogue broadcasting (radio, TV). Now we also have IP/Internet, Mobile communications (UMTS), Digital TV and Radio.

The OECD report on digital music (OECD - Digital Music, 2005) identifies that network convergence and widespread diffusion of high-speed broadband have shifted attention towards broadband content and applications that promise new business opportunities, growth and employment. Digital content and digital delivery of content and information are becoming increasingly ubiquitous, driven by the increasing technological capabilities and performance of delivery platforms, the rapid uptake of broadband technologies - with 2004 identified as a breakthrough year for broadband penetration in OECD countries - innovative creation and use of content and improved performance of hardware and software.

Through a combination of new technologies, new business relationships and innovative service offers to consumers, the market is developing rapidly in order to realise the potential of online music. Saracco (2002) predicts that in ten years' time nearly all communications (over 90% of it) will be using fixed networks, while most people will be under the impression they are using mobile networks. He observes that in the coming years we are going to see a tremendous increase in communicating entities, be they applications or objects. The amount of communication directly involving humans will keep growing but at a slower pace, fuelled mostly by the dissemination of telecommunications in developing countries.

According to OECD – Digital Economy (2006), users are becoming increasingly active and want to express themselves, so that we are entering a participatory culture not of consumers but of users. This is highly relevant to a field such as sound and music computing, which is closely linked to creation and expression.

Statement 5: Interactive broadband networks are revolutionising the way music is distributed and consumed.

Industrial Trend 6: New forms of Intellectual Property protection

Concerning intellectual property protection, there are traditionally two extreme positions to be defended, namely, absolute control of a creation or complete release of the rights to it. However, until recently, there was no easy way to make explicit the rights that an author gives in relation to a creation. Creative Commons is the first example of a system that offers flexible protection of intellectual rights.

Kusek and Leonhard (2005) claim that the issue of protecting intellectual property goes far beyond music and audio technologies. Nevertheless, the crisis started in the music industry. Already, music recording industry revenues are down sharply, despite an overall increase in the distribution of music. The financial crisis has caused music labels to become cautious and conservative, investing in proven artists, with less support available for new and experimental musicians. Kusek and Leonhard note that the breakdown of copyright protection is even starting to impact on musical instruments. Synthesisers, samplers, mixers and audio processors can all be emulated in software. For example they estimate that at least 90% of the copies of Reason, one of the emulation software leaders, are pirated.

Commenting on OECD – Digital Economy (2006) they note the existence of sharp disagreement as to whether intellectual property rights (IPR) currently strike the right balance. There are three points of view: some believe that interest-group pressure has led to excessive protection; some adopt an intermediate position, believing that recent court cases such as Grokster have clarified secondary liability and that this has been sufficient to clarify the IPR situation; a third group maintains that levels of protection and enforcement are still insufficient and should be strengthened. That same OECD report proposes a tentative work agenda that might address the following needs: first, putting intellectual property in its proper place, that is, balancing private incentive versus the public good; second, achieving new digital rights definitions which integrate old rights (e.g. fair and legitimate use) and new rights (e.g. access to orphaned and out-of-print works); finally, accommodating new models of production and distribution (Open Source, Open Format, Open Access).

According to KEA (2006), the main beneficiaries in Europe of the digital revolution have been the telecom operators acting as Internet service providers. Broadband access spending has risen very rapidly. This growth is largely due to the availability of free content. Indeed, 95% of music downloads today, for example, are not paid for.

Statement 6: New models of the control and use of intellectual property rights are impacting on the music industry and opening up new possibilities for the protection and dissemination of music content.

Industrial Trend 7: Revolution in the music business

The whole music business is going through a major revolution, the main cause of which is the development and expansion of the ICT sector. According to OECD – Digital Music (2005) the rise of online music has resulted in product and process innovation, the entry of new players and new opportunities for music consumption and revenues, involving different forms of disintermediation, and the continued strong role of some traditional market participants (especially the record labels). In the new digital model, artists, majors and publishers have so far retained their creative roles related to the development of sound recordings.

Direct sales from artist to consumer or the building of an artist's career purely through the online medium are still rare. Nevertheless, the Internet allows for new forms of advertising and possibilities that lower the entry barriers for artistic creation and music distribution. According to Kusek and Leonhard (2005), ever since the invention of electricity, music and technology have worked hand-in-hand, and technology continues to catapult music to unprecedented heights. Today, the Internet and other digital networks, despite all the legal wrangling, have made music bigger than ever before. Within ten to fifteen years, Kusek and Leonhard claim, the “Music Like Water” business model will make the industry two or three times larger than it is today. They imagine a world where music flows all around us, like water or electricity, and where access to music becomes a kind of “utility”: not free per se, but certainly what feels like free. Along the same lines, Kurzweil (2003) claims music technology is about to be radically transformed. Communication bandwidths, the shrinking size of technology, our knowledge of the human brain and human knowledge in general are all accelerating. Music will remain the communication of human emotion and insight through sound from musicians to their audience, but the concepts and process of music will be transformed once again.

Statement 7: The possibilities offered by ICT are completely reshaping the music business.

To sum up, the identified trends show a rapid development towards a knowledge-based and global economy, with a major role for the ICT sector. Reports indicate a growing interest in the mutual dependency of the cultural and creative sector and ICT, which leads to new models of content exploitation and new ways of dealing with intellectual property issues. All these developments accompany the revolution that is currently taking place in the music business. The sound and music computing field is expected to play a crucial role in these developments.


OECD. OECD Science, Technology and Industry Scoreboard - Towards a knowledge-based economy. Organisation for Economic Co-operation and Development, 2005.

Information Society and Media DG. i2010 - A European Information Society for growth and employment.

R. Saracco. Is there a future for Telecommunications?, 2002.

National Science Board. Science and Engineering Indicators 2006. Technical report, National Science Foundation, 2006.

D. Kusek and G. Leonhard. The Future of Music: Manifesto for the Digital Music Revolution. Omnibus Press, 2005.

Ray Kurzweil. The Future of Music in the Age of Spiritual Machines. Lecture to the AES 2003.

D. Darlin. Data, music, video: Raising a curtain on future gadgetry. New York Times, January 2006.

OECD. OECD Report on Digital Music: Opportunities and Challenges. Technical report, 2005.

OECD. The Future Digital Economy: Digital Content Creation, Distribution and Access. Technical report, 2006.

KEA European Affairs. The Economy of Culture in Europe. Technical report.

Creative Commons.

Social and Cultural

This section is about the social and cultural context in which music appears and the way in which sound and music computing is related to it. Indeed, music is an important aspect of all human cultures (Merriam, 1964). Musical activity involves a mental context of values and goals, as well as an institutional context of societal organisations and structures, and relates to all kinds of interactions with other humans, with nature and with material objects and machines.

Musical activity is, moreover, explorative, creative, and innovative, and can focus on expression (via art and music works), the acquisition of knowledge (via music science and research) or the development of tools to act (via music technology and industry). Besides all this, music is also meant to provide new experience, to give sense and meaning to life, to console and to promote social coherence and personal identity in and over very diverse social and ethnic groups (Hargreaves and North, 1999). Rooted in the biology of every human being (Wallin et al., 2000), music is a core occupation of our technological society.

The KEA (2006) study on the cultural and creative industries in Europe reveals that the expansion of the ICT sector depends to a large extent on the attractiveness of cultural content. Music has thereby been identified as one of the most vibrant cultural industries with a flourishing music research component embedded within a particular social and cultural context. According to this study, cultural activities can be stimulated by both bottom-up, grass-roots initiatives and also the top-down initiatives of administrations and institutes. These social and cultural strategies are beneficial to the economic environment because they:

  • reinforce social integration and help build an “inclusive Europe”

  • contribute to fostering territorial cohesion

  • contribute to reinforcing the self-confidence of individuals and communities

  • participate in the expression of cultural diversity.

Below, some particular features of the current socio-cultural context are described. These provide a background against which we can better understand trends and open problems related to sound and music computing research.

Socio-Cultural Trend 1: Transgression and uncertainty

Classical views hold that the socio-cultural context is largely shaped by developments in science/technology, whose authority, values and practices permeate all dimensions of society and culture. However, more recent views (Nowotny et al., 2001) hold that, owing to the growth of complexity, unpredictability and irregularity in both science and society, this one-way influence has given way to the mutual influencing, or even transgression, of science/technology and society/culture, as well as of university, industry and government (Etzkowitz and Leydesdorff, 2000).

The inherent generation of uncertainties (often resulting from the quest for innovation) yields different research practices, which are reflected in an increasing number of different directions in which technology could be explored and exploited. Which directions are selected may be strongly driven by the dynamics of innovation and by economic rationality. However, as these dynamics cannot be entirely planned, there is a need for values and goals which allow for uncertainty.

In the context of EU research policy, the European Commission - Europe 2010 (2005) has defined strategic objectives which draw upon solidarity and security. These objectives are based on concepts such as a friendly business environment, the embracing of change, economic and social cohesion, responsibility for common values, justice and risk management. This approach can be adopted as a basic framework for the social and cultural values and goals of sound and music computing research. It implies, among other things:

  • respect for the diversity of socio-cultural identity,

  • the care of cultural heritage (preservation and archiving),

  • openness to cultural change and new forms of expression,

  • democratic access to knowledge,

  • a culture of participation and participation in culture.

Statement 1: The uncertainty that is inherent in sound and music computing research should be guided by the specification of social and cultural values and goals.

Socio-Cultural Trend 2: Beyond the logic of economic rationality

Socio-cultural values and goals may guide the development of sound and music computing research by bringing forward certain requests. For example, while music information retrieval research has excelled in developing tools for mainstream commercial (popular) music, it has to a large extent neglected more culturally interesting musical expressions, such as classical music and the music of non-Western cultures. Clearly, in such a situation, stakeholders in the social and cultural domains (such as governments, universities, cultural institutions) may require sound and music computing research to develop technology beyond the logic of pure economic rationality and require the development of music information retrieval tools for all kinds of music.

The reason for doing this could be that, apart from commercial music, society feels that a broad spectrum of music traditions has a high social and cultural value. Thus, a diverse set of applications in music information retrieval, interactive systems, education, archiving and entertainment, which form important components for the future eCulture (the electronic environment in which culture is produced, distributed and consumed), should be developed. If society and culture require that this broad spectrum should be taken into account in research, then support should compensate for biases induced by economic rationality. Often, the required socially and culturally valuable developments are supported by government and other institutions.

It is possible that support for these areas may boost very innovative technologies which, once a critical mass has been achieved, can then be taken up again in a logic of economic rationality. The European Commission is a strong player in defining the societal values with respect to scientific research.

Statement 2: The (EU) government should inject its support for research at the frontiers of economic rationality.

Socio-Cultural Trend 3: Local specialisation and global integration

In Europe, research in sound and music computing shows a trend towards local specialisation and global integration. Research in sound and music computing is typically done in small dynamic institutions, which are often specialised in small niche areas (such as ethnomusicology, cognitive musicology, data processing or music synthesis). Thanks to collaboration, these small research units can become quite powerful when complementary competences are organised as a broader European network of research units. Over the past decade, such networks have been entirely based on competition and shifting alliances.

Statement 3: Local specialisation and global integration offer a competitive environment for sound and music computing research.

The multidisciplinary orientation suits the object of research, which is in itself very broad, covering issues in signal processing as well as in symbolic handling of musical information. This multidisciplinary orientation is situated within an economic rationality of production, distribution and consumption, a social rationality involving diverse players such as musicians, organisers, the mass media and the music industry, and a cultural rationality involving contexts related to high culture, low culture, cross-culture and interculture.

Statement 4: Research should be grounded in a multidisciplinary basis because that is the best guarantee for its embedding in the economic, social and cultural reality of our post-industrial society.

Socio-Cultural Trend 4: A neo-evolutionary research model

Given the broad context in which audio and music manifest themselves, sound and music computing research strategies are characterised by emergence rather than planning. This emergence, moreover, is driven by creativity and innovation. Hence it is difficult to predict what may be successful and what not. Sound and music computing's scientific paradigm is therefore close to a neo-evolutionary model (Leydesdorff and Meyer, 2003), in which elaborate systems of peer review, assessment and evaluation leave room for strategies of variation to be pursued by smaller laboratories in different alliances.

Statement 5: Sound and music computing research is strongly driven by innovation, albeit in a context of emergence rather than planning.

In this model, risk analysis is needed to consider the possible implications of research. After all, science and technology do not automatically lead to the best possible world. In developing them, it is necessary to calculate the risks, to keep an eye on the volatile and ambiguous dynamics. The co-evolution of the socio-cultural context and the scientific/technological context implies that an analysis of values and goals should become an integral part of the development of sound and music computing (Nowotny et al., 2001). The best guarantee to cope with unpredictable outcomes, or uncertainties initiated by innovation, is to allow society and culture to speak back to science and technology, hence the importance of reflection, the development of a code of ethics, the concern for democratic access and several other values that should be taken into account.

Statement 6: Democratic access, reflection and a code of ethics should form an integral part of sound and music computing research.

Socio-Cultural Trend 5: Innovation through artistic creation

Creation and innovation form the motor of sound and music computing research. Most interestingly, they are strongly driven by the context of artistic application. In that respect, it is of interest to mention that content-based music technology has roots in the particular cultural rationality of the 1950s (Born, 1995; Leman, 2005). That rationality, heavily supported by European governments of the time, led to novel developments in electronic music, of which interactive multimedia is a recent outcome. In contrast, audio-recording technology had already begun by the early 20th century and was driven by the logic of economic rationality and the free market (Pichevin, 1997).

The trend of allying content-based music technology to economic rationality is new. But it is reasonable to assume that artistic creation remains a major factor in maintaining the former's innovative character. There are at least two reasons why art is likely to continue to contribute innovating challenges to sound and music computing research:

  • First of all, there is the desire for expression. If tools are used to be expressive, then one is always inclined to go beyond what is actually possible. Indeed, recent developments in sound and music computing research have pushed back the frontiers of sensing, multi-modal multimedia processing and gesture-based control of technologies.

  • Secondly, there is the desire for social communication, and for technologies that enhance collaboration and exchange of information among communities at the semantic level. And indeed, recent developments in sound and music computing research have pushed back the frontiers of networking into technologies that deal with semantics as well as new forms of human-human and human-machine interaction.

In short, the context of art application results in a constant drive towards human-friendly and expressive technologies of mediation. Artistic and creative research is an important source of innovation and, as a producer of content, it can push the development of ICT forward (KEA, 2006).

Statement 7: Sound and music computing research should include artistic creation because the latter is a major driving force for innovation, including innovation in music technology.

In the 1950s and 1960s, numerous small music research laboratories played an important role in the development of content-based music technologies (Leman, 2005). Their original focus on electronic music production has now been extended to multimedia art production. This distinctive European approach, based on small but very innovative and specialised art centres connected through electronic networks, offers a unique and rich context for innovation in music/multimedia technology. Participative technologies involving all players in the cultural domain (developers, distributors, consumers, users and artists) can contribute to the formation of a space for eCulture. This space is closely connected to research/science and technology/industry.

Statement 8: eCulture draws on a platform of participation in culture and on a culture of participation.

Socio-Cultural Trend 6: Focus on the user

The socio-cultural context definitely calls for more attention to the user and the human factor in the practice of music technology. Sound and music computing research is characterised by its potential for use and hence by a strong willingness to respond to signals from society and culture. Indeed, the development of music technology should take into account a context of application and focus on different categories of users, the design of appropriate mediation technologies and the pursuit of personalised approaches.

The user can no longer be considered passive, as one that merely registers what is given as stimulus. Instead, the user is an active consumer, which implies a transgression from the domain of pure consumption into that of production and distribution. The active consumer is also a producer and distributor of music, and therefore an active contributor to what happens with music. Being an active consumer implies participation in the whole chain of production, distribution and consumption, forming part of a network of participating users.

Statement 9: Sound and music computing research should take into account the context of application, in which the active user/consumer occupies a central place.

Socio-Cultural Trend 7: Ethics in research

Ethics pertains to what is morally right and wrong. In view of the growing impact of technology, this perspective needs to be addressed in sound and music computing research. The impact manifests itself in various aspects of our social and cultural life. Examples are the personal integrity of subjects involved in experiments and exchange of data, the safeguarding of the rights of those who have invested in producing valuable content, the right to democratic access to information and so on. It is clear that new developments in sound and music computing research should take this context of implication into account. For example, issues of IPR ownership can be a significant barrier to the conducting of large and ambitious research projects, and the new concepts being developed around this issue may therefore be of critical value.

Sensor technologies are another sensitive issue. They may infringe the personal integrity of subjects and therefore the privacy and confidentiality of information. The conceptual and philosophical implications regarding human responsibility in contexts of application need consideration in sound and music computing research.

Statement 10: Sound and music computing research should take into account the context of implication, assessing risks and ethical implications.

To sum up, the social and cultural context has been identified as an important driver for sound and music computing. Due to the complexity, unpredictability and irregularity in both science and society, decisions in research are often characterized by a fundamental uncertainty. More and more, this uncertainty is resolved by a logic of economic rationality: research then goes where the economy requires it. However, social and cultural values are important and call for a transgressive approach to science and society, that is, an approach in which a mutual interchange between science and society is possible. This may result in governmental support for research fields that focus on important social and cultural values that go beyond the logic of economic rationality. Society may also support the creation of research spaces, and support local specialisation and global integration of the many small research units, thereby supporting a neo-evolutionary research model. The social and cultural context is all about the values that really concern our lifestyle and that, in a democratic society, contribute to what is considered the highest good for all. Music is one such phenomenon that contributes to human well-being. It fosters creative activity, expression and social interaction. Through artistic creation, innovation is possible and contribution to culture is renewed. Society also requires more attention to the role of the user of ICT and there is an important ethical aspect related to modern sound and music computing research applications.


G. Born. Rationalizing Culture: IRCAM, Boulez, and the Institutionalization of the Musical Avant-Garde. University of California Press, 1995.

H. Etzkowitz and L. Leydesdorff. The dynamics of innovation: from national systems and "mode 2" to a triple helix of university-industry-government relations. Research Policy, 29(2):109-123, 2000.

D. J. Hargreaves and A. C. North. The functions of music in everyday life: Redefining the social in music psychology. Psychology of Music, 27(1):71-83, 1999.

M. Leman. Musical creativity research. In J.C. Kaufman and J. Baer, editors, Creativity Across Domains: Faces of the Muse, pages 103-122. Lawrence Erlbaum, Mahwah, NJ, 2005.

L. Leydesdorff and M. Meyer. The triple helix of university-industry-government relations. Scientometrics, 58(2):191-203, 2003.

A. Merriam. The Anthropology of Music. Northwestern University Press, 1964.

H. Nowotny, P. Scott, and M. Gibbons. Rethinking Science: Knowledge and the Public in an Age of Uncertainty. Polity Press, 2001.

A. Pichevin. Le disque à l'heure d'Internet: l'industrie de la musique et les nouvelles technologies de diffusion. L'Harmattan, 1997.

N. L. Wallin, B. Merker, and S. Brown. The origins of music. MIT Press, 2000.

European Commission. Europe 2010: A Partnership for European Renewal Prosperity, Solidarity and Security. Strategic Objectives 2005-2009.

KEA European Affairs. The Economy of Culture in Europe, 2006.


State of the Art

The aim of this section is to give an overview of current research trends (we deliberately refrain from trying to summarise the state of the art, as that would go far beyond what can be done here), with a special emphasis on the open issues that wait to be addressed, or are currently being worked on. Faced with the great variety of research topics within SMC, we have tried to give our summary a coherent structure by grouping the topics into three major areas – Sound, Interaction and Music – which are further divided into sub-areas.


The figure depicts the relationships between the different research areas and sub-areas as we see them. We make a basic distinction between research that focuses on sound (left-hand side of the figure) and research that focuses on music (right-hand side of the figure). For each research field, there is an analytic and a synthetic approach. The analytic approach goes from encoded physical (sound) energy to meaning (sense), whereas the synthetic approach goes in the opposite direction, from meaning (sense) to encoded physical (sound) energy. Accordingly, analytic approaches to sound and music pertain to analysis and understanding, whereas synthetic approaches pertain to generation and processing. In between sound and music, there are multi-faceted research fields that focus on interactional aspects. These are performance modelling and control, music interfaces, and sound interaction design.




In this section we review the research on sound that is being carried out within the boundaries identified in the definition of the field. From a sound to sense point of view, we include the analysis, understanding and description of all musical and non-musical sounds except speech. Then, in the sense to sound direction, we include the research that is more related to sound synthesis and processing.

Sound Description and Understanding

One of the basic aims of SMC research is to understand the different facets of sound from a computational point of view, or by using computational means and models. We want to understand and model not only the properties of sound waves but also the mechanisms of their generation, transmission and perception by humans. Even more, we want to understand sound as the basic communication channel for music and a fundamental element in our interaction with the environment. Sound serves as one of the main signals for human communication, and its understanding and description requires a notably multidisciplinary approach.

Traditionally, the main interest of SMC researchers has been musical sounds and thus the understanding of the sound generated by musical instruments and the specific transmission and perception mechanisms involved in the music communication chain. In recent years, this focus has been broadened and there is currently an increased interest in non-musical sounds and aspects of communication beyond music. A number of the methodologies and technologies developed for music are starting to be used for human communication and interaction through sound in general (e.g., ecological sounds) and there is increasing cross-fertilisation between the various sound-related disciplines.

There has been a great deal of research work on the analysis and description of sound by means of signal processing techniques, extracting features at different abstraction levels and developing source-specific and application-dependent technologies. Most of the current research in this domain starts from frequency domain techniques as a step towards developing sound models that might be used for recognition, retrieval, or synthesis applications. Other approaches consider sparse atomic signal representations such as matching pursuit, the analytical counterpart to granular synthesis [Sturm et al., 2006].
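As a rough illustration of the sparse atomic representations mentioned above, the following Python sketch runs a greedy matching pursuit over a toy dictionary. The dictionary of unit-norm cosine atoms, the function names and the test signal are all assumptions made for brevity; practical systems typically use much larger dictionaries of windowed, time-localised atoms.

```python
import math


def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def normalise(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


def matching_pursuit(signal, atoms, n_iter):
    """Greedy decomposition: at each step pick the atom most correlated
    with the residual and subtract its contribution."""
    residual = list(signal)
    picks = []
    for _ in range(n_iter):
        corrs = [dot(residual, atom) for atom in atoms]
        best = max(range(len(atoms)), key=lambda i: abs(corrs[i]))
        coeff = corrs[best]
        residual = [r - coeff * a for r, a in zip(residual, atoms[best])]
        picks.append((best, coeff))
    return picks, residual


# Toy dictionary (assumption): eight unit-norm cosines over 64 samples.
N = 64
atoms = [
    normalise([math.cos(2 * math.pi * k * n / N) for n in range(N)])
    for k in range(1, 9)
]

# Test signal built from two atoms with known weights.
signal = [3.0 * a + 1.5 * b for a, b in zip(atoms[2], atoms[5])]
picks, residual = matching_pursuit(signal, atoms, n_iter=2)
```

Because the atoms in this toy dictionary are mutually orthogonal, two iterations select atoms 2 and 5 with weights close to 3.0 and 1.5 and leave a residual of negligible energy. With an overcomplete dictionary the same loop would in general need more iterations and would give only an approximation.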

Also of importance has been the study of sound-producing physical objects. The aim of such study is to understand the acoustic characteristics of musical instruments and other physical objects which produce sounds relevant to human communication. Its main application has been the development of physical models of these objects for synthesis applications [Smith, 2006; Välimäki et al., 2006; Rocchesso & Fontana, 2003], so that the user can produce sound by interacting with the models in a physically meaningful way.

However, beyond the physical aspect, sound is a communication channel that carries information. We are therefore interested in identifying and representing this information. Signal processing techniques can only go so far in extracting the meaningful content of a sound. Thus, in the past few years there has been a rapid increase in research activity which aims to generate semantic descriptions automatically from audio signals. Statistical modelling, machine learning, music theory and web mining technologies have been used to raise the semantic level of sound descriptors. MPEG-7 [Kim et al., 2005] has been created to establish a framework for effective management of multimedia materials, standardising the description of sources, perceptual aspects and other relevant descriptors of a sound or any multimedia asset.

Most research approaches to sound description are essentially bottom-up, starting from the audio signal and trying to reach the highest possible semantic level. There is a general consensus that this approach has clear limitations and does not allow us to bridge what is known as the ‘semantic gap’ – that is, the discrepancy between what can currently be extracted from audio signals and the kinds of high-level, semantically meaningful concepts that human listeners associate with sounds and music. The current trend is towards multimodal processing methods and top-down approaches based on ontologies, reasoning rules, and cognition models. Also, in practical applications (e.g., in web-based digital music services), collaborative tagging by users is being increasingly used to gain semantic information that would be hard or impossible to extract with current computational methods.

Sound Description and Understanding: Key Issues

The above synopsis of research in sound description and understanding has already revealed a number of current limitations and open problems. Below, we present some selected research questions that should be addressed, or issues that should be taken into account, in future research.

Perceptually informed models of acoustic information processing: There is an active field of research in neuroscience that tries to relate behavioural and physiological observations by means of computational models. There is a wide variety of approaches in the computational neuroscience field, from models based on accurate simulations of single neurons to systems-based models relying heavily on information theory. SMC has already benefitted in the past from auditory models as signal processing tools. For instance, audio compression schemes such as MP3 are heavily based on models of perceptual masking. This trend is set to continue as the models become more robust and computationally efficient. In the future, the interaction between auditory models and SMC could also be on a conceptual level. For instance, the sensory-motor theory suggests that the study of sound perception and production should be intimately related.

Sound source recognition and classification: The ability of a normal human listener to recognise objects in the environment from only the sounds they produce is extraordinarily robust. In contrast, computer systems designed to recognise sound sources function precariously, breaking down whenever the target sound is degraded by reverberation, noise, or competing sounds. Musical signals present a real challenge for existing systems as the three sources of difficulty are almost always present. SMC can thus contribute to the development of sound source recognition systems, by providing well-controlled test situations that retain an ecological value [Elhilali, Shamma, Thorpe and Pressnitzer, 2007]. In return, models of sound source recognition will have obvious uses in current and future applications of SMC, such as score following (adding timbre cues to the pitch cues normally used) or music-information retrieval systems.

Sound search and retrieval based on content: Audio content analysis and description enables various new and advanced audiovisual applications and services. Search engines or specific filters could use the extracted description to help users navigate or browse through large collections of audio data. Digital analysis of an audio file may be able to discriminate between speech, music and other entities or identify how many speakers are contained in a speech segment, what gender they are, and even who exactly is speaking. Spoken content may be identified and converted to text. Music might be classified into categories, such as jazz, rock and classical [Tzanetakis & Cook, 2002] (although this is problematic because such categories are user-dependent and perhaps cannot be unequivocally defined). Finally, it may be possible to automatically identify and find particular sounds, such as explosions, gunshots, etc. [Cano, 2007]. For such scenarios to become really useful, the necessary improvements in sound search and retrieval will call for a change of paradigm in the description of sounds – from descriptions constrained to a finite number of crisp labels, towards natural language descriptions, at a higher semantic level, similar to that used by humans. A step in this direction might be the inclusion of reasoning rules and knowledge bases (sound ontologies) encoding common sense knowledge about sound. Another key issue is the combination of information from complementary media, such as video or images.

Sound Synthesis and Processing

Sound synthesis and processing has been the most active research area in SMC for more than 40 years. Quite a number of the research results of the 1960s and 70s are now standard components of many audio and music devices, and new technologies are continuously being developed and integrated into new products [Välimäki et al., 2007]. The sounds of our age are digital. Most of them are generated, processed, and transcoded digitally. Given that these technologies have already become so common and that most recent developments represent only incremental improvements, research in this area has lost some of its prominence in comparison to others in SMC. Nonetheless, there remain a number of open issues to be worked on, and some of the new trends have the potential for huge industrial impact.

With respect to sound synthesis, most of the abstract algorithms that were the focus of work in the 1970s and 80s – e.g., FM and waveshaping – were not directly related to a sound source or its perception (though some of the research was informed by knowledge of musical acoustics and source physics). The 1990s saw the emergence of computational modeling approaches to sound synthesis. These aimed either at capturing the characteristics of a sound source, known as physical models [Smith, 2006; Välimäki et al., 2006; Cadoz et al., 1993], or at capturing the perceptual characteristics of the sound signal, generally referred to as spectral or signal models [Serra, 1997]. The technology transfer expectations of the physical models of musical instruments have not been completely fulfilled. Their expressiveness and intuitive control – advantages originally attributed to this kind of model – did not help commercial music products to succeed in the market place. Meanwhile, synthesis techniques based on spectral modelling have met with competitive success in voice synthesisers, both for speech and singing voices [Bonada and Serra, 2007], but to a lesser extent in the synthesis of all other musical instruments. A recent and promising trend is the combination of physical and spectral models, such as physically informed sonic modelling [Cook, 1997] and commuted synthesis [Smith, 2006; Välimäki et al., 2006]. Another recent trend is to simulate traditional analog electronics used in music synthesizers of the 1960s and 1970s [Lane, 1997; Välimäki and Huovilainen, 2006] and in amplifiers used by electric guitar and bass players [Karjalainen et al., 2006; Yeh and Smith, 2006].
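To make the notion of an abstract synthesis algorithm concrete, here is a minimal Python sketch of FM synthesis in its usual textbook form, y(t) = sin(2π f_c t + I sin(2π f_m t)). The function name, parameter names and default sample rate are assumptions for illustration. Note that none of the parameters refers to a physical source or to a perceptual attribute, which is the point made above.

```python
import math


def fm_tone(carrier_hz, modulator_hz, index, duration_s, sample_rate=44100):
    """Return samples of a simple FM tone.

    carrier_hz   -- carrier frequency f_c
    modulator_hz -- modulating frequency f_m
    index        -- modulation index I (0 gives a pure sine)
    """
    n_samples = round(duration_s * sample_rate)
    samples = []
    for i in range(n_samples):
        t = i / sample_rate
        phase = (2 * math.pi * carrier_hz * t
                 + index * math.sin(2 * math.pi * modulator_hz * t))
        samples.append(math.sin(phase))
    return samples


# A 10 ms tone with a 2:1 carrier-to-modulator ratio (harmonic spectrum).
tone = fm_tone(carrier_hz=440.0, modulator_hz=220.0, index=2.0, duration_s=0.01)
```

Raising the index spreads energy into more sidebands around the carrier, so a single number controls brightness, but in a way that has to be learned rather than intuited.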

As an evolution of granular synthesis techniques (e.g., [Roads, 2001]), new corpus-based concatenative methods for musical sound synthesis, also known as mosaicing, have attracted much attention recently [Schwarz, 2007]. They make use of a variety of sound snippets in a database to assemble a desired sound or phrase according to a target specification given via sound descriptors or by an example sound. With ever-larger sound databases readily available, together with a pertinent description of their contents, these methods are increasingly used for composition, high-level instrument synthesis, interactive exploration of sound corpora, and other applications [Lindeman, 2007].
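The unit selection step at the heart of such methods can be sketched as a nearest-neighbour search in descriptor space. The Python example below is a simplified illustration under stated assumptions: the corpus entries, descriptor names and plain squared-distance cost are hypothetical, descriptors are assumed to be on comparable scales, and the concatenation cost between successive units that real systems usually add is left out.

```python
def descriptor_distance(unit_descriptors, target):
    """Squared Euclidean distance over the descriptors named in the target."""
    return sum((unit_descriptors[key] - value) ** 2 for key, value in target.items())


def select_units(targets, corpus):
    """For each target specification, pick the corpus unit whose
    descriptors lie closest to it."""
    chosen = []
    for target in targets:
        best = min(corpus, key=lambda unit: descriptor_distance(unit["descriptors"], target))
        chosen.append(best["name"])
    return chosen


# Hypothetical corpus: each snippet is described by a MIDI pitch and a
# brightness value in the range 0..1.
corpus = [
    {"name": "flute_a4", "descriptors": {"pitch": 69.0, "brightness": 0.3}},
    {"name": "violin_a4", "descriptors": {"pitch": 69.0, "brightness": 0.7}},
    {"name": "cello_c3", "descriptors": {"pitch": 48.0, "brightness": 0.5}},
]

# Target phrase given as a sequence of descriptor values.
targets = [
    {"pitch": 70.0, "brightness": 0.8},
    {"pitch": 50.0, "brightness": 0.4},
]

sequence = select_units(targets, corpus)
```

For these toy values the selection is violin_a4 followed by cello_c3. The selected snippets would then be concatenated, typically with short cross-fades, to produce the output sound.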

In sound processing, there are a large number of active research topics. Probably the most well established are audio compression and sound spatialisation, both of which have clear industrial contexts and quite well defined research agendas. Digital audio compression techniques allow the efficient storage and transmission of audio data, offering various degrees of complexity, compressed audio quality and degree of compression. With the widespread uptake of MP3, audio compression technology has spread to mainstream audio and is being incorporated into most sound devices [Mock, 2004]. These recent advances have resulted from an understanding of the human auditory system and the implementation of efficient algorithms in advanced DSP processors. Improvements to the state of the art will not be easy, but there is a trend towards trying to make use of our new understanding of human cognition and of the sound sources to be coded. Sound spatialisation effects attempt to widen the stereo image produced by two loudspeakers or stereo headphones, or to create the illusion of sound sources placed anywhere in three-dimensional space, including behind, above or below the listener. Some techniques, such as ambisonics, vector base amplitude panning and wave-field synthesis, are readily available, and new models are being worked on that combine signal-driven bottom-up processing with hypothesis-driven top-down processing [Blauert, 2005]. Auditory models and listening tests currently help us to understand the mechanisms of binaural hearing and exploit them in transcoding and spatialisation. Recent promising examples include the Binaural Cue Coding method [Faller, 2006] and Spatial Impulse Response Rendering [Pulkki and Merimaa, 2006].
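Of the spatialisation techniques named above, vector base amplitude panning is the simplest to illustrate. The Python sketch below covers only the two-dimensional case for a single loudspeaker pair: the source direction is written as a weighted sum of the two loudspeaker direction vectors, and the weights are normalised to constant power. The function name and the angle convention (degrees, counter-clockwise from straight ahead) are assumptions for this example.

```python
import math


def vbap_pair_gains(source_deg, speaker_a_deg, speaker_b_deg):
    """Gains for one loudspeaker pair in 2-D vector base amplitude panning.

    Solves p = g_a * a + g_b * b for the unit direction vectors p, a, b
    and scales the result so that g_a**2 + g_b**2 == 1.
    """
    def unit(angle_deg):
        angle = math.radians(angle_deg)
        return math.cos(angle), math.sin(angle)

    px, py = unit(source_deg)
    ax, ay = unit(speaker_a_deg)
    bx, by = unit(speaker_b_deg)

    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("loudspeaker directions must not be collinear")

    # Cramer's rule for the 2x2 linear system.
    g_a = (px * by - py * bx) / det
    g_b = (ax * py - ay * px) / det

    norm = math.hypot(g_a, g_b)
    return g_a / norm, g_b / norm


# Standard stereo layout: loudspeakers at +30 and -30 degrees.
centre = vbap_pair_gains(0.0, 30.0, -30.0)
hard_left = vbap_pair_gains(30.0, 30.0, -30.0)
```

A source midway between the loudspeakers receives equal gains of about 0.707, and a source in the direction of one loudspeaker is reproduced by that loudspeaker alone. A negative gain indicates that the source lies outside the arc spanned by the pair, in which case a different pair should be chosen.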

Digital sound processing also includes techniques for audio post-production and other creative uses in music and multimedia applications [Zölzer, 2002]. Time and frequency domain techniques have been developed for transforming sounds in different ways. But the current trend is to move from signal processing to content processing; that is, to move towards higher levels of representation for describing and processing audio material. There is a strong trend towards the use of all these signal processing techniques in the general field of interactive sound design. Sound generation techniques have been integrated in various multimedia and entertainment applications (e.g., sound effects and background music for gaming), sound product design (ring tones for mobile phones) and interactive sound generation for virtual reality or other multimodal systems. Old sound synthesis technologies have been brought back to life and adapted to the needs of these new interactive situations. The importance of control has been emphasised, and source-centred and perception-centred modelling approaches have been expanded towards interactive sonification [Hermann & Ritter, 2005].

Sound Synthesis and Processing: Key Issues

Interaction-centred sound modelling: The interactive aspects of music and sound generation should be given greater weight in the design of future sound synthesis techniques. A challenge is how to make controllability and interactivity central design principles in sound modelling. It is widely believed that the main missing element in existing synthesis techniques is adequate control. The extraction of expressive content from human gestures, from haptics (e.g., pressure, impacts or friction-like interactions on tangible interfaces), from movement (motion capture and analysis) or voice (extraction of expressive content from the voice or breath of the performer), should become a focus of new research in sound generation. This will also open the field to multisensory and cross-modal interaction research. The next problem then concerns how to exploit the extracted contents in order to model sound. Effective sound generation needs to achieve a perceptually robust link between gesture and sound. The mapping problem is in this sense crucial both in musical instruments and in any other device/artefact involving sound as one of its interactive elements.

Modular sound generation: Sound synthesis by physical modelling has, so far, mainly focused on accurate reproduction of the behaviour of musical instruments. Some other efforts have been devoted to everyday sounds [Rocchesso et al., 2003; Rocchesso and Fontana, 2004; Peltola et al., 2007] or to the application of sophisticated numerical methods for solving wave propagation problems [Trautmann et al., 2005; Bilbao, 2007]. A classic dream is to be able to build or alter the structure of a musical instrument on the computer and listen to it before it is actually built. By generalizing this thought, the dream changes to the idea of having a toolkit for constructing sounding objects from elementary blocks such as waveguides, resonators and nonlinear functions [Rabenstein et al., 2007]. This goal has faced a number of intrinsic limitations in block-based descriptions of musical instruments. In general, it is difficult to predict the sonic outcome of an untested connection of blocks. However, by associating macro-blocks to salient phenomena, it should be possible to devise a constructivist approach to sound modelling. At the lowest level, blocks should correspond to fundamental interactions (impact, friction, air flow on edge, etc.). The sound quality of these blocks should be tunable, based on properties of both the interaction (e.g., pressure, force) and the interactants (e.g., size and material of resonating object). Higher-level, articulated phenomena should be modelled on top of lower-level blocks according to characteristic dynamic evolutions (e.g., bouncing, breaking). This higher level of sound modelling is suitable for tight coupling with emerging computer animation and haptic rendering techniques, as its time scale is compatible with the scale of visual motion and gestural/tactile manipulation. In this way, sound synthesis can become part of a more general constructivist, physics-based approach to multisensory interaction and display.
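The block-based idea can be sketched with two very small building blocks: an impact exciter and a two-pole resonator. In the Python example below the mapping of force to impulse amplitude, of object size to resonance frequency and of material to decay time is a hypothetical simplification made for illustration, not a validated physical model; a realistic sounding object would combine many resonant modes and a nonlinear contact model.

```python
import math


def impact(force, length):
    """Exciter block: a single impulse whose amplitude stands for the
    impact force (hypothetical mapping)."""
    return [force] + [0.0] * (length - 1)


def resonator(excitation, freq_hz, decay_s, sample_rate=44100):
    """Resonator block: two-pole filter
    y[n] = x[n] + 2 r cos(w) y[n-1] - r^2 y[n-2].

    freq_hz stands in for the size of the object, decay_s for its
    material (longer decay for less damped materials).
    """
    r = math.exp(-1.0 / (decay_s * sample_rate))
    w = 2 * math.pi * freq_hz / sample_rate
    a1, a2 = 2 * r * math.cos(w), -r * r
    y1 = y2 = 0.0
    output = []
    for x in excitation:
        y = x + a1 * y1 + a2 * y2
        output.append(y)
        y2, y1 = y1, y
    return output


# Connect the blocks: a firm strike on a small, fairly damped object.
strike = impact(force=0.8, length=4410)
sound = resonator(strike, freq_hz=440.0, decay_s=0.05)
```

A higher-level phenomenon such as bouncing could then be modelled on top of these blocks by triggering a sequence of impacts with decreasing force and decreasing time intervals.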

Physical modelling based on data analysis: To date, physical models of sound and voice have been appreciated for their desirable properties in terms of synthesis, control and expressiveness. However, it is also widely recognised that they are very difficult to fit onto real observed data due to the high number of parameters involved, the fact that control parameters are not related to the produced sound signal in an intuitive way and, in some cases, the radical non-linearities in the numerical schemes. All these issues make the parametric identification of physics-based models a formidable problem. Future research in physical voice and sound modelling should thus take into account the importance of models fitting real data, in terms of both system structure design and parametric identification. Co-design of numerical structures and identification procedures may also be a possible path to complexity reduction. It is also desirable that from the audio-based physical modelling paradigm, new model structures emerge which will be general enough to capture the main sound features of broad families of sounds (e.g. sustained tones from wind and string instruments, percussive sounds) and to be trained to reproduce the peculiarities of a given instrument from recorded data.

Audio content processing: Currently, a very active field of research is Auditory Scene Analysis [Bregman, 1990], which is conducted both from perceptual and computational points of view. This research is conducted mostly within the cognitive neurosciences community. But a multidisciplinary approach would allow the translation of its fundamental research advances to many practical applications. For instance, as soon as robust results emerge from this field, it will be possible to approach (re)synthesis from a higher-level sound-object perspective, permitting us to identify, isolate, transform and recombine sound-objects in a flexible way. Sound synthesis and manipulation using spectral models is based on features emerging from audio analysis. The use of auditory scene representations for sound manipulation and synthesis could be based on sound objects captured from the analysis. This possibility offers great prospects for music, sound and media production. With the current work on audio content analysis, we can start identifying and processing higher-level elements in an audio signal. For example, by identifying the rhythm of a song, a time-stretching technique can become a rhythm-changing system, and by identifying chords, a pitch shifter might be able to transpose the key of the song.
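The last example amounts to translating content-level requests into signal-level parameters. The Python sketch below shows that translation only, assuming that tempo and key have already been estimated by some analysis stage; the function names and the choice of the shortest transposition interval are illustrative assumptions, and the actual time-stretching and pitch-shifting algorithms are outside its scope.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def stretch_factor(detected_bpm, target_bpm):
    """Duration scaling that turns the detected tempo into the target tempo.
    Values above 1 lengthen the audio (slower), below 1 shorten it."""
    return detected_bpm / target_bpm


def transposition(detected_key, target_key):
    """Return (semitones, frequency ratio) for moving between two keys,
    choosing the shortest interval in equal temperament."""
    semitones = (NOTE_NAMES.index(target_key) - NOTE_NAMES.index(detected_key)) % 12
    if semitones > 6:
        semitones -= 12
    return semitones, 2.0 ** (semitones / 12.0)


# A song analysed as 120 BPM in C, to be rendered at 100 BPM in D.
factor = stretch_factor(120.0, 100.0)
semitones, ratio = transposition("C", "D")
```

Here the time-stretcher would be asked to lengthen the audio by a factor of 1.2 and the pitch shifter to raise it by two semitones, a frequency ratio of about 1.122.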


J. Blauert. Communication Acoustics (Signals and Communication Technology). Springer, Berlin, Germany, July 2005.

P. R. Cook. Physically informed sonic modeling (PhISM): Synthesis of percussive sounds. Computer Music J., 21(3):38-49, 1997.

X. Serra. Musical sound modeling with sinusoids plus noise. In C. Roads, S. Pope, A. Piccialli, and G. De Poli, editors, Musical Signal Processing, pages 91-122. Swets & Zeitlinger Publishers, Lisse, the Netherlands, 1997.

J. O. Smith. Physical Audio Signal Processing: for Virtual Musical Instruments and Digital Audio Effects. 2006.

V. Välimäki, J. Pakarinen, C. Erkut, and M. Karjalainen. Discrete-time modelling of musical instruments. Rep. Prog. Phys., 69(1), January 2006.

T. Hermann and H. Ritter. Model-based sonification revisited - Authors' comments on Hermann and Ritter, ICAD 2002. ACM Trans. Appl. Percept., 2(4):559-563, October 2005.

U. Zölzer, editor. DAFX: Digital Audio Effects. John Wiley & Sons, May 2002.

A. S. Bregman. Auditory Scene Analysis: The Perceptual Organization of Sound. The MIT Press, September 1990.

M.S. Gazzaniga. The New Cognitive Neurosciences. MIT Press, Cambridge, Mass., 2000.

D. Rocchesso and F. Fontana, editors. The Sounding Object. Edizioni di Mondo Estremo, 2003.

H.-G. Kim, N. Moreau, and T. Sikora. MPEG-7 Audio and Beyond: Audio Content Indexing and Retrieval. Wiley & Sons, 2005.

K. D. Martin. Sound-source recognition: A theory and computational model. PhD thesis, MIT, 1999.

P. Cano. Content-based Audio Search: From Fingerprinting to Semantic Audio Retrieval. PhD thesis, Pompeu Fabra University, 2007.

D. Schwarz. Corpus-Based Concatenative Synthesis. IEEE Signal Processing Magazine, 24(2):92-104, 2007.

C. Cadoz, A. Luciani, and J.-L. Florens. CORDIS-ANIMA: a modeling and simulation system for sound and image synthesis - the general formalism. Computer Music Journal, 17(1):19-29, 1993.


In this section we review a variety of research issues that address interaction with sound and music. Three main topics are considered: Music Interfaces, Performance Modelling and Control, and Sound Interaction Design. Music Interfaces is quite a well-established topic which deals with the design of controllers for music performance. Performance Modelling and Control is an area that has been quite active in the last decade. It has focused on the study of the performance of classical music but more recently is opening up to new challenges. The last topic covered under the Interaction heading is Sound Interaction Design. This is a brand new area that opens up many new research problems not previously addressed within the SMC research community.

Music Interfaces

Digital technologies have revolutionised the development of new musical instruments, not only because of the sound generation possibilities of the digital systems, but also because the concept of ‘musical instrument’ has changed with the use of these technologies. In most acoustic instruments, the separation between the control interface and the sound-generating subsystems is fuzzy and unclear. In the new digital instruments, the gesture controller (or input device) that takes the control information from the performer(s) is always separate from the sound generator. For exact and repeatable control of a synthesizer, or a piece of music, a computer-based notation program gives a stable environment (see, e.g., [Kuuskankare & Laurson, 2006]). For real-time control the controlling component can be a simple computer mouse, a computer keyboard or a MIDI keyboard, but with the use of sensors and appropriate analogue-to-digital converters, any signal coming from the outside can be converted into control messages intelligible to the digital system. A recent example is music interfaces enabling control through expressive full-body movement and gesture [Camurri et al., 2005]. The broad accessibility of devices, such as video cameras and analog-to-MIDI interfaces, provides a straightforward means for the computer to access sensor data. The elimination of the physical dependencies has meant that all previous construction constraints in the design of digital instruments have been relaxed [Jordà, 2005].
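As an illustration of how a sensor signal becomes a control message, the following Python sketch scales a reading into the 7-bit range of a MIDI Control Change message and packs the three message bytes (status 0xB0 plus channel, controller number, value). The function name and the calibration range are assumptions for this example, and sending the bytes to a device is deliberately left out.

```python
def sensor_to_control_change(reading, low, high, controller, channel=0):
    """Convert a sensor reading into a 3-byte MIDI Control Change message.

    reading    -- raw sensor value, e.g. from an analogue-to-digital converter
    low, high  -- calibrated minimum and maximum of the sensor (assumed known)
    controller -- MIDI controller number, 0..127
    channel    -- MIDI channel, 0..15
    """
    if not 0 <= channel <= 15:
        raise ValueError("channel must be in 0..15")
    if not 0 <= controller <= 127:
        raise ValueError("controller must be in 0..127")
    if high <= low:
        raise ValueError("high must be greater than low")

    # Normalise to 0..1 and clamp readings outside the calibrated range.
    position = (reading - low) / (high - low)
    position = min(1.0, max(0.0, position))

    # Quantise to the 7-bit value range of MIDI data bytes.
    value = int(position * 127 + 0.5)
    return bytes([0xB0 | channel, controller, value])


# A pressure sensor at half range mapped to controller 1 on channel 0.
message = sensor_to_control_change(0.5, low=0.0, high=1.0, controller=1)
```

The linear mapping used here is the simplest possible choice; in instrument design the mapping between gesture and sound parameters is often nonlinear and many-to-many.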

A computer-augmented instrument takes an existing instrument as its base and uses sensors and other instrumentation to pick up as much information as possible from the performer’s motions. The computer uses both the original sound of the instrument and the feedback from the sensor array to create and/or modify new sounds. Augmented instruments are often called hyper-instruments after the work done at MIT's Media Lab [Paradiso, 1997], which aimed at providing virtuoso performers with controllable means of amplifying their gestures, suggesting coherent extensions to instrumental playing techniques.

One of the new paradigms of digital instruments is the idea of collaborative performance and of instruments that can be performed by multiple players. In this type of instrument, performers can take an active role in determining and influencing not only their own musical output but also that of their collaborators. These music collaborations can be achieved over networks such as the Internet, and the study of network or distributed musical systems is a new topic on which much research is being carried out [Barbosa, 2006].

Most current electronic music is being created and performed with laptops, turntables and controllers that were not really designed to be used as music interfaces. The mouse has become the most common music interface, and several of the more radical and innovative approaches to real-time performance are currently found in the apparently more conservative area of screen-based and mouse-controlled software interfaces. Graphical interfaces may be historically freer and better suited to unveiling concurrent, complex and unrelated musical processes. Moreover, interest in gestural interaction with sound and music content and in gestural control of digital music instruments is emerging as part of a more general trend towards research on gesture analysis, processing and synthesis. This growing importance is demonstrated by the fact that the Gesture Workshop series of conferences recently included sessions on gesture in music and the performing arts. Research on gesture not only enables a deeper investigation of the mechanisms of human-human communication, but may also open up unexplored frontiers in the design of a novel generation of multimodal interactive (music) systems.

A recent trend in new music interfaces and digital instruments is that they are increasingly designed for interaction with non-professional users. The concepts of active experience and active listening are emerging, referring to the opportunity for beginners and for naïve and inexperienced users, in a collaborative framework, to operate interactively on music content by modifying and moulding it in real time while listening. The integration of research on active listening, context-awareness and gestural control is leading to new creative forms of interactive music experience in context-aware (mobile) scenarios, resulting in an embodiment and control of music content by user behaviour, e.g., gestures and actions (for a recent example see [Camurri et al., 2007]).

Music Interfaces: Key Issues

Design of innovative multimodal music interfaces: A key target for designers of future interactive music systems is to endow them with natural, intelligent and adaptive multimodal interfaces which exploit the ease and naturalness of ordinary physical gestures in everyday contexts and actions. Examples are tangible interfaces (e.g., [Ishii & Ullmer, 1997]) and their technological realization as Tangible Acoustic Interfaces (TAIs), which exploit the propagation of sound in physical objects in order to locate touching positions. TAIs are a very promising interface for future interactive music systems. They have recently been enhanced with algorithms for multimodal high-level analysis of touching gestures so that information can be obtained about how the interface is touched (e.g., forcefully or gently). Despite such progress, currently available multimodal interfaces still need improvements. A key issue is to develop interfaces that can capture subtler high-level information. For example, research has been devoted to multimodal analysis of basic emotions (e.g., happiness, fear, sadness, anger), but we are still far from modelling more complex phenomena such as engagement, empathy and entrainment. Moreover, current multimodal interfaces usually are not context-aware, i.e., they analyse users’ gestures and their expressiveness, but they do not take into account the context in which the gestures are performed. Another key issue is related to scalability. Current multimodal interfaces often require special-purpose set-ups, including positioning of video cameras and careful preparation of objects, e.g., for TAIs. Such systems are often not scalable and are difficult to port to the home and the personal environment. A major research challenge is to exploit future mobile devices, the sensors they will be endowed with, and their significantly increased computational power and wireless communication abilities.
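The idea of locating a touch from the way sound propagates through an object can be illustrated with a deliberately simplified sketch. It assumes a one-dimensional bar with a contact sensor at each end, a single known and constant wave speed, and an arrival-time difference that has already been measured. None of these assumptions comes from the cited work: practical TAIs deal with two-dimensional surfaces, several sensors and noisy signals. The function names and the numeric values are hypothetical.

```python
def arrival_times(x, length, wave_speed):
    """Simulate when a tap at position x (m) reaches the sensors at 0 and at length."""
    return x / wave_speed, (length - x) / wave_speed


def locate_tap(delta_t, length, wave_speed):
    """Estimate the tap position on a 1-D bar from a time difference of arrival.

    delta_t is the arrival time at the sensor at x = 0 minus the arrival
    time at the sensor at x = length, in seconds.
    Since delta_t = (2x - length) / wave_speed, x = (length + wave_speed * delta_t) / 2.
    """
    x = (length + wave_speed * delta_t) / 2.0
    # Clamp to the physical extent of the bar in case of measurement noise.
    return min(max(x, 0.0), length)


# Hypothetical numbers: a 1 m bar and a wave speed of 1000 m/s.
t_left, t_right = arrival_times(0.3, 1.0, 1000.0)
estimate = locate_tap(t_left - t_right, 1.0, 1000.0)
```

In a real set-up the time difference would itself have to be estimated from the two sensor signals, which is where most of the difficulty lies.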

Integration of control with sound generation: The separation between gesture controllers and output generators has some significant negative consequences, the most obvious being the reduction of the ‘feel’ associated with producing a certain kind of sound. Another frequent criticism concerns the inherent limitations of MIDI, the protocol that connects these two components of the instrument chain. A serious attempt to overcome these limitations is provided by the UDP-based Open Sound Control (OSC) protocol [Wright, 2005]. However, there is a more basic drawback concerning the conceptual and practical separation of new digital instruments into two separate components: it becomes hard, or even impossible, to design highly sophisticated control interfaces without a profound prior knowledge of how the sound or music generators will work. Generic, non-specific music controllers tend to be either too simple, mimetic (imitating traditional instruments), or too technologically biased. They can be inventive and adventurous, but their coherence cannot be guaranteed if they cannot anticipate what they are going to control [Jordà, 2005].
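To make the controller-to-generator link concrete, the following minimal sketch encodes a single OSC message by hand and shows how it could be sent as one UDP datagram. It follows the OSC 1.0 binary layout (null-terminated strings padded to four-byte boundaries, big-endian 32-bit floats) and handles float arguments only. The address `/synth/cutoff`, the host and the port are hypothetical placeholders, not part of any particular synthesizer.

```python
import socket
import struct


def osc_pad(raw: bytes) -> bytes:
    """Null-terminate a byte string and pad it to a multiple of four bytes."""
    raw += b"\x00"
    return raw + b"\x00" * (-len(raw) % 4)


def osc_message(address: str, *values: float) -> bytes:
    """Encode an OSC message whose arguments are all 32-bit floats."""
    type_tags = "," + "f" * len(values)
    payload = b"".join(struct.pack(">f", v) for v in values)
    return osc_pad(address.encode("ascii")) + osc_pad(type_tags.encode("ascii")) + payload


def send_osc(packet: bytes, host: str = "127.0.0.1", port: int = 9000) -> None:
    """Send one encoded message as a UDP datagram (host and port are assumptions)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))


# A controller value (here 0.5) addressed to a hypothetical synthesis parameter.
packet = osc_message("/synth/cutoff", 0.5)
```

Calling `send_osc(packet)` would deliver the message to whatever OSC-aware sound generator listens on that port. Unlike a 7-bit MIDI controller value, the argument is a full-resolution float and the address is a human-readable name.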

Feedback systems: When musicians play instruments, they perform certain actions with the expectation of achieving a certain result. As they play, they monitor the behaviour of their instrument and, if the sound is not quite what they expect, they will adjust their actions to change it. In other words, they have effectively become part of a control loop, constantly monitoring the output from their instrument and subtly adjusting bow pressure, breath pressure or whatever control parameter is appropriate. The challenge is to provide the performer of a digital instrument with feedback that supports control of the input parameters better than auditory feedback alone. One proposed solution is to make use of the musician’s existing sensitivity to the relationship between an instrument’s ‘feel’ and its sound by combining haptic and auditory feedback [O’Modhrain, 2000]. Other solutions may rely on visual and auditory feedback [Jordà, 2005].

Designing effective interaction metaphors: Beyond the two previous issues, which concern the musical instrument paradigm, the design of structured and dynamic interaction metaphors, enabling users to exploit sophisticated gestural interfaces, has the potential to lead to a variety of music and multimedia applications beyond the musical instrument metaphor. The state-of-the-art practice mainly consists of direct and strictly causal gesture/sound associations, without any dynamics or evolutionary behaviour. However, research is now shifting toward higher-level indirect strategies [Visell and Cooperstock, 2007]: these include reasoning and decision-making modules related to rational and cognitive processes, but they also take into account perceptual and emotional aspects. Music theory and artistic research in general can feed SMC research with further crucial issues. An interesting aspect, for instance, is the question of expressive autonomy [Camurri et al., 2000], that is, the degree of freedom an artist leaves to a performance involving an interactive music system.

Improving the acceptance of new interfaces: The possibilities offered by digital instruments and controllers are indeed endless. Almost anything can be done and much experimentation is going on. Yet the fact is that there are not that many professional musicians who use them as their main instrument. No recent electronic instrument has reached the (limited) popularity of the Theremin or the Ondes Martenot, invented in 1920 and 1928, respectively. Successful new instruments exist, but they are not digital, not even electronic. The most recent successful instrument is the turntable, which became a real instrument in the early eighties when it started to be played in a radically unorthodox and unexpected manner. It has since then developed its own musical culture, techniques and virtuosi. For the success of new digital instruments, the continued study of sound control, mapping, ergonomics, interface design and related matters is vital. But beyond that, what is required is integral studies that consider not only ergonomic but also psychological, social and, above all, musical issues.

Performance Modelling and Control

A central activity in music is performance, that is, the act of interpreting, structuring, and physically realising a work of music by playing a musical instrument. In many kinds of music – particularly so in Western art music – the performing musician acts as a kind of mediator: a mediator between musical idea and instrumental realisation, between written score and musical sound, between composer and listener/audience. Music performance is a complex activity involving physical, acoustic, physiological, psychological, social and artistic issues. At the same time, it is also a deeply human activity, relating to emotional as well as cognitive and artistic categories.

Understanding the emotional, cognitive and also (bio-)mechanical mechanisms and constraints governing this complex human activity is a prerequisite for the design of meaningful and useful music interfaces (see above) or more general interfaces for interaction with expressive media such as sound (see next section). Research in this field ranges from studies aimed at understanding expressive performance to attempts at modelling aspects of performance in a formal, quantitative and predictive way.

Quantitative, empirical research on expressive music performance dates all the way back to the 1930s, to the pioneering work by Seashore and colleagues in the U.S. After a period of neglect, the topic experienced a veritable renaissance in the 1970s, and music performance research is now thriving and highly productive (a comprehensive overview can be found in [Gabrielsson, 2003]). Historically, research in (expressive) music performance has focused on finding general principles underlying the types of expressive ‘deviations’ from the musical score (e.g., in terms of timing, dynamics and phrasing) that are a hallmark of expressive interpretation. Three different research strategies can be discerned (see [De Poli, 2004; Widmer & Goebl, 2004] for recent overviews on expressive performance modelling): (1) acoustic and statistical analysis of performances by real musicians (the so-called analysis-by-measurement method); (2) making use of interviews with expert musicians to help translate their expertise into performance rules (the so-called analysis-by-synthesis method); and (3) inductive machine learning techniques applied to large databases of performances.

Studies along these lines by a number of research teams around the world have shown that there are significant regularities that can be uncovered in these ways, and computational models of expressive performance (of mostly classical music) have proved to be capable of producing truly musical results. These achievements are currently inspiring a great deal of research into more comprehensive computational models of music performance and also ambitious application scenarios.

One such new trend is quantitative studies into the individual style of famous musicians. Such studies are difficult because the same professional musician can perform the same score in very different ways (cf. commercial recordings by Vladimir Horowitz and Glenn Gould). Recently, new methods have been developed for the recognition of music performers and their style, among them the fitting of performance parameters in rule-based performance models and the application of machine learning methods for the identification of the performance style of musicians. Recent results of specialised experiments show surprising artist recognition rates (e.g., [Saunders et al., 2004]).

So far, music performance research has been mainly concerned with describing detailed performance variations in relation to musical structure. However, there has recently been a shift towards high-level musical descriptors for characterising and controlling music performance, especially with respect to emotional characteristics. For example, it has been shown that it is possible to generate different emotional expressions of the same score by manipulating rule parameters in systems for automatic music performance [Bresin & Friberg, 2000].
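The idea of obtaining different emotional renderings of one score by changing a few rule parameters can be sketched as follows. This is a toy illustration only: the three parameters (tempo, dynamics, articulation) and every preset value are assumptions made for the example, and they are not the rules or settings of the cited system.

```python
from dataclasses import dataclass


@dataclass
class Note:
    pitch: int        # MIDI note number
    duration: float   # nominal inter-onset interval in seconds
    velocity: int     # nominal MIDI velocity, 1..127


# Hypothetical presets: scale factors applied to the nominal score.
PRESETS = {
    "neutral": {"tempo": 1.0, "dynamics": 1.0, "articulation": 1.0},
    "happy": {"tempo": 1.2, "dynamics": 1.1, "articulation": 0.7},
    "sad": {"tempo": 0.8, "dynamics": 0.8, "articulation": 1.0},
}


def render(notes, emotion):
    """Return (pitch, inter-onset interval, sounding time, velocity) per note."""
    preset = PRESETS[emotion]
    performed = []
    for note in notes:
        # A faster tempo shortens the time between successive onsets.
        ioi = note.duration / preset["tempo"]
        # Articulation below 1.0 leaves a gap before the next note (staccato).
        sounding = ioi * preset["articulation"]
        velocity = max(1, min(127, round(note.velocity * preset["dynamics"])))
        performed.append((note.pitch, ioi, sounding, velocity))
    return performed


score = [Note(60, 0.5, 80), Note(62, 0.5, 80), Note(64, 1.0, 80)]
sad_version = render(score, "sad")
happy_version = render(score, "happy")
```

The same score yields a slower, softer, legato rendering under one preset and a faster, louder, detached one under another, which is the kind of high-level control the paragraph above describes.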

Interactive control of musical expressivity is traditionally the task of the conductor. Several attempts have been made to control the tempo and dynamics of a computer-played score with some kind of gesture input device. For example, Friberg [2006] describes a method for interactively controlling, in real time, a system of performance rules that contain models for phrasing, micro-level timing, articulation and intonation. With such systems, high-level expressive control can be achieved. Dynamically controlled music in computer games is another important future application.

Visualisation of musical expressivity, though perhaps an unusual idea, also has a number of useful applications. In recent years, a number of efforts have been made in the direction of new display forms of expressive aspects of music performance. Langner and Goebl [2003] have developed a method for visualising an expressive performance in a tempo-loudness space: expressive deviations leave a trace on the computer screen in the same way as a worm does when it wriggles over sand, producing a sort of ‘fingerprint’ of the performance. This and other recent methods of visualisation can be used for the development of new multi-modal interfaces for expressive communication, in which expressivity embedded in audio is converted into visual representation, facilitating new applications in music research, music education and HCI, as well as in artistic contexts. A visual display of expressive audio may also be desirable in environments where audio display is difficult or must be avoided, or in applications for hearing-impaired people.
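A trajectory in tempo-loudness space can be approximated from beat-level data along the following lines. The sketch assumes that beat times and one loudness value per beat are already available, and it uses a plain trailing moving average as a stand-in for whatever smoothing the cited method applies. The function name, the window size and the sample values are hypothetical.

```python
def tempo_loudness_trajectory(beat_times, loudness, window=3):
    """Turn beat times (s) and per-beat loudness values into smoothed (tempo, loudness) points."""
    # Local tempo in beats per minute for each inter-beat interval.
    tempi = [60.0 / (t1 - t0) for t0, t1 in zip(beat_times, beat_times[1:])]
    # Pair each interval with the loudness at the beat that closes it.
    levels = list(loudness[1:])

    def smooth(values):
        # Trailing moving average over at most `window` points.
        result = []
        for i in range(len(values)):
            chunk = values[max(0, i - window + 1):i + 1]
            result.append(sum(chunk) / len(chunk))
        return result

    return list(zip(smooth(tempi), smooth(levels)))


# Hypothetical data: four beats, slowing down slightly at the end.
trajectory = tempo_loudness_trajectory([0.0, 0.5, 1.0, 1.6], [60, 62, 66, 64], window=2)
```

Plotting successive points of `trajectory` with tempo on one axis and loudness on the other, and letting older points fade, gives the kind of moving trace described above.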

For many years, research in Human-Computer Interaction in general and in sound and music computing in particular was devoted to the investigation of mainly ‘rational’, abstract aspects. In the last ten years, however, a great number of studies have emerged which focus on emotional processes and social interaction in situated or ecological environments. Examples are the research on Affective Computing at MIT [Picard, 1997] and research on KANSEI Information Processing in Japan [Hashimoto, 1997]. The broad concept of ‘expressive gesture’, including music, human movement and visual (e.g., computer animated) gesture, is the object of much contemporary research.

Performance Modelling and Control: Key Issues

A deeper understanding of music performance: Despite some successes in computational performance modelling, current models are extremely limited and simplistic vis-a-vis the complex phenomenon of musical expression. It remains an intellectual and scientific challenge to probe the limits of formal modelling and rational characterisation. Clearly, it is strictly impossible to arrive at complete predictive models of such complex human phenomena. Nevertheless, work towards this goal can advance our understanding and appreciation of the complexity of artistic behaviours. Understanding music performance will require a combination of approaches and disciplines – musicology, AI and machine learning, psychology and cognitive science.

For cognitive neuroscience, discovering the mechanisms that govern the understanding of music performance is a first-class problem. Different brain areas are involved in the recognition of different performance features. Knowledge of these can be an important aid to formal modelling and rational characterisation of higher order processing, such as the perceptual differentiation between human-like and mechanical performances. Since music making and appreciation is found in all cultures, the results could be extended to the formalisation of more general cognitive principles.

Computational models for artistic music performance: The use of computational music performance models in artistic contexts (e.g., interactive performances) raises a number of issues that have so far only partially been faced. The concept of a creative activity being predictable and the notion of a direct ‘quasi-causal’ relation between the musical score and a performance are both problematic. The unpredictable intentionality of the artist and the expectations and reactions of listeners are neglected in current music performance models. Surprise and unpredictability are crucial aspects in an active experience such as a live performance. Models considering such aspects should take account of variables such as performance context, artistic intentions, personal experiences and listeners’ expectations.

Music interaction models in multimedia applications: There will be an increasing number of products which embed possibilities for interaction and expression in the rendering, manipulation and creation of music. In current multimedia products, graphical and musical objects are mainly used to enrich textual and visual information. Most commonly, developers focus more on the visual rather than the musical component, the latter being used merely as a realistic complement or comment to text and graphics. Improvements in the human-machine interaction field have largely been matched by improvements in the visual component, while the paradigm of the use of music has not changed adequately. The integration of music interaction models in the multimedia context requires further investigation, so that we can understand how users can interact with music in relation to other media. Two particular research issues that need to be addressed are models for the analysis and recognition of users’ expressive gestures, and the communication of expressive content through one or more non-verbal communication channels mixed together.

Sound Interaction Design

Sound-based interactive systems can be considered from several points of view and several perspectives: content creators, producers, providers and consumers of various kinds, all in a variety of contexts. Sound is becoming more and more important in interaction design, in multimodal interactive systems, in novel multimedia technologies which allow broad, scalable and customised delivery and consumption of active content. In these scenarios, some relevant trends are emerging that are likely to have a deep impact on sound-related scientific and technological research in the coming years. Thanks to research in Auditory Display, Interactive Sonification and Soundscape Design, sound is becoming an increasingly important part of Interaction Design and Human-Computer Interaction.

Auditory Display is a field that has already reached some kind of consolidated state. A strong community in this field has been operating for more than twenty years. Auditory Display and Sonification are about giving audible representation to information, events and processes. Sound design for conveying information is, thus, a crucial issue in the field of Auditory Display. The main task of the sound designer is to find an effective mapping between the data and the auditory objects that are supposed to represent them in a way that is perceptually and cognitively meaningful. Auditory warnings are perhaps the only kind of auditory displays that have been thoroughly studied and for which solid guidelines and best design practices have been formulated. A milestone publication summarising the multifaceted contributions to this sub-discipline is the book edited by Stanton and Edworthy [1999].
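The simplest form of mapping between data and auditory objects is a parameter mapping, in which each data value sets one property of a sound. The sketch below maps values linearly onto a range of MIDI note numbers and converts each note to a frequency with the standard equal-temperament relation f = 440 · 2^((n − 69)/12). The choice of pitch as the target parameter, the function name and the default note range are assumptions made for illustration, not a recommended design.

```python
def data_to_frequencies(data, low_note=48, high_note=84):
    """Map data values linearly to MIDI notes, then to frequencies in Hz."""
    lowest, highest = min(data), max(data)
    # Avoid division by zero when all values are equal.
    span = (highest - lowest) or 1.0
    frequencies = []
    for value in data:
        position = (value - lowest) / span
        note = round(low_note + position * (high_note - low_note))
        # Equal temperament with A4 (MIDI note 69) tuned to 440 Hz.
        frequencies.append(440.0 * 2 ** ((note - 69) / 12))
    return frequencies


# Three readings spread over two octaves starting at A3.
tones = data_to_frequencies([0, 5, 10], low_note=57, high_note=81)
```

Here `tones` is `[220.0, 440.0, 880.0]`. Whether such a mapping is perceptually and cognitively meaningful for a given data set is exactly the design question raised above, and a linear pitch mapping is only one of many candidates.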

If Sonification is the use of non-speech audio to perceptualize information, Interactive Sonification is a more recent specialization that takes advantage of the increasing diffusion of sensing and actuating technologies. The listener is actively involved in a perception/action loop, and the main objective is to generate a sonic feedback which is coherent with physical interactions performed with sonically-augmented artifacts. This allows active exploration of information spaces and more engaging experiences. A promising approach is Model Based Sonification [Hermann & Ritter, 2005] which uses sound modelling techniques in such a way that sound emerges as an organic product of interactions among modelling blocks and external agents. Often, interaction and sound feedback are enabled by physically-based models. For example, the user controls the inclination of a stick, and a virtual ball rolls over it producing a sound that reveals the surface roughness and situations of equilibrium [Rath & Rocchesso, 2005]. While building these interactive objects for sonification, it is soon realized that fidelity to the physical phenomena is not necessarily desirable. Sound models are often more effective if they are "minimal yet veridical" [Rocchesso et al., 2003], or if they exaggerate some traits as it is done by cartoonists.

A third emerging area of research with strong implications for social life, whose importance is astonishingly underestimated, is that of sound in the environment – on different scales, from architectonic spaces to urban contexts and even to truly geographical dimensions. Soundscape Design as the auditory counterpart of landscape design is the discipline that studies sound in its environmental context, from both naturalistic and cultural viewpoints. It is going to become more and more important in the context of the acoustically saturated scenarios of our everyday life. Concepts such as ‘clear hearing’ and hi-fi versus lo-fi soundscapes, introduced by Murray Schafer [1994], are becoming crucial as ways of tackling the ‘composition’ of our acoustic environment in terms of appropriate sound design.

Sound Interaction Design: Key Issues

Evaluation methodologies for sound design: Before Sound Interaction Design, there is Sound Design. And it is worth asking whether this latter is a mature discipline in the sense that design itself is. Is there anybody designing sounds with the same attitude that Philippe Starck designs a lemon squeezer? What kind of instruments do we have at our disposal for the objective evaluation of the quality and the effectiveness of sound products in the context, for example, of industrial design? As a particular case, sound product design is rapidly acquiring a more and more relevant place in the loop of product implementation and evaluation. Various definitions of sound quality have been proposed and different evaluation parameters have been put forward for deriving quantitative predictions from sound signals [Lyon, 2000]. The most commonly used parameters (among others) are loudness, sharpness, roughness and fluctuation strength. Loudness is often found to be the dominant measurable factor that adversely affects sound quality. However, more effective and refined measurement tools for defining and evaluating the aesthetic contents and the functionality of a sound have not yet been devised. The development of appropriate methodologies of this kind is an urgent task for the growth of Sound Design as a mature discipline.

Everyday listening and interactive systems: In the field of Human-Computer Interaction, auditory icons have been defined as ‘natural’ audio messages that convey information and feedback about events in an intuitive way. The concepts of auditory icons and ‘Everyday Listening’, as opposed to ‘Musical Listening’, were introduced by William Gaver [1994]. The notion of auditory icons is situated within a more general philosophy of an ecological approach to perception. The concept of auditory icons is to use natural and everyday sounds to represent actions and sounds within an interface. In this context, a relevant consideration emerges: a lot of research effort has been devoted to the study of musical perception, while our auditory system is first of all a tool for interacting with the outer world in everyday life. When we consciously listen to, or more or less unconsciously hear, ‘something’ in our daily experience, we do not really perceive and recognise sounds but rather events and sound sources. Both from a perceptual point of view (sound to sense) and from a modelling/generation point of view (sense to sound), a great effort is still required to achieve the ability to use sound in artificial environments in the same way that we use sound feedback to interact with our everyday environment.

Sonification as art, science, and practice: Sonification, in its very generic sense of information representation by means of sound, is still an open research field. Although a lot of work has been done, clear strategies and examples of how to design sound in order to convey information in an optimal way have only partially emerged. Sonification remains an open issue which involves communication theory, sound design, cognitive psychology, psychoacoustics and possibly other disciplines. A specific question that naturally emerges is whether the expertise of composers, who are accustomed to organising sound in time and polyphonic density, could be helpful in developing more ‘pleasant’ (and thus effective) auditory display design. Would it be possible to define the practice of sonification in terms that are informed by the practice of musical composition? Or, more generally, is an art-technology collaboration a positive, and perhaps vital, element in the successful design of auditory displays? Another inescapable issue is the active use of auditory displays. Sonification is especially effective with all those kinds of information that have a strong temporal basis, and it is also natural to expect that the active involvement of the receiver may lead to better understanding, discoveries and aesthetic involvement. In interactive sonification, the user may play the role of the performer in music production. In this sense, the interpreter of a precisely prescribed music score, adding expressive nuances, or the jazz improviser jiggling here and there within a harmonic sieve could be two good metaphors for an interactive sonification process.

Sound and multimodality: Recently, Auditory Display and Sonification research has also entered the field of multimodal and multi-sensory interaction, exploiting the fact that synchronisation with other sensory channels (e.g., visual, tactile) provides improved feedback. An effective research approach to the kinds of problems that this enterprise brings up is the study of sensorial substitutions. For example, a number of sensory illusions can be used to ‘fool’ the user via cross-modal interaction. This is possible because everyday experience is intrinsically multimodal and properties such as stiffness, weight, texture, curvature and material are usually determined via cues coming from more than one channel.

Soundscape Design: A soundscape is not an accidental by-product of a society. On the contrary, it is a construction, a more or less conscious ‘composition’ of the acoustic environment in which we live. Hearing is an intimate sense similar to touch: the acoustic waves are a mechanical phenomenon and they ‘touch’ our hearing apparatus. Unlike eyes, the ears do not have lids. It is thus a delicate and extremely important task to take care of the sounds that form the soundscape of our daily life. However, the importance of the soundscape remains generally unrecognised and a process of education which would lead to more widespread awareness is urgently needed.


A. Gabrielsson. Music Performance Research at the Millennium. Psychology of Music, 31(3):221-272, 2003.

Gerhard Widmer and Werner Goebl. Computational Models of Expressive Music Performance: The State of the Art. Journal of New Music Research, 33(3):203-216, 2004.

C. Saunders, D. Hardoon, J. Shawe-Taylor, and G. Widmer. Using String Kernels to Identify Famous Performers from their Playing Style. In Proceedings of the 15th European Conference on Machine Learning (ECML'2004), Pisa, Italy, 2004.

William W. Gaver. Using and Creating Auditory Icons. In Auditory Display: Sonification, Audification and Auditory Interfaces, pages 417-446. Addison Wesley, 1994.

Neville A. Stanton and Judy Edworthy. Human Factors in Auditory Warnings. Ashgate, Aldershot, UK, 1999.

M. Schafer. Soundscape - Our Sonic Environment and the Tuning of the World. Destiny Books, Rochester, Vermont, 1994.

M. Rath and D. Rocchesso. Continuous sonic feedback from a rolling ball. IEEE Multimedia, 12(2):60-69, 2005.

D. Rocchesso, R. Bresin, and M. Fernström. Sounding Objects. IEEE Multimedia, pages 42-52, 2003.

T. Hermann and H. Ritter. Model-Based Sonification Revisited: Authors' Comments on Hermann and Ritter, ICAD 2002. ACM Transactions on Applied Perception, 4(2):559-563, October 2005.

R. Bresin and A. Friberg. Emotional Coloring of Computer-Controlled Music Performances. Computer Music Journal, 24(4):44-63, 2000.

G. De Poli. Methodologies for expressiveness modeling of and for music performance. Journal of New Music Research, 33(3):189-202, 2004.

A. Friberg. pDM: an expressive sequencer with real-time control of the KTH music performance rules. Computer Music Journal, 30(1):37-48, 2006.

J. Langner and W. Goebl. Visualizing expressive performance in tempo-loudness space. Computer Music Journal, 27(4):69-83, 2003.

H. Ishii and B. Ullmer. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of CHI '97, pages 22-27, March 1997.

Sergi Jordà. Digital Lutherie: Crafting musical computers for new musics performance and improvisation. PhD thesis, Pompeu Fabra University, Barcelona, 2005.

J. A. Paradiso. Electronic Music: New ways to play. IEEE Spectrum, 34(12):18-30, 1997.

Alvaro Barbosa. Computer-Supported Cooperative Work for Music Applications. PhD thesis, Pompeu Fabra University, Barcelona, 2006.

M. S. O'Modhrain. Playing by Feel: Incorporating Haptic Feedback into Computer-Based Musical Instruments. PhD thesis, Stanford University, 2000.

A. Camurri, P. Coletta, M. Ricchetti, and G. Volpe. Expressiveness and physicality in interaction. Journal of New Music Research, 29(3), September 2000.


This section reviews research aimed at understanding, describing and generating music. This area includes several very difficult problems which are a long way from being solved and will definitely require multidisciplinary approaches. All the disciplines involved in SMC have something to say here. Both humanities and engineering approaches are required, as are scientific and artistic methodologies.

Music Description and Understanding

Music is central to all human societies. Moreover, there is an increasing belief that interaction with musical environments and the use of music as a very expressive medium for communication helped the evolution of cognitive abilities specific to humans [Zatorre, 2005]. Despite the ubiquity of music in our lives, we still do not fully understand, and cannot completely describe, the musical communication chain that goes from the generation of physical energy (sound) to the formation of meaningful entities in our minds via the physiology of the auditory system.

An understanding of what music is and how it functions is of more than just academic interest. In our society, music is a commercial commodity and a social phenomenon. Understanding how music is perceived, experienced, categorised and enjoyed by people would be of great practical importance in many contexts. Equally useful would be computers that can ‘understand’ (perceive, categorise, rate, etc.) music in ways similar to humans.

In the widest sense, then, the basic goal of SMC in this context is to develop veridical and effective computational models of the whole music understanding chain, from sound and structure perception to the kinds of high-level concepts that humans associate with music – in short, models that relate the physical substrate of music (the sound) to mental concepts invoked by music in people (the ‘sense’). In this pursuit, SMC draws on research results from many diverse fields which are related either to the sound itself (physics, acoustics), to human perception and cognition (psycho-acoustics, empirical psychology, cognitive science), or to the technical/algorithmic foundations of computational modelling (signal processing, pattern recognition, computer science, Artificial Intelligence). Neurophysiology and the brain sciences are also displaying increasing interest in music [Zatorre, 2005], as part of their attempts to identify the brain modules involved in the perception of musical stimuli, and the coordination between them.

With respect to computational models, we currently have a relatively good understanding of the automatic identification of common aspects of musical structure (beat, rhythm, harmony, melody and segment structure) at the symbolic level (i.e., when the input to be analysed is musical scores or atomic notes) [Temperley, 2004]. Research is now increasingly focusing on how musically relevant structures are identified directly from the audio signal. This research on musically relevant audio descriptors is driven mainly by the new application field of Music Information Retrieval (MIR) [Orio, 2006]. Currently available methods fall short as veridical models of music perception (even of isolated structural dimensions), but they are already proving useful in practical applications (e.g., music recommendation systems).

In contrast to these bottom-up and reductionist approaches to music perception modelling, we can also observe renewed interest in more ‘holistic’ views of music perception which stress the importance of considering music as a whole instead of the sum of simple structural features (see, e.g., [Serafine, 1988], who argues that purely structural features, such as rhythm or harmony, may have their roots in music theory rather than in any psychological reality). Current research also tries to understand music perception and action not as abstract capacities, but as ‘embodied’ phenomena that happen in, and can only be explained with reference to, the human body [Leman, 2008].

Generally, many researchers feel that music understanding should address higher levels of musical description related, for example, to kinaesthetic/synaesthetic and emotive/affective aspects. A full understanding of music would also have to include the subjective and cultural contexts of music perception, which means going beyond an individual piece of music and describing it through its relation to other music and even extra-musical contexts (e.g., personal, social, political and economic). Clearly, computational models at that level of comprehensiveness are still far in the future.

Music Description and Understanding: Key Issues

‘Narrow’ SMC vs. multidisciplinary research: As noted above, many different disciplines are accumulating knowledge about aspects of music perception and understanding, at different levels (physics, signal, structure, ‘meaning’), from different angles (abstract, physiological, cognitive, social), and often with different terminologies and goals. For computational models to truly capture and reproduce human-level music understanding in all (or many) of its facets, SMC researchers will have to acquaint themselves with this very diverse literature (more so than they currently do) and actively seek alliances with scholars from these other fields – in particular from the humanities, which often seem far distant from the technology-oriented field of SMC.

Reductionist vs. multi-dimensional models: Quantitative-analytical research like SMC tends to be essentially reductionist, cutting up a phenomenon into individual parts and dimensions, and studying these more or less in isolation. In SMC-type music perception modelling, that manifests itself in isolated computational models of, for example, rhythm parsing, melody identification and harmony extraction, with rather severe limitations. This approach neglects, and fails to take advantage of, the interactions between different musical dimensions (e.g., the relation between sound and timbre, rhythm, melody, harmony, harmonic rhythm and perceived segment structure). It is likely that a ‘quantum leap’ in computational music perception will only be possible if SMC research manages to transcend this approach and move towards multi-dimensional models which at least begin to address the complex interplay of the many facets of music.

Bottom-up vs. top-down modelling: There is still a wide gap between what can currently be recognised and extracted from music audio signals and the kinds of high-level, semantically meaningful concepts that human listeners (with or without musical training or knowledge of theoretical music vocabulary) associate with music. Current attempts at narrowing this ‘semantic gap’ via, for example, machine learning, are producing sobering results. One of the fundamental reasons for this lack of progress seems to be the more or less strict bottom-up approach currently being taken, in which features are extracted from audio signals and ever higher-level features or labels are then computed by analysing and aggregating these features. This may be sufficient for associating broad labels like genre to pieces of music (as, e.g., in [Tzanetakis & Cook, 2002]), but already fails when it comes to correctly interpreting the high-level structure of a piece, and definitely falls short as an adequate model of higher-level cognitive music processing. This inadequacy is increasingly being recognised by SMC researchers, and the coming years are likely to see an increasing trend towards the integration of high-level expectation (e.g., [Huron, 2006]) and (musical) knowledge in music perception models. This, in turn, may constitute a fruitful opportunity for musicologists, psychologists and others to enter the SMC arena and contribute their valuable knowledge.
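The strictly bottom-up scheme described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than a method taken from the cited work: the two frame-level features (RMS energy and zero-crossing rate), the aggregation by averaging, the nearest-centroid labelling, and the labels and centroid values themselves are hypothetical stand-ins.

```python
import math

def frame_features(signal, frame_size):
    """Low level: one (RMS energy, zero-crossing rate) pair per frame."""
    features = []
    for start in range(0, len(signal) - frame_size + 1, frame_size):
        frame = signal[start:start + frame_size]
        rms = math.sqrt(sum(x * x for x in frame) / frame_size)
        crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
        features.append((rms, crossings / (frame_size - 1)))
    return features

def aggregate(features):
    """Mid level: summarise the whole excerpt by the mean of its frame features."""
    n = len(features)
    return tuple(sum(f[i] for f in features) / n for i in range(2))

def label(summary, centroids):
    """High level: attach the label whose centroid is nearest to the summary."""
    return min(centroids, key=lambda name: math.dist(summary, centroids[name]))

def sine(freq, n=800, rate=8000):
    """Test tone standing in for real audio (hypothetical input)."""
    return [math.sin(2 * math.pi * freq * i / rate) for i in range(n)]

# Hypothetical label centroids in (RMS, zero-crossing rate) space.
centroids = {"low-pitched": (0.7, 0.03), "high-pitched": (0.7, 0.5)}

summary = aggregate(frame_features(sine(1500), frame_size=200))
print(label(summary, centroids))
```

Information flows in one direction only here: nothing the labelling stage knows (for instance, an expectation about what is likely to come next) can influence how the frames are analysed, which is the limitation the paragraph above points to.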

Understanding the music signal vs. understanding music in its full complexity: Related to the previous issue is the observation that music perception takes place in a rich context. ‘Making sense of’ music is much more than decoding and parsing an incoming stream of sound waves into higher-level objects such as onsets, notes, melodies and harmonies. Music is embedded in a rich web of cultural, historical, commercial and social contexts that influence how it is interpreted and categorised. That is, many qualities or categorisations attributed to a piece by listeners cannot solely be explained by the content of the audio signal itself. It is thus clear that high-quality automatic music description and understanding can only be achieved by also taking into account information sources that are external to the music. Current research in Music Information Retrieval is taking first cautious steps in that direction by trying to use the Internet as a source of ‘social’ information about music (‘community meta-data’). Much more thorough research into studying and modelling these contextual aspects is to be expected. Again, this will lead to intensified and larger scale cooperation between SMC proper and the human and social sciences.

Music Generation Modelling

Due to its symbolic nature – close to the natural computation mechanisms available on digital computers – music generation was among the earliest tasks assigned to a computer, possibly pre-dating any sound generation attempts (which are related to signal processing). The first well-known work generated by a computer, Lejaren Hiller's Illiac Suite for string quartet, was created by the author (with the help of Leonard Isaacson) in 1955-56 and premiered in 1957. At the time, digital sound generation was no more than embryonic (and for that matter, analog sound generation was very much in its infancy, too). Since these pioneering experiences, the computer science research field of Artificial Intelligence has been particularly active in investigating the mechanisms of music creation.

Soon after its early beginnings, Music Generation Modelling split into two major research directions, embracing compositional research on one side and musicological research on the other. While related to each other, these two sub-domains pursue fundamentally different goals. In more recent times, the importance of a third direction, mathematical research on music creation modelling, has grown considerably, perhaps providing the necessary tools and techniques to fill in the gap between the above disciplines.

Music generation modelling has enjoyed a wide variety of results of very different kinds in the compositional domain. These results obviously include art music, but they certainly do not confine themselves to that realm. Research has included algorithmic improvisation, installations and even algorithmic Muzak creation. Algorithmic composition applications can be divided into three broad modelling categories: modelling traditional compositional structures, modelling new compositional procedures, and selecting algorithms from extra-musical disciplines [Supper, 2001]. Some strategies of this last type have been used very proficiently by composers to create specific works. These algorithms are generally related to self-similarity (a characteristic that is closely related to that of ‘thematic development’, which seems to be central to many types of music) and they range from genetic algorithms to fractal systems, from cellular automata to swarm models and coevolution. In this same category, a persistent trend towards using biological data to generate compositional structures has developed since the 1960s. Using brain activity (through EEG measurements), hormonal activity, human body dynamics and the like, there has been a constant attempt to equate biological data with musical structures [Miranda et al., 2003]. Another use of computers for music generation has been in ‘computer-assisted composition’. In this case, computers do not generate complete scores. Rather, they provide mediation tools to help composers manage and control some aspects of musical creation. Such aspects may range, according to the composers’ wishes, from high-level decision-making processes to minuscule details. While computer assistance may be a more practical and less ‘generative’ use of computers in musical composition, it is currently enjoying a much wider uptake among composers.
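As one illustration of the third category (algorithms borrowed from extra-musical disciplines), the following minimal sketch runs an elementary cellular automaton and reads each generation as a chord. The choice of rule 90 (whose pattern is self-similar), the wrap-around edges, the C major scale and the one-pitch-per-cell mapping are all assumptions made for this example, not a procedure described by any of the cited authors.

```python
def step(cells, rule=90):
    """One generation of an elementary cellular automaton with wrap-around edges."""
    n = len(cells)
    return [
        (rule >> (cells[(i - 1) % n] * 4 + cells[i] * 2 + cells[(i + 1) % n])) & 1
        for i in range(n)
    ]

def to_chords(cells, generations, scale):
    """Read each generation as a chord: live cell i sounds scale[i]."""
    chords = []
    for _ in range(generations):
        chords.append([scale[i] for i, alive in enumerate(cells) if alive])
        cells = step(cells)
    return chords

scale = [60, 62, 64, 65, 67, 69, 71]   # hypothetical mapping: C major as MIDI note numbers
seed = [0, 0, 0, 1, 0, 0, 0]           # a single live cell in the middle

for chord in to_chords(seed, generations=4, scale=scale):
    print(chord)
# [65]
# [64, 67]
# [62, 69]
# [60, 64, 67, 71]
```

The musical result depends entirely on the mapping from cells to pitches, which is a compositional decision and not part of the automaton itself.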

The pioneering era of music generation modelling has also had a strong impact on musicological research. Ever since Hiller’s investigations and works, the idea that computers could model and possibly re-create musical works in a given style has become widely diffused through contemporary musicology. Early ideas were based on generative grammars applied to music. Other systems, largely based on AI techniques, have included knowledge based systems, neural networks and hybrid approaches [Cope, 2005; Papadopoulos & Wiggins, 1999].

Early mathematical models for Music Generation Modelling included stochastic processes (with a special accent on Markov chains). These were followed by chaotic non-linear systems and by systems based on the mathematical theory of communication. All these models have been used for both creative and musicological purposes. In the last 20 years, mathematical modelling of music generation and analysis has developed considerably, going some way to providing the missing link between compositional and musicological research. Several models following different mathematical approaches have been developed. They involve “enumeration combinatorics, group and module theory, algebraic geometry and topology, vector fields and numerical solutions of differential equations, Grothendieck topologies, topos theory, and statistics. The results lead to good simulations of classical results of music and performance theory. There is a number of classification theorems of determined categories of musical structures” [Mazzola, 2001].
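The Markov-chain approach mentioned above can be sketched in a few lines. This is a minimal first-order example under stated assumptions: the source melody, the MIDI note-number encoding and the function names are hypothetical, and historical systems used considerably richer state descriptions.

```python
import random
from collections import defaultdict

def train_transitions(melody):
    """Record, for every pitch, the pitches that follow it in the source melody."""
    transitions = defaultdict(list)
    for current, following in zip(melody, melody[1:]):
        transitions[current].append(following)
    return transitions

def generate(transitions, start, length, rng):
    """Random walk over the learned transitions; repeated entries act as weights."""
    sequence = [start]
    while len(sequence) < length:
        options = transitions.get(sequence[-1])
        if not options:        # dead end: this pitch was never followed by another
            break
        sequence.append(rng.choice(options))
    return sequence

# Hypothetical source melody as MIDI note numbers (a C major fragment).
source = [60, 62, 64, 62, 60, 64, 65, 67, 65, 64, 62, 60]
table = train_transitions(source)
melody = generate(table, start=60, length=16, rng=random.Random(0))
print(melody)
```

Every step in the generated melody is a transition that occurs in the source, so the output shares the local statistics of the source without any guarantee of large-scale form.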

A relevant result of mathematical modelling has been to provide a field of potential theories where the specific peculiarities of existing ones can be investigated against non-existing variants. This result creates the possibility of the elaboration of an ‘anthropic principle’ in the historical evolution of music similar to that created in cosmology (that is: understanding whether and why existing music theories are the best possible choices or at least good ones) [Mazzola, 2001].

Music Generation Modelling: Key Issues

Computational models: The main issue of computational models in both the ‘creative’ and the ‘problem solving’ sides of Music Generation Modelling seems to relate to the failure to produce ‘meaningful’ musical results. “... computers do not have feelings, moods or intentions, they do not try to describe something with their music as humans do. Most of human music is referential or descriptive. The reference can be something abstract like an emotion, or something more objective such as a picture or a landscape.” [Papadopoulos & Wiggins, 1999]. Since ‘meaning’ in music can be expressed – at least in part – as ‘planned deviation from the norm’, future developments in this field will need to find a way to formalise such deviations in order to get closer to the cognitive processes that lie behind musical composition (and possibly also improvisation). In addition, “multiple, flexible, dynamic, even expandable representations [are needed] because this will more closely simulate human behaviour” [Papadopoulos & Wiggins, 1999]. Furthermore, while mathematicians and computer scientists evaluate algorithms and techniques in terms of some form of efficiency – be it theoretical or computational – efficiency is only a minor concern, if any, in music composition. The attention of composers and musicians is geared towards the “quality of interaction they have with the algorithm. (...) For example, Markov chains offer global statistical control, while deterministic grammars let composers test different combinations of predefined sequences” [Roads, 1996].
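The deterministic-grammar side of the contrast drawn in the Roads quotation can be sketched as a small rewriting system. The rules, the motifs and the function names below are hypothetical and chosen only for illustration; the point is that the composer changes a rule or a motif and immediately hears a different, fully reproducible combination of the predefined sequences.

```python
def expand(axiom, rules, depth):
    """Apply the rewrite rules to every symbol, `depth` times, with no randomness."""
    symbols = list(axiom)
    for _ in range(depth):
        symbols = [out for s in symbols for out in rules.get(s, [s])]
    return symbols

def realise(symbols, motifs):
    """Replace each symbol by its predefined motif (a list of MIDI note numbers)."""
    return [pitch for s in symbols for pitch in motifs[s]]

# Hypothetical rules and motifs.
rules = {"A": ["A", "B"], "B": ["A"]}
motifs = {"A": [60, 64, 67], "B": [65, 62]}

form = expand("A", rules, depth=3)
print(form)                    # ['A', 'B', 'A', 'A', 'B']
print(realise(form, motifs))   # [60, 64, 67, 65, 62, 60, 64, 67, 60, 64, 67, 65, 62]
```

Unlike a Markov chain, the same rules always yield the same form, so the interaction consists of editing rules and motifs rather than tuning probabilities.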

Mathematical models: In a similar vein, the mathematical coherence of current compositional modelling can help in understanding the internal coherence of some musical works, but it can hardly constitute, at present, an indication of musical quality at large. Mathematical coherence is only one (possibly minor) aspect of musical form, while music continues to be deeply rooted in auditory perception and psychology. The issue then becomes how to merge distant disciplines (mathematics, psychology and auditory perception, to name the most relevant ones) in order to arrive at a better, but still formalised, notion of music creation.

Computer-assisted composition tools: Currently, composers who want to use computers to compose music are confronted, by and large, with two possible solutions. The first is to rely on prepackaged existing software which presents itself as a ‘computer-assisted composition’ tool. The second is to write small or not-so-small applications that will satisfy the specific demands of a given compositional task. Solutions that integrate these approaches have yet to be found. On the one hand, composers will have to become more proficient than at present in integrating their own programming snippets into generalised frameworks. On the other, a long overdue investigation of the ‘transparency’ (or lack thereof) of computer-assisted composition tools [Bernardini, 1985] is in order. Possibly, the current trend that considers good technology as technology that creates the illusion of non-mediation could provide appropriate solutions to this problem. In this case, however, the task will be to discover the multimodal primitives of action and perception that should be taken into consideration when creating proper mediation technologies in computer-assisted composition.

Notation and multiple interfaces: The composing environment has radically changed in the last 20 years. Today, notation devices and compositional tools inevitably involve the use of computer technology. However, the early research on new notation applications which integrated multimedia content (sound, video, etc.), expressive sound playback, graphic notation for electronic music and advanced tasks such as automatic orchestration and score reduction [Roads, 1982] remains to be exploited by composers and musicians at large. Also, little investigation has been conducted into the taxonomy of composing environments today. A related question is whether composing is still a one-(wo)man endeavour, or whether it is moving towards some more elaborate teamwork paradigm (as in films or architecture). Where do mobility, information, participation and networking technologies come into play? These questions require in-depth multidisciplinary research whose full scope is yet to be designed.


David Huron. Sweet Anticipation: Music and the Psychology of Expectation. MIT Press / Bradford Books, Cambridge, MA, 2006.

Marc Leman. Embodied Music Cognition and Mediation Technology. MIT Press, Cambridge, MA, 2008.

Mary Louise Serafine. Music as Cognition: The Development of Thought in Sound. Columbia University Press, New York, 1988.

David Temperley. The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA, 2004.

Nicola Orio. Music Retrieval: A Tutorial and Review. Foundations and Trends in Information Retrieval, 1(1):1-90, 2006.

George Tzanetakis and Perry Cook. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.

Robert Zatorre. Music, the Food of Neuroscience? Nature, 434:312-315, 2005.

Martin Supper. A few remarks on algorithmic composition. Computer Music Journal, 25(1):48-53, 2001.

George Papadopoulos and Geraint Wiggins. AI methods for algorithmic composition: A survey, a critical view and future prospects. In Proceedings of the AISB'99 Symposium on Musical Creativity, 1999.

Guerino Mazzola. Mathematical Music Theory-Status Quo 2000, 2001.

Nicola Bernardini. Semiotics and Computer Music Composition. In Proceedings of the International Computer Music Conference 1985, San Francisco, 1985. CMA.

Eduardo Miranda, Ken Sharman, Kerry Kilborn, and Alexander Duncan. On Harnessing the Electroencephalogram for the Musical Braincap. Computer Music Journal, 27(2):80-102, 2003.


Challenges and Strategies

We identify five broad challenges of relevance to SMC research. The first two are centred on the actual research issues, the third one addresses educational aspects, the fourth one focuses on knowledge transfer and the last one is centred on social concerns.

  1. Design better sound objects and environments: The growing abundance of electronically generated sounds in our environment, coupled with the rapid advances in information and sensor technology, present SMC with unprecedented research challenges, but also opportunities to contribute to improving our audible world.
  2. Understand, model, and improve human interaction with sound and music: The human relation with sound and music is not just a perceptual and cognitive phenomenon: it is also a personal, bodily, emotional, and social experience. The better understanding of this relation from all these perspectives will bring truly useful and rewarding machine-mediated sonic environments and services.
  3. Train multidisciplinary researchers in a multicultural society: SMC is a highly multidisciplinary domain that requires special expertise. But the way the established academic disciplines are being taught makes it difficult to acquire the proper knowledge. Thus there is a need for the establishment of appropriate educational programmes for training these specialists.
  4. Improve knowledge transfer: A large part of SMC research is devoted to applications that can be directly exploited in the arts, in industry and in society at large. Proper knowledge transfer should result in an impact much larger than the current one.
  5. Address social concerns: The role of the SMC field goes beyond that of a mere provider of technological or commercial solutions. SMC has the potential to contribute to maintaining and furthering the richness of human culture and preventing the global technological trends that make the world uniform. Also, SMC should empower users, putting the relevant choices and decisions back into the hands of the individual.


Strategies for addressing the challenges

To address the first two challenges, we have to promote research in our field and steer it in the appropriate direction.

Promote new paradigms for sound synthesis and processing: The traditional concept of the musical instrument is no longer valid for promoting innovation. Any object can be turned into a musical instrument as soon as someone starts exploiting its expressive capabilities and employing some kind of virtuosity. This has happened with many everyday objects in the past, and it is likely to occur in the future, in a world of sensorised and networked objects and spaces. Current research has to explore new paradigms for the design of musical instruments and for the creative use of the technological infrastructures that are being developed. Research is also needed to improve synthesis algorithms, both those based on Signal/Spectral Models and those based on Physical Modelling. At a more structural level, Computer-Assisted Composition should be included and seamlessly integrated with these algorithms. For natural human/sound interactions at an individual level, all advances in Personal Sound devices should be encouraged. They might range from 3D audio over headphones to Computer/Brain Interfaces; from prosthetic uses as in cochlear implants to general-use biofeedback techniques. We also have to go beyond imitation and towards capturing the communicative potential of sound. Sound is a powerful information carrier through which to convey rapid and continuous information about objects, events, processes, functions and relations. Research should isolate the physical, acoustic and perceptual features that contribute to the salience of such items, so that sounds can be moulded according to specific communication needs.

Promote research in fields involved in the shaping of natural and artificial acoustic ecosystems: The SMC community should enlarge its scope to give itself the potential to affect fields concerned with designing “sensible spaces” for a better quality of life on various scales: product design, architecture, urban planning, landscape design and conservation. Sound is increasingly perceived as an important component at all levels, not only as a source of pollution, but also as a facilitator of interaction and as a component of the aesthetic experience of a place or its genius loci. We should promote studies aimed at reducing sound and music pollution in public and private ecosystems. Our sense of hearing is a precious resource whose capabilities should be exploited but whose effectiveness can be impaired by oppressive and hostile acoustic environments. The SMC community should encourage studies, technologies and campaigns promoting a sparing and intelligent use of sound in public and private contexts, and support the involvement of psychologists, sociologists and policy makers.

Promote computational modelling approaches in research on auditory perception and cognition: The final goal of most SMC research is to produce tools which can interact meaningfully with the user via sound, possibly integrated with other modalities. To do so, these tools will have to incorporate knowledge about sound perception and multimodal communication. Since auditory perception and cognition is a broad and multidisciplinary field of research, the focus of SMC research should be on aspects that are directly relevant to the goal mentioned above. Computational Auditory Scene Analysis is one such aspect. It should be studied and improved with the aim of identifying and tracking the different sound sources present in a given soundscape or piece of music. Auditory Attention is another aspect. This is a fundamental cognitive capability of human listeners, and it requires more study to assist in the design of effective interactive systems. Aspects of Memory and Learning are related to attention, and are necessary for SMC to bridge the gaps between time-scales (from listening to a single note to listening to a whole piece of music). Also, Musical Structural Analysis by human experts should inform the musicological aspects of SMC. In order to focus its research and demonstrate the utility of its methods, SMC should target specific applications that benefit human users. An application on which to focus is the development of devices that enhance aspects of normal auditory perception and cognition. For instance, such devices could focus the attention of the user on some aspect of the auditory scene, to help her get a clearer understanding of it, and/or to contribute to a deeper understanding of other channels, such as the visual and haptic ones. In the educational field, learning applications should be encouraged. The use of such augmentative devices has a strong social aspect as they could play a crucial role in the design of Auditory Prostheses of the future, which should allow music listening in addition to speech comprehension.

Intensify research in expressivity and communication, developing an embodied approach to perception and action: An essential aspect of sound and music that must be understood, beyond the physical, perceptual and cognitive phenomena themselves, is expressivity in sound and music communication and its relation to emotion. A prime field in which this can be studied is music performance, where expressivity is often just as important as the ‘actual’ music-structure itself. In particular, performance research should transcend its current (narrow) focus on mostly classical music. This focus favours abstract music-score-centred models that neglect the human in the loop. Instead, performance research should put more systematic effort into studying the processes of expression transmission in musical environments, with a focus on all three components of the communication channel: the expressive sound (music) itself, the performer and the listener. There is a growing consensus in cognitive science that perception, be it natural or artificial, cannot be fully understood without reference to action. This awareness is especially important in SMC research, where action is intrinsically linked to sound interaction and music making. Research in perception-action topics should thus be encouraged. Ergonomics is the most applied level of research where perception and action meet. At a more fundamental level, sensory-motor theories and embodiment of cognitive abilities are defining and formalising the important aspects of the perception and action loop.

Intensify multimodal and multidisciplinary research on computational methods for bridging the semantic gap in music: The Semantic Gap in SMC (the discrepancy between what can be recognised in music signals by current state-of-the-art methods and what human listeners associate with music) is the main obstacle on the way towards truly intelligent and useful musical companions. Current research efforts aim at the automatic recognition and modelling of higher-level musical patterns (e.g., rhythmic or harmonic structure), but they still essentially adhere to the traditional bottom-up pattern analysis scenario. The bridging of the semantic gap will require a radical re-orientation (1) towards the integration of top-down modelling of (incomplete) musical knowledge and expectations, and (2) towards a widening of the notion of musical understanding. This re-orientation can be achieved by embracing and exploiting other media (including the Web), and modalities (including for example semantic issues related to the allusion of movement and gesture in music). This research will have to be notably multidisciplinary, involving, among others, specialists in musicology, music perception, artificial intelligence, machine learning, and human movement understanding.

Intensify interaction between research and the arts: Artists have an extremely refined understanding, albeit (perhaps) not in ‘scientific’ terms, of issues of perception and perceptibility and, more importantly, of the effect of sound, including its emotional and social ramifications. In order to understand the human experience of sound and music in its full breadth, SMC needs to exploit this resource. Artists may bring up new questions and ways of looking at human and social contexts related to sound. Joint art/research projects, even those which, at first sight, focus on ‘artistic’ and not overtly ‘scientific’ questions, should be promoted and adequately funded. In fact, the strict distinction between the ‘artistic’ and the ‘scientific’ must continually be challenged. The SMC community should also make efforts to strengthen this viewpoint in funding agencies and among decision makers.

In relation to the third challenge, the strategies concern how to create and promote appropriate educational programmes for training future SMC researchers.

Design appropriate multidisciplinary curricula for SMC: Higher Education in SMC must take into account the wide variety of student backgrounds as well as the different final goals of their education. Master and PhD students enter the SMC field from different disciplines, and target their studies to a wide span of objectives, ranging from fundamental and applied research to creative endeavours such as composition and sound design. Therefore, appropriate curricula which allow specialisation must be designed to provide a wide spectrum of knowledge. It is also important to promote broader integration of Arts and Sciences. In the past, composers and content creators were a driving force behind SMC innovation. Their interaction with scientists constituted a positive ecosystem for technological innovation. In return, science provided many methodologies and tools which were greatly inspiring for several art forms. However, the drive for innovation coming from art has progressively diminished due to the increasing specialisation of the domains involved. The arts can again play a creative role when curricula in SMC are better integrated. Composition and Sound Design is a typical example where this integration is possible. Another example concerns innovative multimodal techniques for emotional and expressive analysis in Performance Practice and Musicology. Specific pro-active initiatives must be implemented to allow composers, musicians, musicologists and content creators to complement their training in Europe and abroad. The curricula should also promote cross-cultural integration in education. The recent surge in non-European industries and global markets requires a reconsideration of how education faces up to globalisation, multiculturalism and cross-cultural integration. In particular, the growing population of students who come to Europe from different non-European cultures and backgrounds requires appropriate education and pedagogical approaches that reflect a concern for multiculturalism.

Promote coordination between the different educational programs and related initiatives: SMC research has a successful track record thanks to the complementarity and coordination between research centres; however, this type of collaboration should be exported to the Higher Education domain. This can be achieved through the integration of Master's curricula, PhD programmes, and postgraduate activities. In this context, student/teacher mobility must be encouraged through appropriate funding actions. Stable and enduring support for common activities such as the SMC Summer School and target-oriented SMC ateliers and workshops must be granted in order to provide continuity in Higher Education. There is also a need to enhance the educational material used in these programs. A substantial effort must be made to provide SMC-dedicated high-quality textbooks and tutorials that tackle the multidisciplinarity of the field, and a set of interactive multimedia electronic learning objects which can be exchanged and used in SMC-related curricula. There is also a crucial need for increased access to SMC information in order to attract students and potential industrial partners. An enhancement of the SMC portal is in order. It should provide more up-to-date and expanded information on SMC and related fields. This information must cover curricula, courses, news of events, scholarships, available funding, open positions, and the like.

To address the lack of proper knowledge transfer, identified in the fourth challenge of the SMC Roadmap, the strategies mainly concern improving the dissemination and the quality of our research results.

Promote dissemination and exploitation of SMC research and objectives: The visibility and identity of SMC research should be enhanced, and greater efforts should be made to disseminate it and its objectives in venues outside the standard academic ones. We should promote the presence of SMC at conferences and industrial fairs, in special issues of scientific journals, and on mainstream media channels. We should promote tutorial installations in science museums and at festivals to educate the general public and especially to arouse the interest of children in the SMC field. We should also promote the presence of SMC in cultural activities such as concerts, exhibitions, and installations in public spaces. Representatives of academia and industry should meet more often for open discussions and exchange of information. It is important that companies explain their needs to researchers, so that academia can become aware of new applications and research opportunities. Conversely, it is equally essential that researchers inform companies about advances in academic research which may not be widely known in industry. For the latter to be fruitful, it is important to hold demonstrations directly showing the exploitation potential of a research project. Given that relatively few research results of the SMC community are taken up by industry, researchers and students must be made aware of the possibilities for profiting from their research results through direct industrial exploitation, for example by creating start-ups.

Promote academic quality standards and the use of the various models of IP protection of research results: There is a wide variety of journals and conferences in which SMC research is being published. We should promote publication in those journals and conferences which apply a proper peer review process to the evaluation and selection of papers. However, in some research areas within the SMC field, there are no clear academic criteria for the evaluation of publications and research results. Therefore, there is also a general need for promoting quality in all SMC research activities. Researchers should also protect and exploit their work in other ways. They should be aware of the various possibilities for disseminating their work and protecting their IP, knowing the advantages of each choice. Support should be given to the filing of patents, the overcoming of legal obstacles, and to the promotion of alternative means of legal protection, such as Creative Commons or Free Software licenses.

To meet the need for improved social awareness, identified in challenge five of the SMC Roadmap, the strategies aim at changing our traditional approaches to engineering research.

Expand existing SMC methodologies emphasising user-centred and group experience-centred research: The current methodologies for understanding music are typically based on experimental methods which address the cognitive system of a single listener in a laboratory environment. In practice, however, music is most of the time a social activity in which musical engagement is influenced by the behaviour of other participants. Existing empirical and experimental methodologies should be expanded towards understanding aspects of social music cognition. These involve the study of the social context in which musicians and listeners influence each other during musical activities. In addition, tools for collaboration and for information and communication exchange are currently being developed in the context of e-science and e-learning, but no collaborative tools yet incorporate all the music-specific information, such as audio files, scores, or extracted audio features. Such tools should take into account the profile and experience of users.

Expand the horizon of SMC by incorporating the research in human sciences and promoting multi-cultural approaches: Apart from cognitive theories of music such as tonality and rhythm categorisation, the human sciences (e.g. musicology, anthropology and sociology) have had little impact on the development of SMC technologies. And yet there is a large amount of knowledge about the social functioning of music that is currently unexploited. Cross-fertilisation between the human and natural sciences, as it is currently being developed in embodied music cognition, may offer new concepts and perspectives for understanding the social functioning of music. Good examples include concepts such as synchronisation, corporeal attuning in response to music, empathy and the sharing of actions. These concepts may provide a useful framework for the development of artistic applications that take into account social interaction as a basic feature of artistic expression. Current SMC research is also dominated by a narrow focus on traditional Western tonal music. SMC should make a conscious effort to transcend this focus, which tends to exclude SMC researchers from other cultures, making it difficult for them to publish results on their ‘native’ music. The goal must be to establish a common awareness in the SMC community of the importance of multicultural research. The musical and cultural expertise of foreign students from non-European countries, who are increasingly coming to study at European universities, should be actively used as a valuable resource in this endeavour.


Sound and Music Computing is a highly multidisciplinary domain that is at the core of ICT innovation in the cultural and creative industries. SMC has inherited the impressive artistic, scientific and technological history of Electroacoustic and Computer Music and expanded it into innovative realms such as artificial cognition, neurosciences and interactive design. This Roadmap is the result of a coordinated effort to identify and share with the community the medium- and long-term research issues that might contribute to industrial and social developments. This contribution is likely to have a particular impact in the cultural and creative industries. Given the planetary relevance of these industries, both in economic and state-of-the-art terms, the innovative pathway proposed in the Roadmap is of special relevance.

This SMC Roadmap is targeted not only at scientific policy makers and stakeholders. It will also appeal to a wide public ranging from the research specialist to the curious layman, from the R&D engineer to the contemporary musician. Many of the challenges presented indicate that SMC requires support for basic fundamental research. This level of research must be catered for by academic researchers working in conjunction with visionary creators. Public funding is an absolute necessity at this level. However, the rapidly changing paradigms of the Information Society have completely transformed the opportunities for applied research in SMC. The progressive switch from ‘product industries’, such as musical instrument and audio device manufacturers, to ‘service industries’, such as sound and music information providers and content aggregators, offers new, unexplored avenues for SMC knowledge transfer in addition to the classical ones. The target industries for SMC have extended beyond specific music areas: the role and impact of sound and music is of growing importance in, for example, the multimedia industry, home entertainment, and therapy and rehabilitation. In particular, the rising importance of the content industry as the fundamental asset for many other large industrial endeavours (such as the network and mobile industries) is certain to create opportunities that SMC must be ready to pick up. Furthermore, the musical instrument industry, the classic reference industry for SMC, is currently expanding into that of manufacturers of any appliance that supports audio interaction (i.e. practically any device that carries an audio transducer of some sort).

This roadmap has been designed to maximise the impact of SMC research in several areas:

Fundamental Knowledge. We still lack essential knowledge at very fundamental levels in topics related to SMC. Examples are perception and cognition, multi-modal interaction and music creation processes. This roadmap has provided a pathway for addressing these fundamental scientific issues.

Quality of Life. The amount of sound and music is growing at a brisk pace, the audio channel being currently exploited in a wide range of applications which carry an unprecedented number of signals and messages. This roadmap has devoted much attention to research fields addressing audio channel pollution and cluttering because these fields will undoubtedly need robust expansion in the near future.

Cultural and Creative Industries. Content industries are, in Europe, larger than the Chemicals, Rubber and Plastic industries put together. Furthermore, they are still rapidly evolving within the context of the Information Society. SMC's impact on these industries is bound to grow along with research discoveries and technological developments.

Information and Communication Technologies. The SMC research outlined in this Roadmap will contribute to new solutions in content-based access to sound and music, thereby adding a new technological layer to current audio recordings and the mobile broadband industry.

Social Health. As it develops, SMC research, by providing technology that fosters access to music, is bound to have an increasingly deep and wide impact on cultural identification and social bonding. Social and cultural issues have been a key concern of the Roadmap.

In this Roadmap we have looked forward and have tried to identify future trends, while remaining very conscious that it is impossible to predict reliably the future of SMC research. However, it is clear that the future of SMC is bound to be connected to cross-fertilisation among new and previously loosely related disciplines, such as neurosciences, nanotechnologies, sound design and many more. SMC researchers need to be pro-active in this cross-fertilisation, promoting extensive joint research activities with neighbouring fields.

The relevance of many of the contributions of this Roadmap is short-lived. Given the exponential advances and changes that are taking place in most of the fields and topics covered, this document should be updated regularly if it is to reflect the current context and state of the art. This Roadmap has been the result of many contributions and belongs to the whole international SMC research community. As a means of promoting a shared view of our field, we encourage this community to take advantage of it.