In this section we review a variety of research issues that address interaction with sound and music. Three main topics are considered: Music Interfaces, Performance Modelling and Control, and Sound Interaction Design. Music Interfaces is quite a well-established topic which deals with the design of controllers for music performance. Performance Modelling and Control is an area that has been quite active in the last decade. It has focused on the study of the performance of classical music but more recently is opening up to new challenges. The last topic covered under the Interaction heading is Sound Interaction Design. This is a brand new area that opens up many new research problems not previously addressed within the SMC research community.

Music Interfaces

Digital technologies have revolutionised the development of new musical instruments, not only because of the sound generation possibilities of digital systems, but also because the concept of ‘musical instrument’ has changed with the use of these technologies. In most acoustic instruments, the separation between the control interface and the sound-generating subsystems is fuzzy. In the new digital instruments, the gesture controller (or input device) that takes the control information from the performer(s) is always separate from the sound generator. For exact and repeatable control of a synthesizer, or a piece of music, a computer-based notation program gives a stable environment (see, e.g., [Kuuskankare & Laurson, 2006]). For real-time control the controlling component can be a simple computer mouse, a computer keyboard or a MIDI keyboard, but with the use of sensors and appropriate analogue-to-digital converters, any signal coming from the outside can be converted into control messages intelligible to the digital system. A recent example is music interfaces enabling control through expressive full-body movement and gesture [Camurri et al., 2005]. The broad accessibility of devices such as video cameras and analogue-to-MIDI interfaces provides a straightforward means for the computer to access sensor data. The elimination of the physical dependencies has meant that all previous construction constraints in the design of digital instruments have been relaxed [Jordà, 2005].
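As a minimal illustration of how a digitised sensor signal can become a control message, the following Python sketch rescales a raw converter reading to a MIDI Control Change message. The function name, the 10-bit input range and the default controller number are assumptions made for this example only; they do not describe any particular device discussed here.

```python
def sensor_to_midi_cc(raw, raw_min=0, raw_max=1023, channel=0, controller=1):
    """Map a raw sensor reading to a 3-byte MIDI Control Change message.

    raw_min/raw_max describe a hypothetical 10-bit converter; adjust them
    to the actual sensor range.
    """
    if raw_max <= raw_min:
        raise ValueError("raw_max must be greater than raw_min")
    # Clamp so that out-of-range readings cannot produce invalid data bytes.
    clamped = min(max(raw, raw_min), raw_max)
    # Rescale to the 7-bit range (0..127) carried by MIDI data bytes.
    value = round((clamped - raw_min) * 127 / (raw_max - raw_min))
    # 0xB0 is the Control Change status; the low nibble selects the channel.
    status = 0xB0 | (channel & 0x0F)
    return bytes([status, controller & 0x7F, value])
```

The 7-bit resolution visible in the rescaling step is one of the MIDI limitations that motivates the alternative protocols mentioned later in this section.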

A computer-augmented instrument takes an existing instrument as its base and uses sensors and other instrumentation to pick up as much information as possible from the performer’s motions. The computer uses both the original sound of the instrument and the feedback from the sensor array to create and/or modify new sounds. Augmented instruments are often called hyper-instruments after the work done at MIT's Media Lab [Paradiso, 1997], which aimed at providing virtuoso performers with controllable means of amplifying their gestures, suggesting coherent extensions to instrumental playing techniques.

One of the new paradigms of digital instruments is the idea of collaborative performance and of instruments that can be performed by multiple players. In this type of instrument, performers can take an active role in determining and influencing not only their own musical output but also that of their collaborators. These music collaborations can be achieved over networks such as the Internet, and the study of network or distributed musical systems is a new topic on which much research is being carried out [Barbosa, 2006].

Most current electronic music is being created and performed with laptops, turntables and controllers that were not really designed to be used as music interfaces. The mouse has become the most common music interface, and several of the more radical and innovative approaches to real-time performance are currently found in the apparently more conservative area of screen-based and mouse-controlled software interfaces. Graphical interfaces may be historically freer and better suited to unveiling concurrent, complex and unrelated musical processes. Moreover, interest in gestural interaction with sound and music content and in gestural control of digital music instruments is emerging as part of a more general trend towards research on gesture analysis, processing and synthesis. This growing importance is demonstrated by the fact that the Gesture Workshop series of conferences recently included sessions on gesture in music and the performing arts. Research on gesture not only enables a deeper investigation of the mechanisms of human-human communication, but may also open up unexplored frontiers in the design of a novel generation of multimodal interactive (music) systems.

A recent trend in new music interfaces and digital instruments is that they are increasingly designed for interaction with non-professional users. The concepts of active experience and active listening are emerging, referring to the opportunity for beginners and naïve or inexperienced users, in a collaborative framework, to operate interactively on music content by modifying and moulding it in real time while listening. The integration of research on active listening, context-awareness and gestural control is leading to new creative forms of interactive music experience in context-aware (mobile) scenarios, resulting in an embodiment and control of music content by user behaviour, e.g., gestures and actions (for a recent example see [Camurri et al., 2007]).

Music Interfaces: Key Issues

Design of innovative multimodal music interfaces: A key target for designers of future interactive music systems is to endow them with natural, intelligent and adaptive multimodal interfaces which exploit the ease and naturalness of ordinary physical gestures in everyday contexts and actions. Examples are tangible interfaces (e.g., [Ishii & Ullmer, 1997]) and their technological realization as Tangible Acoustic Interfaces (TAIs), which exploit the propagation of sound in physical objects in order to locate touching positions. TAIs are a very promising interface for future interactive music systems. They have recently been enhanced with algorithms for multimodal high-level analysis of touching gestures so that information can be obtained about how the interface is touched (e.g., forcefully or gently). Despite such progress, currently available multimodal interfaces still need improvement. A key issue is to develop interfaces that can capture subtler high-level information. For example, research has been devoted to multimodal analysis of basic emotions (e.g., happiness, fear, sadness, anger), but we are still far from modelling more complex phenomena such as engagement, empathy and entrainment. Moreover, current multimodal interfaces are usually not context-aware, i.e., they analyse users’ gestures and their expressiveness, but they do not take into account the context in which the gestures are performed. Another key issue is scalability. Current multimodal interfaces often require special-purpose set-ups, including positioning of video cameras and careful preparation of objects, e.g., for TAIs. Such systems are often not scalable and are difficult to port to the home and to personal environments. A major research challenge is to exploit future mobile devices, the sensors they will be endowed with, and their significantly increased computational power and wireless communication abilities.
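To make the idea of locating a touch from sound propagation concrete, the following Python sketch estimates a tap position from the difference in arrival times at two sensors. It is a deliberately simplified one-dimensional case under stated assumptions: a bar with one sensor at each end and a constant, known propagation speed. Real TAIs work on surfaces with more sensors and more elaborate signal analysis, and the function name and parameters here are hypothetical.

```python
def locate_tap_1d(delta_t, length, speed):
    """Estimate where a bar was tapped from an arrival-time difference.

    Assumptions (illustrative only): sensors sit at x = 0 and x = length,
    and the wave travels at a constant `speed` (metres per second).

    delta_t is (arrival time at x = 0) minus (arrival time at x = length),
    in seconds. With t0 = x / speed and t1 = (length - x) / speed,
    delta_t = (2x - length) / speed, hence x = (length + speed * delta_t) / 2.
    """
    if length <= 0 or speed <= 0:
        raise ValueError("length and speed must be positive")
    x = (length + speed * delta_t) / 2.0
    # Measurement noise can push the estimate outside the bar; clamp it.
    return min(max(x, 0.0), length)
```

For example, on a 1 m bar with an assumed speed of 2000 m/s, a tap at 0.25 m reaches the near sensor 0.25 ms before the far one, so `locate_tap_1d(-0.00025, 1.0, 2000.0)` recovers a position of about 0.25 m.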

Integration of control with sound generation: The separation between gesture controllers and output generators has some significant negative consequences, the most obvious being the reduction of the ‘feel’ associated with producing a certain kind of sound. Another frequent criticism concerns the inherent limitations of MIDI, the protocol that connects these two components of the instrument chain. A serious attempt to overcome these limitations is provided by the UDP-based Open Sound Control (OSC) protocol [Wright, 2005]. However, there is a more basic drawback concerning the conceptual and practical separation of new digital instruments into two separate components: it becomes hard, or even impossible, to design highly sophisticated control interfaces without a profound prior knowledge of how the sound or music generators will work. Generic, non-specific music controllers tend to be either too simple, mimetic (imitating traditional instruments), or too technologically biased. They can be inventive and adventurous, but their coherence cannot be guaranteed if they cannot anticipate what they are going to control [Jordà, 2005].
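For readers unfamiliar with OSC, the following Python sketch encodes a single OSC message using only the standard library. It covers just the simplest case, an address pattern followed by 32-bit float arguments, and is not a full implementation of the protocol. The address `/filter/cutoff` used in the note below is a hypothetical example, not a name defined by OSC or by any system discussed here.

```python
import struct


def _osc_string(text):
    """Encode an OSC string: ASCII, null-terminated, padded to 4 bytes."""
    data = text.encode("ascii") + b"\x00"
    padding = (-len(data)) % 4
    return data + b"\x00" * padding


def encode_osc_message(address, *floats):
    """Build a binary OSC message carrying only float32 arguments."""
    if not address.startswith("/"):
        raise ValueError("OSC address patterns start with '/'")
    # The type tag string lists one 'f' per float argument, after a comma.
    type_tags = "," + "f" * len(floats)
    # OSC numeric arguments are big-endian.
    payload = b"".join(struct.pack(">f", value) for value in floats)
    return _osc_string(address) + _osc_string(type_tags) + payload
```

Calling `encode_osc_message("/filter/cutoff", 0.5)` yields a 24-byte packet that could be handed to a UDP socket. Unlike the 7-bit data bytes of MIDI, the argument is carried as a full 32-bit float under a human-readable address.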

Feedback systems: When musicians play instruments, they perform certain actions with the expectation of achieving a certain result. As they play, they monitor the behaviour of their instrument and, if the sound is not quite what they expect, they will adjust their actions to change it. In other words, they have effectively become part of a control loop, constantly monitoring the output from their instrument and subtly adjusting bow pressure, breath pressure or whatever control parameter is appropriate. The challenge is to provide the performer of a digital instrument with the appropriate feedback to control the input parameters better than that provided by mere auditory feedback. One proposed solution is to make use of the musician’s existing sensitivity to the relationship between an instrument’s ‘feel’ and its sound with both haptic and auditory feedback [O’Modhrain, 2000]. Other solutions may rely on visual and auditory feedback [Jordà, 2005].

Designing effective interaction metaphors: Beyond the two previous issues, which concern the musical instrument paradigm, the design of structured and dynamic interaction metaphors, enabling users to exploit sophisticated gestural interfaces, has the potential to lead to a variety of music and multimedia applications beyond the musical instrument metaphor. The state-of-the-art practice mainly consists of direct and strictly causal gesture/sound associations, without any dynamics or evolutionary behaviour. However, research is now shifting toward higher-level indirect strategies [Visell and Cooperstock, 2007]: these include reasoning and decision-making modules related to rational and cognitive processes, but they also take into account perceptual and emotional aspects. Music theory and artistic research in general can feed SMC research with further crucial issues. An interesting aspect, for instance, is the question of expressive autonomy [Camurri et al., 2000], that is, the degree of freedom an artist leaves to a performance involving an interactive music system.

Improving the acceptance of new interfaces: The possibilities offered by digital instruments and controllers are indeed endless. Almost anything can be done and much experimentation is going on. Yet the fact is that there are not that many professional musicians who use them as their main instrument. No recent electronic instrument has reached the (limited) popularity of the Theremin or the Ondes Martenot, invented in 1920 and 1928, respectively. Successful new instruments exist, but they are not digital, not even electronic. The most recent successful instrument is the turntable, which became a real instrument in the early eighties when it started to be played in a radically unorthodox and unexpected manner. It has since then developed its own musical culture, techniques and virtuosi. For the success of new digital instruments, the continued study of sound control, mapping, ergonomics, interface design and related matters is vital. But beyond that, what is required is integral studies that consider not only ergonomic but also psychological, social and, above all, musical issues.

Performance Modelling and Control

A central activity in music is performance, that is, the act of interpreting, structuring, and physically realising a work of music by playing a musical instrument. In many kinds of music – particularly so in Western art music – the performing musician acts as a kind of mediator: a mediator between musical idea and instrumental realisation, between written score and musical sound, between composer and listener/audience. Music performance is a complex activity involving physical, acoustic, physiological, psychological, social and artistic issues. At the same time, it is also a deeply human activity, relating to emotional as well as cognitive and artistic categories.

Understanding the emotional, cognitive and also (bio-)mechanical mechanisms and constraints governing this complex human activity is a prerequisite for the design of meaningful and useful music interfaces (see above) or more general interfaces for interaction with expressive media such as sound (see next section). Research in this field ranges from studies aimed at understanding expressive performance to attempts at modelling aspects of performance in a formal, quantitative and predictive way.

Quantitative, empirical research on expressive music performance dates all the way back to the 1930s, to the pioneering work by Seashore and colleagues in the U.S. After a period of neglect, the topic experienced a veritable renaissance in the 1970s, and music performance research is now thriving and highly productive (a comprehensive overview can be found in [Gabrielsson, 2003]). Historically, research in (expressive) music performance has focused on finding general principles underlying the types of expressive ‘deviations’ from the musical score (e.g., in terms of timing, dynamics and phrasing) that are a hallmark of expressive interpretation. Three different research strategies can be discerned (see [De Poli, 2004; Widmer & Goebl, 2004] for recent overviews on expressive performance modelling): (1) acoustic and statistical analysis of performances by real musicians, the so-called analysis-by-measurement method; (2) making use of interviews with expert musicians to help translate their expertise into performance rules, the so-called analysis-by-synthesis method; and (3) inductive machine learning techniques applied to large databases of performances.
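The notion of a timing ‘deviation’ from the score can be illustrated with a small Python sketch in the spirit of analysis by measurement: given the score positions of a sequence of notes and their measured onset times, it derives the local tempo between successive notes. The function name and the input representation (beats and seconds) are assumptions chosen for this illustration, not a method taken from the cited studies.

```python
def local_tempo_curve(score_beats, onset_times):
    """Return the local tempo (beats per minute) between successive notes.

    score_beats: note positions in the score, in beats, strictly increasing.
    onset_times: measured onset times of the same notes, in seconds.
    """
    if len(score_beats) != len(onset_times):
        raise ValueError("score_beats and onset_times must have equal length")
    tempos = []
    for i in range(len(score_beats) - 1):
        beat_span = score_beats[i + 1] - score_beats[i]
        time_span = onset_times[i + 1] - onset_times[i]
        if beat_span <= 0 or time_span <= 0:
            raise ValueError("positions and onsets must be strictly increasing")
        # A deadpan rendition gives a flat curve; expressive timing
        # appears as local accelerations and decelerations.
        tempos.append(60.0 * beat_span / time_span)
    return tempos
```

With notes on beats 0, 1, 2, 3 played at 0.0, 0.5, 1.0 and 1.6 seconds, the curve stays at 120 BPM for two intervals and then drops to about 100 BPM, a slight ritardando on the last interval.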

Studies along these lines by a number of research teams around the world have shown that there are significant regularities that can be uncovered in these ways, and computational models of expressive performance (of mostly classical music) have proved to be capable of producing truly musical results. These achievements are currently inspiring a great deal of research into more comprehensive computational models of music performance and also ambitious application scenarios.

One such new trend is quantitative studies into the individual style of famous musicians. Such studies are difficult because the same professional musician can perform the same score in very different ways (cf. commercial recordings by Vladimir Horowitz and Glenn Gould). Recently, new methods have been developed for the recognition of music performers and their style, among them the fitting of performance parameters in rule-based performance models and the application of machine learning methods for the identification of the performance style of musicians. Recent results of specialised experiments show surprising artist recognition rates (e.g., [Saunders et al., 2004]).

So far, music performance research has been mainly concerned with describing detailed performance variations in relation to musical structure. However, there has recently been a shift towards high-level musical descriptors for characterising and controlling music performance, especially with respect to emotional characteristics. For example, it has been shown that it is possible to generate different emotional expressions of the same score by manipulating rule parameters in systems for automatic music performance [Bresin & Friberg, 2000].

Interactive control of musical expressivity is traditionally the task of the conductor. Several attempts have been made to control the tempo and dynamics of a computer-played score with some kind of gesture input device. For example, Friberg [2006] describes a method for interactively controlling, in real time, a system of performance rules that contain models for phrasing, micro-level timing, articulation and intonation. With such systems, high-level expressive control can be achieved. Dynamically controlled music in computer games is another important future application.

Visualisation of musical expressivity, though perhaps an unusual idea, also has a number of useful applications. In recent years, a number of efforts have been made in the direction of new display forms of expressive aspects of music performance. Langner and Goebl [2003] have developed a method for visualising an expressive performance in a tempo-loudness space: expressive deviations leave a trace on the computer screen in the same way as a worm does when it wriggles over sand, producing a sort of ‘fingerprint’ of the performance. This and other recent methods of visualisation can be used for the development of new multi-modal interfaces for expressive communication, in which expressivity embedded in audio is converted into visual representation, facilitating new applications in music research, music education and HCI, as well as in artistic contexts. A visual display of expressive audio may also be desirable in environments where audio display is difficult or must be avoided, or in applications for hearing-impaired people.
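A trajectory in tempo-loudness space is easier to read when short-term fluctuations are smoothed before display. The following Python sketch applies simple exponential smoothing to a sequence of (tempo, loudness) points. This is a generic smoothing technique chosen for illustration and is not claimed to be the procedure used by Langner and Goebl; the function name and the `alpha` parameter are assumptions.

```python
def smooth_trajectory(points, alpha=0.3):
    """Exponentially smooth a sequence of (tempo, loudness) points.

    alpha in (0, 1] controls responsiveness: 1 reproduces the input,
    smaller values yield a slower, smoother trace.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in the interval (0, 1]")
    smoothed = []
    for tempo, loudness in points:
        if not smoothed:
            # The first point has no history, so it is used as-is.
            smoothed.append((float(tempo), float(loudness)))
        else:
            prev_tempo, prev_loudness = smoothed[-1]
            # Move a fraction alpha of the way towards the new measurement.
            smoothed.append((
                prev_tempo + alpha * (tempo - prev_tempo),
                prev_loudness + alpha * (loudness - prev_loudness),
            ))
    return smoothed
```

Plotting successive smoothed points, with older points fading out, would give a trace of the kind described above.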

For many years, research in Human-Computer Interaction in general and in sound and music computing in particular was devoted to the investigation of mainly ‘rational’, abstract aspects. In the last ten years, however, a great number of studies have emerged which focus on emotional processes and social interaction in situated or ecological environments. Examples are the research on Affective Computing at MIT [Picard, 1997] and research on KANSEI Information Processing in Japan [Hashimoto, 1997]. The broad concept of ‘expressive gesture’, including music, human movement and visual (e.g., computer animated) gesture, is the object of much contemporary research.

Performance Modelling and Control: Key Issues

A deeper understanding of music performance: Despite some successes in computational performance modelling, current models are extremely limited and simplistic vis-a-vis the complex phenomenon of musical expression. It remains an intellectual and scientific challenge to probe the limits of formal modelling and rational characterisation. Clearly, it is strictly impossible to arrive at complete predictive models of such complex human phenomena. Nevertheless, work towards this goal can advance our understanding and appreciation of the complexity of artistic behaviours. Understanding music performance will require a combination of approaches and disciplines – musicology, AI and machine learning, psychology and cognitive science.

For cognitive neuroscience, discovering the mechanisms that govern the understanding of music performance is a first-class problem. Different brain areas are involved in the recognition of different performance features. Knowledge of these can be an important aid to formal modelling and rational characterisation of higher order processing, such as the perceptual differentiation between human-like and mechanical performances. Since music making and appreciation is found in all cultures, the results could be extended to the formalisation of more general cognitive principles.

Computational models for artistic music performance: The use of computational music performance models in artistic contexts (e.g., interactive performances) raises a number of issues that have so far only partially been faced. The concept of a creative activity being predictable and the notion of a direct ‘quasi-causal’ relation between the musical score and a performance are both problematic. The unpredictable intentionality of the artist and the expectations and reactions of listeners are neglected in current music performance models. Surprise and unpredictability are crucial aspects in an active experience such as a live performance. Models considering such aspects should take account of variables such as performance context, artistic intentions, personal experiences and listeners’ expectations.

Music interaction models in multimedia applications: There will be an increasing number of products which embed possibilities for interaction and expression in the rendering, manipulation and creation of music. In current multimedia products, graphical and musical objects are mainly used to enrich textual and visual information. Most commonly, developers focus more on the visual rather than the musical component, the latter being used merely as a realistic complement or comment to text and graphics. Improvements in the human-machine interaction field have largely been matched by improvements in the visual component, while the paradigm of the use of music has not changed adequately. The integration of music interaction models in the multimedia context requires further investigation, so that we can understand how users can interact with music in relation to other media. Two particular research issues that need to be addressed are models for the analysis and recognition of users’ expressive gestures, and the communication of expressive content through one or more non-verbal communication channels mixed together.

Sound Interaction Design

Sound-based interactive systems can be considered from several points of view and several perspectives: content creators, producers, providers and consumers of various kinds, all in a variety of contexts. Sound is becoming more and more important in interaction design, in multimodal interactive systems, and in novel multimedia technologies which allow broad, scalable and customised delivery and consumption of active content. In these scenarios, some relevant trends are emerging that are likely to have a deep impact on sound-related scientific and technological research in the coming years. Thanks to research in Auditory Display, Interactive Sonification and Soundscape Design, sound is becoming an increasingly important part of Interaction Design and Human-Computer Interaction.

Auditory Display is a field that has already reached some kind of consolidated state. A strong community in this field has been operating for more than twenty years. Auditory Display and Sonification are about giving audible representation to information, events and processes. Sound design for conveying information is, thus, a crucial issue in the field of Auditory Display. The main task of the sound designer is to find an effective mapping between the data and the auditory objects that are supposed to represent them in a way that is perceptually and cognitively meaningful. Auditory warnings are perhaps the only kind of auditory displays that have been thoroughly studied and for which solid guidelines and best design practices have been formulated. A milestone publication summarising the multifaceted contributions to this sub-discipline is the book edited by Stanton and Edworthy [1999].
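One of the simplest possible mappings between data and auditory objects assigns each data value a pitch. The Python sketch below maps a series of values linearly onto a range of MIDI note numbers and converts them to equal-tempered frequencies (A4 = 440 Hz). The function name and the default note range are assumptions for illustration, and a linear pitch mapping is only one of many design choices; whether it is perceptually and cognitively meaningful depends on the data and the task.

```python
def map_to_frequencies(values, low_note=48, high_note=84):
    """Map data values linearly to MIDI notes, then to frequencies in Hz."""
    if not values:
        raise ValueError("values must not be empty")
    lowest, highest = min(values), max(values)
    span = highest - lowest
    frequencies = []
    for value in values:
        # Constant data has no range; place it in the middle of the scale.
        position = 0.5 if span == 0 else (value - lowest) / span
        note = round(low_note + position * (high_note - low_note))
        # Equal temperament: MIDI note 69 is A4 at 440 Hz.
        frequencies.append(440.0 * 2 ** ((note - 69) / 12))
    return frequencies
```

Rounding to whole note numbers quantises the data to semitones, which tends to sound more musical but discards fine detail; dropping the `round` call would give a continuous pitch mapping instead.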

If Sonification is the use of non-speech audio to perceptualize information, Interactive Sonification is a more recent specialization that takes advantage of the increasing diffusion of sensing and actuating technologies. The listener is actively involved in a perception/action loop, and the main objective is to generate a sonic feedback which is coherent with physical interactions performed with sonically-augmented artifacts. This allows active exploration of information spaces and more engaging experiences. A promising approach is Model Based Sonification [Hermann & Ritter, 2005], which uses sound modelling techniques in such a way that sound emerges as an organic product of interactions among modelling blocks and external agents. Often, interaction and sound feedback are enabled by physically-based models. For example, the user controls the inclination of a stick, and a virtual ball rolls over it producing a sound that reveals the surface roughness and situations of equilibrium [Rath & Rocchesso, 2005]. When building these interactive objects for sonification, one soon realizes that fidelity to the physical phenomena is not necessarily desirable. Sound models are often more effective if they are "minimal yet veridical" [Rocchesso et al., 2003], or if they exaggerate some traits, as cartoonists do.

A third emerging area of research with strong implications for social life, whose importance is astonishingly underestimated, is that of sound in the environment – on different scales, from architectonic spaces to urban contexts and even to truly geographical dimensions. Soundscape Design as the auditory counterpart of landscape design is the discipline that studies sound in its environmental context, from both naturalistic and cultural viewpoints. It is going to become more and more important in the context of the acoustically saturated scenarios of our everyday life. Concepts such as ‘clear hearing’ and hi-fi versus lo-fi soundscapes, introduced by Murray Schafer [1994], are becoming crucial as ways of tackling the ‘composition’ of our acoustic environment in terms of appropriate sound design.

Sound Interaction Design: Key Issues

Evaluation methodologies for sound design: Before Sound Interaction Design, there is Sound Design. And it is worth asking whether this latter is a mature discipline in the sense that design itself is. Is there anybody designing sounds with the same attitude that Philippe Starck designs a lemon squeezer? What kind of instruments do we have at our disposal for the objective evaluation of the quality and the effectiveness of sound products in the context, for example, of industrial design? As a particular case, sound product design is rapidly acquiring a more and more relevant place in the loop of product implementation and evaluation. Various definitions of sound quality have been proposed and different evaluation parameters have been put forward for deriving quantitative predictions from sound signals [Lyon, 2000]. The most commonly used parameters (among others) are loudness, sharpness, roughness and fluctuation strength. Loudness is often found to be the dominant measurable factor that adversely affects sound quality. However, more effective and refined measurement tools for defining and evaluating the aesthetic contents and the functionality of a sound have not yet been devised. The development of appropriate methodologies of this kind is an urgent task for the growth of Sound Design as a mature discipline.

Everyday listening and interactive systems: In the field of Human-Computer Interaction, auditory icons have been defined as ‘natural’ audio messages that convey information and feedback about events in an intuitive way. The concepts of auditory icons and ‘Everyday Listening’, as opposed to ‘Musical Listening’, were introduced by William Gaver [1994]. The notion of auditory icons is situated within a more general philosophy of an ecological approach to perception. The idea is to use natural and everyday sounds to represent actions and sounds within an interface. In this context, a relevant consideration emerges: a lot of research effort has been devoted to the study of musical perception, while our auditory system is first of all a tool for interacting with the outer world in everyday life. When we consciously listen to or more or less unconsciously hear ‘something’ in our daily experience, we do not really perceive and recognise sounds but rather events and sound sources. Both from a perceptual point of view (sound to sense) and from a modelling/generation point of view (sense to sound), a great effort is still required to achieve the ability to use sound in artificial environments in the same way that we use sound feedback to interact with our everyday environment.

Sonification as art, science, and practice: Sonification, in its very generic sense of information representation by means of sound, is still an open research field. Although a lot of work has been done, clear strategies and examples of how to design sound in order to convey information in an optimal way have only partially emerged. Sonification remains an open issue which involves communication theory, sound design, cognitive psychology, psychoacoustics and possibly other disciplines. A specific question that naturally emerges is whether the expertise of composers, who are accustomed to organising sound in time and polyphonic density, could be helpful in developing more ‘pleasant’ (and thus effective) auditory display design. Would it be possible to define the practice of sonification in terms that are informed by the practice of musical composition? Or, more generally, is an art-technology collaboration a positive, and perhaps vital, element in the successful design of auditory displays? Another inescapable issue is the active use of auditory displays. Sonification is especially effective with all those kinds of information that have a strong temporal basis, and it is also natural to expect that the active involvement of the receiver may lead to better understanding, discoveries and aesthetic involvement. In interactive sonification, the user may play the role of the performer in music production. In this sense, the interpreter of a precisely prescribed music score, adding expressive nuances, or the jazz improviser jiggling here and there within a harmonic sieve could be two good metaphors for an interactive sonification process.

Sound and multimodality: Recently, Auditory Display and Sonification research has also entered the field of multimodal and multi-sensory interaction, exploiting the fact that synchronisation with other sensory channels (e.g., visual, tactile) provides improved feedback. An effective research approach to the kinds of problems that this enterprise brings up is the study of sensorial substitutions. For example, a number of sensory illusions can be used to ‘fool’ the user via cross-modal interaction. This is possible because everyday experience is intrinsically multimodal and properties such as stiffness, weight, texture, curvature and material are usually determined via cues coming from more than one channel.

Soundscape Design: A soundscape is not an accidental by-product of a society. On the contrary, it is a construction, a more or less conscious ‘composition’ of the acoustic environment in which we live. Hearing is an intimate sense similar to touch: the acoustic waves are a mechanical phenomenon and they ‘touch’ our hearing apparatus. Unlike eyes, the ears do not have lids. It is thus a delicate and extremely important task to take care of the sounds that form the soundscape of our daily life. However, the importance of the soundscape remains generally unrecognised and a process of education which would lead to more widespread awareness is urgently needed.


A. Gabrielsson. Music Performance Research at the Millennium. Psychology of Music, 31(3):221-272, 2003.

Gerhard Widmer and Werner Goebl. Computational Models of Expressive Music Performance: The State of the Art. Journal of New Music Research, 33(3):203-216, 2004.

C. Saunders, D. Hardoon, J. Shawe-Taylor, and G. Widmer. Using String Kernels to Identify Famous Performers from their Playing Style. In Proceedings of the 15th European Conference on Machine Learning (ECML'2004), Pisa, Italy, 2004.

William W. Gaver. Auditory Display: Sonification, Audification and Auditory Interfaces, chapter Using and Creating Auditory Icons, pages 417-446. Addison Wesley, 1994.

Neville A. Stanton and Judy Edworthy. Human Factors in Auditory Warnings. Ashgate, Aldershot, UK, 1999.

M. Schafer. Soundscape - Our Sonic Environment and the Tuning of the World. Destiny Books, Rochester, Vermont, 1994.

M. Rath and D. Rocchesso. Continuous sonic feedback from a rolling ball. IEEE Multimedia, 12(2):60-69, 2005.

D. Rocchesso, R. Bresin, and M. Fernström. Sounding Objects. IEEE Multimedia, pages 42-52, 2003.

T. Hermann and H. Ritter. Model-Based Sonification Revisited: Authors' Comments on Hermann and Ritter, ICAD 2002. ACM Transactions on Applied Perception, 4(2):559-563, October 2005.

R. Bresin and A. Friberg. Emotional Coloring of Computer-Controlled Music Performances. Computer Music Journal, 24(4):44-63, 2000.

G. De Poli. Methodologies for expressiveness modeling of and for music performance. Journal of New Music Research, 33(3):189-202, 2004.

A. Friberg. pDM: an expressive sequencer with real-time control of the KTH music performance rules. Computer Music Journal, 30(1):37-48, 2006.

J. Langner and W. Goebl. Visualizing expressive performance in tempo-loudness space. Computer Music Journal, 27(4):69-83, 2003.

H. Ishii and B. Ullmer. Tangible Bits: Towards Seamless Interfaces between People, Bits and Atoms. In Proceedings of CHI '97, pages 22-27, March 1997.

Sergi Jordà. Digital Lutherie: Crafting musical computers for new musics performance and improvisation. PhD thesis, Pompeu Fabra University, Barcelona, 2005.

J. A. Paradiso. Electronic Music: New ways to play. IEEE Spectrum, 34(12):18-30, 1997.

Alvaro Barbosa. Computer-Supported Cooperative Work for Music Applications. PhD thesis, Pompeu Fabra University, Barcelona, 2006.

M. S. O'Modhrain. Playing by Feel: Incorporating Haptic Feedback into Computer-Based Musical Instruments. PhD thesis, Stanford University, 2000.

A. Camurri, P. Coletta, M. Ricchetti, and G. Volpe. Expressiveness and physicality in interaction. Journal of New Music Research, 29(3), September 2000.