This section reviews research aimed at understanding, describing and generating music. This area includes several very difficult problems which are far from being solved and will require multidisciplinary approaches. All the disciplines involved in SMC have something to contribute here: humanities and engineering approaches are required, and scientific and artistic methodologies are also needed.

Music Description and Understanding

Music is central to all human societies. Moreover, there is an increasing belief that interaction with musical environments and the use of music as a very expressive medium for communication helped the evolution of cognitive abilities specific to humans [Zatorre, 2005]. Despite the ubiquity of music in our lives, we still do not fully understand, and cannot completely describe, the musical communication chain that goes from the generation of physical energy (sound) to the formation of meaningful entities in our minds via the physiology of the auditory system.

An understanding of what music is and how it functions is of more than just academic interest. In our society, music is a commercial commodity and a social phenomenon. Understanding how music is perceived, experienced, categorised and enjoyed by people would be of great practical importance in many contexts. Equally useful would be computers that can ‘understand’ (perceive, categorise, rate, etc.) music in ways similar to humans.

In the widest sense, then, the basic goal of SMC in this context is to develop veridical and effective computational models of the whole music understanding chain, from sound and structure perception to the kinds of high-level concepts that humans associate with music – in short, models that relate the physical substrate of music (the sound) to mental concepts invoked by music in people (the ‘sense’). In this pursuit, SMC draws on research results from many diverse fields which are related either to the sound itself (physics, acoustics), to human perception and cognition (psycho-acoustics, empirical psychology, cognitive science), or to the technical/algorithmic foundations of computational modelling (signal processing, pattern recognition, computer science, Artificial Intelligence). Neurophysiology and the brain sciences are also displaying increasing interest in music [Zatorre, 2005], as part of their attempts to identify the brain modules involved in the perception of musical stimuli, and the coordination between them.

With respect to computational models, we currently have a relatively good understanding of the automatic identification of common aspects of musical structure (beat, rhythm, harmony, melody and segment structure) at the symbolic level (i.e., when the input to be analysed consists of musical scores or discrete note events) [Temperley, 2004]. Research is now increasingly focusing on how musically relevant structures can be identified directly from the audio signal. This research on musically relevant audio descriptors is driven mainly by the new application field of Music Information Retrieval (MIR) [Orio, 2006]. Currently available methods fall short as veridical models of music perception (even of isolated structural dimensions), but they are already proving useful in practical applications (e.g., music recommendation systems).
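To make the symbolic-level analysis mentioned above concrete, the sketch below estimates the key of a note list by correlating its pitch-class histogram with the Krumhansl–Kessler key profiles, a standard technique in this literature; the melody fragment and all helper names are illustrative assumptions, not taken from the text.

```python
# Sketch: symbolic key estimation by correlating a pitch-class histogram
# with the Krumhansl-Kessler key profiles. The profile values are the
# published listener ratings; everything else is illustrative.

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]

def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

def estimate_key(pitch_classes):
    """Return the best-matching key for a list of pitch classes (0-11)."""
    hist = [0.0] * 12
    for pc in pitch_classes:
        hist[pc % 12] += 1
    best = None
    for tonic in range(12):
        for profile, mode in ((MAJOR, "major"), (MINOR, "minor")):
            # Rotate the profile so that index 0 aligns with this tonic.
            rotated = profile[-tonic:] + profile[:-tonic]
            r = correlation(hist, rotated)
            if best is None or r > best[0]:
                best = (r, f"{NAMES[tonic]} {mode}")
    return best[1]

# A C-major melody fragment (tonic triad emphasised), as pitch classes.
notes = [0, 4, 7, 0, 2, 4, 5, 7, 9, 11, 0, 4, 7]
print(estimate_key(notes))
```

Real systems operating on audio rather than scores would first need a transcription or chroma-extraction front end, which is precisely where the symbolic-level and signal-level research strands meet.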

In contrast to these bottom-up and reductionist approaches to music perception modelling, we can also observe renewed interest in more ‘holistic’ views of music perception which stress the importance of considering music as a whole instead of the sum of simple structural features (see, e.g., [Serafine, 1988], who argues that purely structural features, such as rhythm or harmony, may have their roots in music theory rather than in any psychological reality). Current research also tries to understand music perception and action not as abstract capacities, but as ‘embodied’ phenomena that happen in, and can only be explained with reference to, the human body [Leman, 2008].

Generally, many researchers feel that music understanding should address higher levels of musical description related, for example, to kinaesthetic/synaesthetic and emotive/affective aspects. A full understanding of music would also have to include the subjective and cultural contexts of music perception, which means going beyond an individual piece of music and describing it through its relation to other music and even extra-musical contexts (e.g., personal, social, political and economic). Clearly, computational models at that level of comprehensiveness are still far in the future.

Music Description and Understanding: Key Issues

‘Narrow’ SMC vs. multidisciplinary research: As noted above, many different disciplines are accumulating knowledge about aspects of music perception and understanding, at different levels (physics, signal, structure, ‘meaning’), from different angles (abstract, physiological, cognitive, social), and often with different terminologies and goals. For computational models to truly capture and reproduce human-level music understanding in all (or many) of its facets, SMC researchers will have to acquaint themselves with this very diverse literature (more so than they currently do) and actively seek alliances with scholars from these other fields – in particular from the humanities, which often seem far distant from the technology-oriented field of SMC.

Reductionist vs. multi-dimensional models: Quantitative-analytical research like SMC tends to be essentially reductionist, cutting up a phenomenon into individual parts and dimensions, and studying these more or less in isolation. In SMC-type music perception modelling, that manifests itself in isolated computational models of, for example, rhythm parsing, melody identification and harmony extraction, with rather severe limitations. This approach neglects, and fails to take advantage of, the interactions between different musical dimensions (e.g., the relation between sound and timbre, rhythm, melody, harmony, harmonic rhythm and perceived segment structure). It is likely that a ‘quantum leap’ in computational music perception will only be possible if SMC research manages to transcend this approach and move towards multi-dimensional models which at least begin to address the complex interplay of the many facets of music.

Bottom-up vs. top-down modelling: There is still a wide gap between what can currently be recognised and extracted from music audio signals and the kinds of high-level, semantically meaningful concepts that human listeners (with or without musical training or knowledge of theoretical music vocabulary) associate with music. Current attempts at narrowing this ‘semantic gap’ via, for example, machine learning, are producing sobering results. One of the fundamental reasons for this lack of progress seems to be the more or less strict bottom-up approach currently being taken, in which features are extracted from audio signals and ever higher-level features or labels are then computed by analysing and aggregating these features. This may be sufficient for associating broad labels like genre to pieces of music (as, e.g., in [Tzanetakis & Cook, 2002]), but already fails when it comes to correctly interpreting the high-level structure of a piece, and definitely falls short as an adequate model of higher-level cognitive music processing. This inadequacy is increasingly being recognised by SMC researchers, and the coming years are likely to see an increasing trend towards the integration of high-level expectation (e.g., [Huron, 2006]) and (musical) knowledge in music perception models. This, in turn, may constitute a fruitful opportunity for musicologists, psychologists and others to enter the SMC arena and contribute their valuable knowledge.
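The bottom-up pipeline criticised above – frame-level features extracted from the signal, aggregated, and mapped to a broad label – can be illustrated in miniature. The toy ‘signals’, features and class names below are all illustrative assumptions; the point is the shape of the pipeline, not its accuracy.

```python
import math
import random

# Toy bottom-up pipeline: frame-level features are extracted from a raw
# signal, aggregated over the whole excerpt, and a label is assigned by
# distance to per-class centroids. Signals and classes are illustrative.

def frames(signal, size=256):
    """Split a signal into non-overlapping frames of `size` samples."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]

def zero_crossing_rate(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if a * b < 0) / len(frame)

def rms(frame):
    return math.sqrt(sum(x * x for x in frame) / len(frame))

def describe(signal):
    """Aggregate frame-level features into one excerpt-level vector."""
    fs = frames(signal)
    zcr = sum(zero_crossing_rate(f) for f in fs) / len(fs)
    energy = sum(rms(f) for f in fs) / len(fs)
    return (zcr, energy)

def nearest(vector, centroids):
    """Assign the label of the closest class centroid."""
    return min(centroids, key=lambda label: math.dist(vector, centroids[label]))

rng = random.Random(0)
tone = [math.sin(2 * math.pi * 220 * t / 8000) for t in range(4096)]  # tonal
noise = [rng.uniform(-1, 1) for _ in range(4096)]                     # noisy

centroids = {"tonal": describe(tone), "noisy": describe(noise)}
print(nearest(describe([0.5 * x for x in noise]), centroids))
```

Such a pipeline can separate grossly different signals, but – exactly as the paragraph argues – no amount of feature aggregation of this kind supplies the top-down expectations and musical knowledge needed for higher-level interpretation.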

Understanding the music signal vs. understanding music in its full complexity: Related to the previous issue is the observation that music perception takes place in a rich context. ‘Making sense of’ music is much more than decoding and parsing an incoming stream of sound waves into higher-level objects such as onsets, notes, melodies and harmonies. Music is embedded in a rich web of cultural, historical, commercial and social contexts that influence how it is interpreted and categorised. That is, many qualities or categorisations attributed to a piece by listeners cannot be explained solely by the content of the audio signal itself. It is thus clear that high-quality automatic music description and understanding can only be achieved by also taking into account information sources that are external to the music. Current research in Music Information Retrieval is taking its first cautious steps in that direction by trying to use the Internet as a source of ‘social’ information about music (‘community meta-data’). Much more thorough study and modelling of these contextual aspects is to be expected. Again, this will lead to intensified and larger-scale cooperation between SMC proper and the human and social sciences.

Music Generation Modelling

Due to its symbolic nature – close to the natural computation mechanisms available on digital computers – music generation was among the earliest tasks assigned to a computer, possibly predating any sound generation attempts (which are related to signal processing). The first well-known work generated by a computer, Lejaren Hiller's Illiac Suite for string quartet, was created by Hiller (with the help of Leonard Isaacson) in 1955-56 and premiered in 1957. At the time, digital sound generation was no more than embryonic (and for that matter, analog sound generation was very much in its infancy, too). Since these pioneering experiments, the computer science research field of Artificial Intelligence has been particularly active in investigating the mechanisms of music creation.

Soon after its early beginnings, Music Generation Modelling split into two major research directions, embracing compositional research on one side and musicological research on the other. While related to each other, these two sub-domains pursue fundamentally different goals. In more recent times, the importance of a third direction, mathematical research on music creation modelling, has grown considerably, perhaps providing the tools and techniques needed to bridge the gap between the above disciplines.

Music generation modelling has produced a wide variety of results in the compositional domain. These results obviously include art music, but they certainly do not confine themselves to that realm. Research has included algorithmic improvisation, installations and even algorithmic Muzak creation. Algorithmic composition applications can be divided into three broad modelling categories: modelling traditional compositional structures, modelling new compositional procedures, and selecting algorithms from extra-musical disciplines [Supper, 2001]. Some strategies of this last type have been used very proficiently by composers to create specific works. These algorithms are generally related to self-similarity (a characteristic that is closely related to that of ‘thematic development’, which seems to be central to many types of music) and they range from genetic algorithms to fractal systems, from cellular automata to swarm models and coevolution. In this same category, a persistent trend towards using biological data to generate compositional structures has developed since the 1960s. Using brain activity (through EEG measurements), hormonal activity, human body dynamics and the like, there has been a constant attempt to equate biological data with musical structures [Miranda et al., 2003]. Another use of computers for music generation has been in ‘computer-assisted composition’. In this case, computers do not generate complete scores. Rather, they provide mediation tools to help composers manage and control some aspects of musical creation. Such aspects may range, according to the composers’ wishes, from high-level decision-making processes to minuscule details. While computer assistance may be a more practical and less ‘generative’ use of computers in musical composition, it is currently enjoying a much wider uptake among composers.
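As a toy illustration of this last category – algorithms borrowed from extra-musical disciplines – the sketch below evolves an elementary cellular automaton (rule 90, whose pattern of live cells is famously self-similar) and maps each generation onto pitches. The scale mapping and all parameters are illustrative assumptions, not a method described in the text.

```python
# Sketch: an elementary cellular automaton (rule 90) used as a generative
# compositional process. Each generation's live cells select pitches from
# a scale; the mapping is an arbitrary, illustrative choice.

RULE_90 = {(1, 1, 1): 0, (1, 1, 0): 1, (1, 0, 1): 0, (1, 0, 0): 1,
           (0, 1, 1): 1, (0, 1, 0): 0, (0, 0, 1): 1, (0, 0, 0): 0}

def step(cells):
    """One update of the automaton, with wrap-around neighbourhoods."""
    n = len(cells)
    return [RULE_90[(cells[(i - 1) % n], cells[i], cells[(i + 1) % n])]
            for i in range(n)]

def generate(width=16, steps=8):
    """Evolve the automaton; each generation's live cells become a chord."""
    scale = [60, 62, 64, 65, 67, 69, 71, 72]  # C major scale, MIDI numbers
    cells = [0] * width
    cells[width // 2] = 1                     # single seed cell
    score = []
    for _ in range(steps):
        chord = [scale[i % len(scale)] for i, c in enumerate(cells) if c]
        score.append(chord)
        cells = step(cells)
    return score

for chord in generate():
    print(chord)
```

The appeal of rule 90 here is exactly the property the paragraph names: its evolution is self-similar, so repeated, varied patterns emerge from a trivial seed, loosely analogous to thematic development.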

The pioneering era of music generation modelling has also had a strong impact on musicological research. Ever since Hiller’s investigations and works, the idea that computers could model and possibly re-create musical works in a given style has become widespread in contemporary musicology. Early ideas were based on generative grammars applied to music. Other systems, largely based on AI techniques, have included knowledge-based systems, neural networks and hybrid approaches [Cope, 2005; Papadopoulos & Wiggins, 1999].

Early mathematical models for Music Generation Modelling included stochastic processes (with particular emphasis on Markov chains). These were followed by chaotic non-linear systems and by systems based on the mathematical theory of communication. All these models have been used for both creative and musicological purposes. In the last 20 years, mathematical modelling of music generation and analysis has developed considerably, going some way to providing the missing link between compositional and musicological research. Several models following different mathematical approaches have been developed. They involve “enumeration combinatorics, group and module theory, algebraic geometry and topology, vector fields and numerical solutions of differential equations, Grothendieck topologies, topos theory, and statistics. The results lead to good simulations of classical results of music and performance theory. There is a number of classification theorems of determined categories of musical structures” [Mazzola, 2001].
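A minimal sketch of the earliest of these stochastic approaches, under the assumption of a toy corpus: a first-order Markov chain is estimated from an example melody and then sampled to generate new material. The corpus, note names and function names are illustrative, not taken from the text.

```python
import random

# Sketch: a first-order Markov chain for melody generation. Transition
# probabilities are estimated (by counting) from an example melody, then
# the chain is sampled to produce new material in a similar 'style'.

def train(melody):
    """Collect observed pitch-to-pitch transitions from a note sequence."""
    table = {}
    for a, b in zip(melody, melody[1:]):
        table.setdefault(a, []).append(b)
    return table

def generate(table, start, length, seed=0):
    """Sample a new sequence of `length` notes from the transition table."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        successors = table.get(out[-1])
        if not successors:            # dead end: restart from the start note
            successors = [start]
        out.append(rng.choice(successors))
    return out

corpus = ["C", "D", "E", "C", "E", "F", "G", "E", "D", "C"]
model = train(corpus)
print(generate(model, "C", 8))
```

This captures the ‘global statistical control’ that the Key Issues below attribute to Markov chains: the output matches the corpus's local transition statistics, while long-range form remains uncontrolled.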

An important result of mathematical modelling has been to provide a field of potential theories in which the specific peculiarities of existing ones can be investigated against non-existing variants. This opens the possibility of formulating an ‘anthropic principle’ for the historical evolution of music, similar to the one proposed in cosmology (that is, understanding whether and why existing music theories are the best possible choices, or at least good ones) [Mazzola, 2001].

Music Generation Modelling: Key Issues

Computational models: The main issue for computational models, on both the ‘creative’ and the ‘problem solving’ sides of Music Generation Modelling, seems to be their failure to produce ‘meaningful’ musical results. “... computers do not have feelings, moods or intentions, they do not try to describe something with their music as humans do. Most of human music is referential or descriptive. The reference can be something abstract like an emotion, or something more objective such as a picture or a landscape” [Papadopoulos & Wiggins, 1999]. Since ‘meaning’ in music can be expressed – at least in part – as ‘planned deviation from the norm’, future developments in this field will need to find a way to formalise such deviations in order to get closer to the cognitive processes that lie behind musical composition (and possibly also improvisation). In addition, “multiple, flexible, dynamic, even expandable representations [are needed] because this will more closely simulate human behaviour” [Papadopoulos & Wiggins, 1999]. Furthermore, while mathematicians and computer scientists evaluate algorithms and techniques in terms of some form of efficiency – be it theoretical or computational – efficiency is only a minor concern, if any, in music composition. The attention of composers and musicians is geared towards the “quality of interaction they have with the algorithm. (...) For example, Markov chains offer global statistical control, while deterministic grammars let composers test different combinations of predefined sequences” [Roads, 1996].

Mathematical models: In a similar vein, the mathematical coherence of current compositional modelling can help in understanding the internal coherence of some musical works, but it can hardly constitute, at present, an indication of musical quality at large. Mathematical coherence is only one (possibly minor) aspect of musical form, while music continues to be deeply rooted in auditory perception and psychology. The issue, then, is how to merge distant disciplines (mathematics, psychology and auditory perception, to name the most relevant ones) in order to arrive at a better, but still formalised, notion of music creation.

Computer-assisted composition tools: Currently, composers who want to use computers to compose music are confronted, by and large, with two possible solutions. The first is to rely on prepackaged existing software which presents itself as a ‘computer-assisted composition’ tool. The second is to write small or not-so-small applications that will satisfy the specific demands of a given compositional task. Solutions that integrate these approaches have yet to be found. On the one hand, composers will have to become more proficient than at present in integrating their own programming snippets into generalised frameworks. On the other, a long overdue investigation of the ‘transparency’ (or lack thereof) of computer-assisted composition tools [Bernardini, 1985] is in order. Possibly, the current trend that considers good technology as technology that creates the illusion of non-mediation could provide appropriate solutions to this problem. In this case, however, the task will be to discover the multimodal primitives of action and perception that should be taken into consideration when creating proper mediation technologies in computer-assisted composition.

Notation and multiple interfaces: The composing environment has radically changed in the last 20 years. Today, notation devices and compositional tools inevitably involve the use of computer technology. However, the early research on new notation applications which integrated multimedia content (sound, video, etc.), expressive sound playback, graphic notation for electronic music and advanced tasks such as automatic orchestration and score reduction [Roads, 1982] remains to be exploited by composers and musicians at large. Also, little investigation has been conducted into the taxonomy of composing environments today. A related question is whether composing is still a one-(wo)man endeavour, or whether it is moving towards some more elaborate teamwork paradigm (as in film or architecture). Where do mobility, information, participation and networking technologies come into play? These questions require in-depth multidisciplinary research whose full scope is yet to be defined.


David Huron. Sweet Anticipation: Music and the Psychology of Expectation. MIT Press / Bradford Books, Cambridge, MA, 2006.

Marc Leman. Embodied Music Cognition and Mediation Technology. MIT Press, Cambridge, MA, 2008.

Mary Louise Serafine. Music as Cognition: The Development of Thought in Sound. Columbia University Press, New York, 1988.

David Temperley. The Cognition of Basic Musical Structures. MIT Press, Cambridge, MA, 2004.

Nicola Orio. Music Retrieval: A Tutorial and Review. Foundations and Trends in Information Retrieval, 1(1):1-90, 2006.

George Tzanetakis and Perry Cook. Musical Genre Classification of Audio Signals. IEEE Transactions on Speech and Audio Processing, 10(5):293-302, 2002.

Robert Zatorre. Music, the Food of Neuroscience? Nature, 434:312-315, 2005.

Martin Supper. A Few Remarks on Algorithmic Composition. Computer Music Journal, 25(1):48-53, 2001.

George Papadopoulos and Geraint Wiggins. AI Methods for Algorithmic Composition: A Survey, a Critical View and Future Prospects. In Proceedings of the AISB'99 Symposium on Musical Creativity, 1999.

Guerino Mazzola. Mathematical Music Theory: Status Quo 2000. 2001.

Nicola Bernardini. Semiotics and Computer Music Composition. In Proceedings of the International Computer Music Conference 1985, San Francisco, 1985. CMA.

Eduardo Miranda, Ken Sharman, Kerry Kilborn, and Alexander Duncan. On Harnessing the Electroencephalogram for the Musical Braincap. Computer Music Journal, 27(2):80-102, 2003.