A Study on the Use of Perceptual Features for Music Emotion Recognition

Publication Type:

Conference Paper


Proceedings of the Sound and Music Computing Conference 2016, SMC 2016, Hamburg, Germany (2016)

Perceptual features are defined as musical descriptors that closely match a listener’s understanding of musical characteristics. This paper tackles Music Emotion Recognition using three kinds of perceptual feature sets: human-rated, computational, and modelled features. The human-rated features are collected through a listener survey, while the computational features are estimated directly from the audio signal. Regression models are then trained to predict the human-rated features from the computational ones; the resulting predictions constitute the modelled features. These regression models performed well for all features except Harmony, Timbre, and Melody. Each of the three feature sets is used to train a regression model that predicts the emotion components Energy and Valence, from which emotion is recognised. The model trained on the rated features performed well for both components, showing that emotion can be predicted from perceptual features. The models trained on the computational and modelled features predicted Energy well, but Valence less well. This is not surprising, since the main predictors of Valence are Melody, Harmony, and Timbre, which therefore require new or modified computational features that better match human perception.
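The two-stage pipeline described above can be sketched in code. The following is a minimal illustration, not the paper's implementation: the data are synthetic, the feature dimensions and the use of ordinary linear regression are assumptions, and the paper may use a different regression method.

```python
# Hypothetical sketch of the two-stage regression described in the abstract.
# Stage 1: predict the human-rated perceptual features from the computational
#          features; the predictions are the "modelled" features.
# Stage 2: train one emotion-regression model per feature set to predict an
#          emotion component (Energy shown here).
# All data below is synthetic and purely illustrative.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n_clips = 100
# Assumed dimensions: 8 computational descriptors, 5 perceptual features.
computational = rng.normal(size=(n_clips, 8))
rated = computational @ rng.normal(size=(8, 5)) + 0.1 * rng.normal(size=(n_clips, 5))
energy = rated @ rng.normal(size=5)  # synthetic emotion target

# Stage 1: modelled features = regression estimates of the rated features.
stage1 = LinearRegression().fit(computational, rated)
modelled = stage1.predict(computational)

# Stage 2: fit an Energy predictor on each of the three feature sets and
# report the in-sample R^2 for each.
for name, X in [("rated", rated), ("computational", computational), ("modelled", modelled)]:
    r2_energy = LinearRegression().fit(X, energy).score(X, energy)
    print(f"{name}: R^2 = {r2_energy:.3f}")
```

In the paper, Valence is predicted the same way with a separate model; the reported weakness for Valence would show up here as a lower R² when the Stage 1 predictions for Melody, Harmony, and Timbre are poor.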

SMC2016_submission_1.pdf (842.25 KB)