Acoustic cues to beat induction: A machine learning perspective

Fabien Gouyon1, Gerhard Widmer2 & Xavier Serra1

1Universitat Pompeu Fabra, Barcelona, Spain; 2Johannes Kepler University, Linz, Austria

According to Honing (1993), “there seems to be a general consensus on the notion of discrete elements (e.g. notes, sound events or objects) as the primitives of music ... but a detailed discussion and argument for this assumption is missing from the literature.” While early computational models of beat induction typically operated on discrete events such as parsed scores or MIDI data, many recent systems deal directly with acoustic signals. Some of these aim to derive similar note-like representations, whereas others work at a lower level of abstraction and on a different timescale: acoustic features computed on consecutive short signal frames (typically 10 ms long). Various musical cues to beat induction have been studied in the context of discrete note representations (note onset time, duration, pitch, harmony); by contrast, only a few lower-level features have been considered so far, mainly energy variations in several frequency bands. In this study, we address the question of which acoustic features are the most adequate for identifying musical beats computationally. We consider 274 different acoustic features computed on consecutive frames and systematically evaluate the worth of individual features, as well as feature subsets, for providing reliable cues to the presence and localization of beats in musical signals. The evaluation is based on a machine learning methodology involving a large corpus of beat-annotated musical audio pieces covering 10 musical genres (1360 instances, more than 90,000 beats).
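As a rough illustration of the kind of frame-based pipeline described above (a minimal sketch, not the authors' actual system or feature set), the following Python code computes a few illustrative frame-wise features (band-summed energy, spectral flux, spectral centroid) on a ~10 ms hop, labels each frame as beat or non-beat from a list of annotated beat times, and ranks the features by mutual information with that label. The choice of features, the 50 ms labeling tolerance, and the use of librosa and scikit-learn are assumptions made here for the sake of the example.

```python
import numpy as np
import librosa
from sklearn.feature_selection import mutual_info_classif


def frame_features(path, hop_seconds=0.010):
    """Compute a few illustrative frame-wise acoustic features on short frames."""
    y, sr = librosa.load(path, sr=22050)
    hop_length = int(hop_seconds * sr)           # ~10 ms hop, as in the abstract
    S = np.abs(librosa.stft(y, n_fft=1024, hop_length=hop_length))

    energy = np.sum(S ** 2, axis=0)              # total spectral energy per frame
    flux = np.concatenate(                        # half-wave rectified spectral flux
        ([0.0], np.sum(np.diff(S, axis=1).clip(min=0) ** 2, axis=0)))
    centroid = librosa.feature.spectral_centroid(S=S, sr=sr)[0]

    times = librosa.frames_to_time(np.arange(S.shape[1]), sr=sr, hop_length=hop_length)
    X = np.stack([energy, flux, centroid], axis=1)
    return X, times


def beat_labels(frame_times, beat_times, tol=0.05):
    """Label frames within +/- tol seconds of an annotated beat as positive (1)."""
    beat_times = np.asarray(beat_times)
    return np.array([np.min(np.abs(beat_times - t)) <= tol for t in frame_times], dtype=int)


# Hypothetical usage: rank features by mutual information with the beat label.
# X, times = frame_features("song.wav")
# y = beat_labels(times, annotated_beat_times)
# scores = mutual_info_classif(X, y)   # one relevance score per feature column
```

In the study itself this kind of per-feature relevance scoring would be extended to the full 274-feature set and to feature subsets, evaluated over the whole annotated corpus rather than a single file.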