Instructions for estimating the location of beats in a soundfile

Background

There has long been a rather optimistic notion that one might be able to identify a specific moment as the time at which a beat is perceived. Intuitively, one might think of this as the moment at which one would tap or clap in "time" with the sound. This became formalized as the notion of the P-centre (or P-center) [1], and there have been several attempts to provide algorithms for determining the best estimate of the P-center, or beat [2], [3]. None of these is perfect, and there is no perfect answer to the problem. In what follows, I provide a means of implementing the algorithm first used in [4], which in turn is based on the experimental work in [3].

The Algorithm

The algorithm is simple. We start with a sound file. Here, for example, is a short Chinese sentence.

Here is the same sound file after band-pass filtering with cutoff frequencies of 500 Hz and 2,500 Hz. This removes most of the energy due to the fundamental frequency (F0) and to fricatives, while retaining much of the energy in the formant region. The lower half of the display shows the amplitude envelope.
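Praat's Filter (pass Hann band) command does this filtering for you. If you want to reproduce the step outside Praat, the following minimal Python sketch substitutes a different technique: a time-domain FIR band-pass filter built from a Hann-windowed sinc kernel. It will not match Praat's output sample for sample, and the function names and the 201-tap kernel length are illustrative choices, not part of the original method.

```python
import math


def bandpass_kernel(low_hz, high_hz, sample_rate, num_taps=201):
    """Hann-windowed sinc band-pass kernel: low-pass(high) minus low-pass(low)."""
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd so the kernel has a centre tap")
    if not 0 < low_hz < high_hz < sample_rate / 2:
        raise ValueError("need 0 < low_hz < high_hz < Nyquist frequency")
    centre = num_taps // 2
    fl = low_hz / sample_rate   # normalised cutoffs, in cycles per sample
    fh = high_hz / sample_rate
    kernel = []
    for n in range(num_taps):
        m = n - centre
        if m == 0:
            ideal = 2 * (fh - fl)
        else:
            ideal = (math.sin(2 * math.pi * fh * m)
                     - math.sin(2 * math.pi * fl * m)) / (math.pi * m)
        # Hann window tapers the ideal (infinite) response to num_taps samples.
        window = 0.5 - 0.5 * math.cos(2 * math.pi * n / (num_taps - 1))
        kernel.append(ideal * window)
    return kernel


def fir_filter(samples, kernel):
    """Convolve samples with kernel; output has the input's length and alignment."""
    centre = len(kernel) // 2
    out = []
    for i in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(kernel):
            j = i + centre - k
            if 0 <= j < len(samples):  # treat samples outside the file as zero
                acc += h * samples[j]
        out.append(acc)
    return out
```

For example, fir_filter(samples, bandpass_kernel(500, 2500, 16000)) filters a list of samples recorded at 16 kHz. A longer kernel gives sharper band edges at the cost of more computation, and the plain nested loop is slow for long files, where a library convolution routine would be preferable.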

We associate a beat with the midpoint of each local rise in the intensity of this filtered signal. Here are the beats associated with syllable onsets in this file, displayed as points in Praat:
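The beat-placement rule can be sketched in Python as follows. This is one possible reading of the rule, not the code used in [4] or in the Perl scripts described below. It assumes that the size of a rise is measured as a fraction of the contour's overall range (so that a threshold between 0 and 1 makes sense), and that the "midpoint" of a rise is the time at which the intensity crosses halfway between the trough and the peak. The function name find_beats is hypothetical.

```python
def find_beats(times, values, threshold=0.2):
    """Return one beat time per sufficiently large local rise in `values`.

    Assumptions (not taken from the original scripts):
      * a rise's size is measured as a fraction of the contour's overall range;
      * the beat is placed where the contour crosses halfway between the
        trough and the peak of the rise (linear interpolation between frames).
    """
    if len(times) != len(values):
        raise ValueError("times and values must have the same length")
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in the range 0-1")
    if len(values) < 2:
        return []
    span = max(values) - min(values)
    if span == 0:
        return []  # a flat contour has no rises
    beats = []
    i, n = 0, len(values)
    while i < n - 1:
        if values[i + 1] <= values[i]:
            i += 1
            continue
        start = i  # trough at the start of a rise
        while i < n - 1 and values[i + 1] >= values[i]:
            i += 1
        end = i  # peak at the end of the rise (plateaus included)
        if (values[end] - values[start]) / span >= threshold:
            half = (values[start] + values[end]) / 2
            # Find the frame pair that straddles the halfway level.
            j = start
            while values[j + 1] < half:
                j += 1
            frac = (half - values[j]) / (values[j + 1] - values[j])
            beats.append(times[j] + frac * (times[j + 1] - times[j]))
    return beats
```

With times and values taken from an intensity contour, find_beats(times, values, 0.01) returns many candidate beats, while the default of 0.2 keeps only the larger rises. An alternative reading, placing the beat halfway in time between trough and peak, would replace the interpolation with (times[start] + times[end]) / 2.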

How to get the beats

I assume that you are familiar with Praat, that you have Perl installed, and that you can run Perl programs from the command line.

Here is a short Praat script that reads a specified sound file, filters it using the cutoffs of 500 and 2,500 Hz given above, and writes out a smoothed intensity contour from which the beat locations can be estimated:

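# Band-pass cutoff frequencies (Hz)
low$ = "500"
high$ = "2500"

# Folder containing the sound file, and the file itself (edit to suit)
top$ = "/Users/fred/Projects/BeatMeasurement"
infile$ = "someSoundFile.wav"

# Base name without the extension, used for the output file
name$ = infile$ - ".wav"

# Read in the file
Read from file... 'top$'/'infile$'

# Scale the intensity to an average of 70 dB
Scale intensity... 70

# Band-pass filter from low$ to high$ Hz, with 100 Hz smoothing
Filter (pass Hann band)... 'low$' 'high$' 100

# Convert to intensity: minimum pitch 25 Hz (a long analysis window,
# hence a smooth contour), time step 0 (automatic), subtract mean: yes
To Intensity... 25 0 yes

# Save the contour as <name>.intensity in the same folder
Write to short text file... 'top$'/'name$'.intensity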
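The .intensity file written by this script is a plain-text Praat object, which the Perl script reads in. As a hypothetical illustration of what such a reader involves, here is a Python sketch. It assumes the usual layout of a Praat short text file for a sampled object: after the header lines come xmin, xmax, nx, dx, x1, ymin, ymax, ny, dy and y1, followed by the nx intensity values in dB, with frame k (counting from 0) centred at x1 + k * dx. Check this against your own output before relying on it.

```python
import re


def read_intensity_short_text(text):
    """Parse a Praat Intensity saved as a short text file (assumed layout).

    Expected numbers, in order: xmin, xmax, nx, dx, x1, ymin, ymax, ny, dy, y1,
    then nx intensity values in dB. Returns (times, values).
    """
    # Drop quoted strings such as "ooTextFile" and "Intensity 2",
    # then keep every remaining token that parses as a number.
    unquoted = re.sub(r'"[^"]*"', " ", text)
    numbers = []
    for token in unquoted.split():
        try:
            numbers.append(float(token))
        except ValueError:
            pass  # labels such as "File", "type", "="
    if len(numbers) < 10:
        raise ValueError("too few numbers for an Intensity header")
    nx, dx, x1 = int(numbers[2]), numbers[3], numbers[4]
    values = numbers[10:10 + nx]
    if len(values) != nx:
        raise ValueError(f"expected {nx} values, found {len(values)}")
    times = [x1 + k * dx for k in range(nx)]  # frame centre times in seconds
    return times, values
```

Assuming a plain ASCII or UTF-8 file, you could call it as read_intensity_short_text(open("someSoundFile.intensity", encoding="utf-8").read()).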

After running the above Praat script, you should have a file called "someSoundFile.intensity". Next, run this Perl script (getBeatsOneShot.pl), which estimates beat locations and writes them out as a Praat-readable TextGrid file. Right-click to save the script to your folder, and make sure that it is executable (on macOS or Linux, chmod +x getBeatsOneShot.pl does this).
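To show what such a file can look like, here is a Python sketch that writes a list of beat times as a single point tier (a TextTier) in Praat's long text TextGrid format. The tier name "beats" and the function name are hypothetical, and this is not necessarily the exact layout that getBeatsOneShot.pl produces.

```python
def beats_to_textgrid(beat_times, xmin, xmax, tier_name="beats"):
    """Return a long-format TextGrid with one point tier holding beat_times."""
    if any(t < xmin or t > xmax for t in beat_times):
        raise ValueError("all beat times must lie between xmin and xmax")
    name = tier_name.replace('"', '""')  # Praat doubles quotes inside strings
    lines = [
        'File type = "ooTextFile"',
        'Object class = "TextGrid"',
        "",
        f"xmin = {xmin}",
        f"xmax = {xmax}",
        "tiers? <exists>",
        "size = 1",
        "item []:",
        "    item [1]:",
        '        class = "TextTier"',
        f'        name = "{name}"',
        f"        xmin = {xmin}",
        f"        xmax = {xmax}",
        f"        points: size = {len(beat_times)}",
    ]
    # Points are written in time order, each with an empty label.
    for k, t in enumerate(sorted(beat_times), start=1):
        lines += [
            f"        points [{k}]:",
            f"            number = {t}",
            '            mark = ""',
        ]
    return "\n".join(lines) + "\n"
```

Setting xmin and xmax to the start and end of the sound file lets the tier line up with the sound when both are opened together in Praat.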

The script takes one or two command-line arguments. The first, which is required, is the name of the input file without the ".intensity" suffix. In the example above, the minimal call is:

getBeatsOneShot.pl someSoundFile

The second argument, which is optional, is a threshold that determines whether a local rise in intensity is large enough to be treated as a candidate beat. A very low threshold makes the algorithm identify many beats; a high threshold ensures that only very large rises in intensity are associated with beats. The threshold must lie in the range 0 to 1, and a default of 0.2 is used if no value is given.

In practice, I find it best to use a low value (say 0.01, as in getBeatsOneShot.pl someSoundFile 0.01), which generates many beats, and then to examine the beats in Praat, keeping only those I am interested in. This takes some work, but there is no foolproof method, and manual inspection of the result is always necessary.
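If you are porting the workflow to another language, the command-line interface just described could be mirrored in Python with the standard argparse module. This is a hypothetical sketch, not a translation of getBeatsOneShot.pl: it accepts a required base name and an optional threshold, defaults the threshold to 0.2, and rejects values outside the range 0 to 1.

```python
import argparse


def parse_args(argv=None):
    """Parse a required base name and an optional threshold in [0, 1]."""
    parser = argparse.ArgumentParser(
        description="Estimate beat locations from a Praat intensity file.")
    parser.add_argument(
        "basename", help='input file name without the ".intensity" suffix')
    parser.add_argument(
        "threshold", nargs="?", type=float, default=0.2,
        help="minimum relative rise for a beat, between 0 and 1 (default 0.2)")
    args = parser.parse_args(argv)
    if not 0.0 <= args.threshold <= 1.0:
        # parser.error prints a usage message and exits with status 2.
        parser.error("threshold must lie in the range 0-1")
    return args
```

The input file would then be args.basename + ".intensity", and args.threshold could be passed to whatever beat-finding routine is used.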

If you want only the beat times rather than a Praat-readable file, you can use this Perl script (extractBeatTimes.pl) to extract them for use however you see fit. It takes two arguments, an input file name and an output file name. For example:

extractBeatTimes.pl someSoundFile.beats someSoundFile.times
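If you would rather not use Perl, the same kind of extraction can be sketched in Python. This assumes that the .beats file is a TextGrid in Praat's long text format, in which each point of a point tier appears on a line of the form number = <time>, and it writes one time per line with four decimal places. The output format of extractBeatTimes.pl may differ, and the function names are hypothetical.

```python
import re

# Matches lines such as "            number = 0.25 " in a long-format TextGrid.
_POINT_LINE = re.compile(r"^\s*number\s*=\s*([-+0-9.eE]+)\s*$", re.MULTILINE)


def extract_beat_times(textgrid_text):
    """Return the times of all points in the TextGrid's point tiers."""
    return [float(m.group(1)) for m in _POINT_LINE.finditer(textgrid_text)]


def times_to_text(times, decimals=4):
    """Format times as one value per line, e.g. for a .times file."""
    return "".join(f"{t:.{decimals}f}\n" for t in times)
```

Reading someSoundFile.beats, passing its contents to extract_beat_times, and writing the result of times_to_text to someSoundFile.times would mirror the call above.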

References

  1. Morton, J., Marcus, S., and Frankish, C. (1976). Perceptual centers (P-centers). Psychological Review, 83(5), 405-408.
  2. Marcus, S. M. (1981). Acoustic determinants of perceptual center (P-center) location. Perception & Psychophysics, 30(3), 247-256.
  3. Scott, S. K. (1993). P-centres in speech: An acoustic analysis. PhD thesis, University College London.
  4. Cummins, F., and Port, R. F. (1998). Rhythmic constraints on stress timing in English. Journal of Phonetics, 26(2), 145-171.