Making an automatic chord recogniser

Tue 13 September 2016

I've started working on an automatic chord recogniser for audio. It's something I've wanted to try out for a while but hadn't found the time until recently! It seems like a neat project :) I'm still in the early stages, but I'm going to talk about what I have so far.

What is a chromagram?

The feature extraction technique I'm starting with comes from a 1999 paper called "Realtime Chord Recognition of Musical Sound". Nineteen-ninety-nine was a long time ago, but the features are really intuitive and involve some fun simple musicology and DSP. I think it's a really nice place to start – I might try some more exciting (read: time-consuming and complicated) techniques later on, once I've got my teeth into things more.

In Western tonal music, we divide frequencies into semitones, which follow a logarithmic scale: every 12 semitones, the frequency doubles. A group of 12 such tones, or the distance they span, is called an octave.
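As a quick sanity check of the doubling rule, here's a small sketch that steps up from a reference frequency one semitone at a time. It assumes equal temperament and an A at 440 Hz as the reference; the helper name is made up for illustration.

```python
def semitones_above(f_ref, n):
    """Frequency n equal-tempered semitones above f_ref (hypothetical helper)."""
    return f_ref * 2 ** (n / 12)

# Thirteen frequencies: the reference A (assumed 440 Hz) up to the A an octave higher
octave = [semitones_above(440.0, n) for n in range(13)]

print(round(octave[1], 2))  # one semitone up, a little over 466 Hz
print(octave[12])           # twelve semitones up: exactly double the reference
```

Each step multiplies the frequency by the same ratio, $2^{1/12}$, which is why the scale is logarithmic.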

An octave on a piano

The chromagram feature is a 12-dimensional vector, with each entry giving the relative intensity of a chroma, e.g. A or C#. Chroma simply refers to a note regardless of octave. It's a nice name! Chromagrams are also known as Pitch Class Profiles, but I thought the former was much cooler :D

How do we calculate chromagrams?

Time for some fun musicology and DSP. To calculate a chromagram, we want to know which chroma are present in a time-sample, and their intensity. This is a perfect job for a Fourier Transform. We take the DFT, then, in short, put the DFT bins into chroma bins. We effectively sum the DFT components that correspond to chroma frequencies (or come close). Here's the maths:

$$ M_k = \operatorname{round}\left(12 \log_2\left(\frac{f_s k}{N f_{\textrm{ref}}}\right)\right) \bmod 12 $$

$$ C_c = \sum_{M_k = c} \left|X_k\right|^2 $$

$M_k$ tells us the closest chroma to the DFT frequency bin:

$f_{\textrm{ref}}$ is the reference frequency of the 0th chroma (e.g. 27.5 Hz for the lowest A)
The frequency represented by the bin is easily found by the product of the sampling frequency and the bin index, divided by the DFT length: $f_{\textrm{bin}} = \frac{f_s k}{N}$
$\log_2(\frac{f_{\textrm{bin}}}{f_{\textrm{ref}}})$ gives how many octaves the bin frequency is above the chroma reference frequency
$12 \log_2(\frac{f_{\textrm{bin}}}{f_{\textrm{ref}}})$ gives how many semitones the bin frequency is above the chroma reference frequency
Rounding this and taking the mod 12 gives the nearest chroma index
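The steps above can be sketched for a single bin as follows. The sampling rate (44.1 kHz), DFT length (4096) and bin indices are assumed example values, and `bin_to_chroma` and `CHROMA_NAMES` are illustrative names rather than anything from the project.

```python
import math

# Chroma 0 is A because the assumed reference frequency (27.5 Hz) is an A
CHROMA_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]

def bin_to_chroma(k, fs, n_fft, f_ref=27.5):
    """Nearest chroma index for DFT bin k (k must be > 0: log2(0) is undefined)."""
    f_bin = fs * k / n_fft                      # frequency of bin k in Hz
    semitones = 12 * math.log2(f_bin / f_ref)   # semitones above the reference
    return round(semitones) % 12                # round first, then wrap into one octave

k = 41
print(44100 * k / 4096)                             # bin frequency, roughly 441 Hz
print(CHROMA_NAMES[bin_to_chroma(k, 44100, 4096)])  # nearest chroma name
```

Bin 41 sits about 48 semitones (four octaves) above 27.5 Hz, so it wraps round to chroma 0.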

$X_k$ is the DFT, and $C_c$ tells us the intensity of the chroma $c$ by summing DFT components where the spectral bins land in the chroma bins.
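Putting both formulas together, a chromagram for one frame might look something like the sketch below. This is not the project's code: it assumes NumPy, uses a real-input FFT, skips the DC bin (where the logarithm is undefined), and tests on a synthetic 440 Hz tone at an assumed 8 kHz sampling rate.

```python
import numpy as np

def chromagram(frame, fs, f_ref=27.5):
    """12-bin chroma vector for one frame of audio (illustrative sketch)."""
    n = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2   # |X_k|^2 for k = 0 .. n // 2
    k = np.arange(1, len(power))              # skip k = 0 (DC), log2(0) is undefined
    # M_k: nearest chroma index for every remaining bin
    m = np.round(12 * np.log2(fs * k / (n * f_ref))).astype(int) % 12
    # C_c: sum the power of all bins whose nearest chroma is c
    return np.bincount(m, weights=power[1:], minlength=12)

# Test tone: one second of A at 440 Hz
fs = 8000
t = np.arange(fs) / fs
chroma = chromagram(np.sin(2 * np.pi * 440 * t), fs)
print(chroma.argmax())  # index of the strongest chroma; 0 corresponds to A here
```

`np.bincount` with `weights` does the "sum over bins where $M_k = c$" in one call, which avoids an explicit loop over the 12 chroma.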

My project

I currently have a functioning chromagrammer in Python 3, along with some generic audio processing, and a script for generating tones and chords for testing. There are some issues, and I've had some silly bugs along the way, but I'm enjoying it. Here is one of my favourite pieces of code. All it does is produce overlapping frames of a signal, but it's very neat :D

    import collections

    def overlapping_frames(self):
        """
        Generates overlapping frames

        Generator that yields a deque containing the current frame of audio
        data.  The deque contents are shifted by the frame size minus the
        overlap on each iteration, to minimise computation.
        """
        hop = self.frame_size - self.overlap  # new samples per frame
        frame = collections.deque(maxlen=self.frame_size)
        frame.extend(self.data[:self.frame_size])
        yield frame

        for i in range(1, self.num_frames):
            # Appending `hop` new samples pushes the oldest `hop` out of the
            # deque, so the last `overlap` samples of the previous frame stay.
            start = self.frame_size + (i - 1) * hop
            frame.extend(self.data[start:start + hop])
            yield frame
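The same deque trick can be sketched as a standalone function, which makes it easy to check the framing on a toy list. The signature here is an assumption for illustration (plain arguments instead of attributes on `self`), and it yields a tuple copy of each frame so that collecting the frames into a list doesn't end up with several references to one mutated deque.

```python
import collections

def overlapping_frames(data, frame_size, overlap):
    """Yield frames of frame_size samples, each sharing `overlap` samples with the last.

    Assumes 0 <= overlap < frame_size and len(data) >= frame_size.
    """
    hop = frame_size - overlap  # new samples per frame
    frame = collections.deque(data[:frame_size], maxlen=frame_size)
    yield tuple(frame)          # copy, so earlier frames aren't mutated later

    for start in range(frame_size, len(data) - hop + 1, hop):
        # Extending a full deque by hop samples drops the oldest hop samples
        frame.extend(data[start:start + hop])
        yield tuple(frame)

frames = list(overlapping_frames(list(range(10)), frame_size=4, overlap=2))
print(frames)  # [(0, 1, 2, 3), (2, 3, 4, 5), (4, 5, 6, 7), (6, 7, 8, 9)]
```

Each frame starts `frame_size - overlap` samples after the previous one, so only the new samples are ever copied into the deque.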

I'm using hypothesis for property-based testing, which I'm really getting into. Here's an example test: hypothesis actively searches for inputs that will break it. My tests are so much more useful with this!

    import hypothesis
    import numpy as np
    from hypothesis.extra.numpy import arrays

    @hypothesis.given(arrays(float, 100))
    def test_overlapping_frames_yields_correct_initial_frame(self, data):
        # Replace NaN and infinities so the equality check below is meaningful
        self.ap.data = np.nan_to_num(data)
        self.ap.process_data()

        frames = self.ap.overlapping_frames()
        assert (next(frames) == self.ap.data[:self.ap.frame_size]).all()

It's not a very exciting test, but I can be confident my code works for a variety of values, and it saves me having to make up dummy data.

Seaborn for pretty graphs!

I've started using seaborn for plotting my graphs, rather than plain old matplotlib. It's amazing! Everything looks so pretty. Here's part of a Mountain Goats song's chromagram plotted for a bunch of time samples. Nice!

Pretty chromagram ;o

I want to try and update my blog with how my project is going. I'm really enjoying it and I'm excited about the different ways it can go. There's a lot of scope for some cool Machine Learning and DSP! I really like the neat intuitive features I'm using at the moment, but I'm excited about what else I could use :D You can check out the code here at the GitHub repo.

lochsh
