The drum machine of Tao ☯︎

❍ What we have done:

In this work we present Tao, a "symmetric" drum machine capable of both synthesizing audio waveforms from sequencer parameters and inferring sequencer parameters from audio waveforms.

Implementing "sequencer parameters → audio waveforms" synthesis is not an open problem; the main challenge of Tao lies in the reverse direction, "audio waveforms → sequencer parameters", which we refer to as the sequencer parameter estimation problem. We use machine learning to address it; see the paper for technical details.

❍ Why:

Recovering sequencer parameters from the audio waveform of a sampled drum loop can restore low-level editability to loops that would otherwise remain frozen as audio. The philosophy behind this system draws inspiration from Taoism: that which returns to its primal state follows the great Way of Tao.

❍ btw:

Sequencer parameters (minimal):

a global tempo
a step vector for each percussive stem (e.g. a step vector for the kick drum may look like [1,0,0,0,1,0,0,0])
a one-shot sample for each percussive stem
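
To make the list above concrete, here is a minimal sketch of how these parameters could be held in code. The class name `SequencerParams`, its field names, and the `validate` helper are hypothetical and chosen for illustration only; this is not Tao's actual data model.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SequencerParams:
    """Hypothetical container for the minimal sequencer parameters."""

    tempo_bpm: float                    # global tempo
    step_vectors: Dict[str, List[int]]  # stem name -> binary step vector
    one_shots: Dict[str, List[float]]   # stem name -> one-shot waveform samples

    def validate(self) -> None:
        """Check that the three parts are consistent with each other."""
        lengths = {len(steps) for steps in self.step_vectors.values()}
        if len(lengths) > 1:
            raise ValueError("all step vectors must share the same length")
        for name, steps in self.step_vectors.items():
            if any(s not in (0, 1) for s in steps):
                raise ValueError(f"step vector for {name!r} must be binary")
            if name not in self.one_shots:
                raise ValueError(f"missing one-shot sample for {name!r}")


# The kick pattern from the list above, with a made-up 4-sample one-shot.
params = SequencerParams(
    tempo_bpm=120.0,
    step_vectors={"kick": [1, 0, 0, 0, 1, 0, 0, 0]},
    one_shots={"kick": [0.0, 1.0, 0.5, 0.25]},
)
params.validate()
```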

❍ On another note,

While the interface of Tao is still under construction, you are invited to imagine an interactive web-based regular drum machine + sequencer interface which probably looks like this, but with an additional ⇪drum loop audio file upload⇪ component.

❍ Finally:

See below for examples of sequencer parameters recovered by Tao from input drum loops. Desktop recommended for optimal display.

✻ Example I:

input:

a drum loop

Tao output:

est. tempo:	190

	one-shot sample	step vector
🀙 kick:		● ○ ○ ○ ● ○ ○ ○
🀄︎ snare:		○ ○ ○ ○ ● ● ○ ○
🀑 hihats:		○ ● ● ● ● ● ● ●
reconstruction

"reconstruction" is the drum loop audio synthesized from the estimated sequencer parameters, provided for a quick assessment of the estimation quality.

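The reconstruction step can be sketched roughly as follows. This is a simplified illustration and not Tao's synthesis code: the function name `render_loop`, the default sample rate, the assumption that one step is a sixteenth note (`steps_per_beat=4`), and the plain summation without gain control are all assumptions made for the example.

```python
from typing import Dict, List


def render_loop(
    tempo_bpm: float,
    step_vectors: Dict[str, List[int]],
    one_shots: Dict[str, List[float]],
    sample_rate: int = 44100,  # assumption: not stated on this page
    steps_per_beat: int = 4,   # assumption: one step is a sixteenth note
) -> List[float]:
    """Mix each stem's one-shot into a loop at the onsets of its step vector."""
    n_steps = len(next(iter(step_vectors.values())))
    step_len = round(sample_rate * 60.0 / tempo_bpm / steps_per_beat)
    longest = max(len(shot) for shot in one_shots.values())
    # Leave room for the tail of a one-shot triggered on the last step.
    out = [0.0] * (n_steps * step_len + longest)
    for stem, steps in step_vectors.items():
        shot = one_shots[stem]
        for i, active in enumerate(steps):
            if not active:
                continue
            start = i * step_len
            for j, value in enumerate(shot):
                out[start + j] += value  # overlapping hits simply add up
    return out


loop = render_loop(
    tempo_bpm=190.0,
    step_vectors={"kick": [1, 0, 0, 0, 1, 0, 0, 0]},
    one_shots={"kick": [1.0, 0.5, 0.25]},
)
```

Comparing such a rendering against the input loop by ear is the quick check the reconstruction is meant to support.
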
✻✻ Example II:

input:

a drum loop

Tao output:

est. tempo:	71

	one-shot sample	step vector
🀙 kick:		● ○ ● ● ○ ○ ○ ○
🀄︎ snare:		● ● ● ● ○ ● ● ●
🀑 hihats:		● ● ○ ○ ● ● ○ ○
reconstruction

✻✻✻ Example III:

input:

a drum loop

Tao output:

est. tempo:	159

	one-shot sample	step vector
🀙 kick:		● ○ ○ ● ○ ○ ○ ○
🀄︎ snare:		● ○ ● ● ○ ○ ● ●
🀑 hihats:		● ● ● ● ● ● ● ●
reconstruction

✻✻✻✻ Example IV:

input:

a drum loop

Tao output:

est. tempo:	100

	one-shot sample	step vector
🀙 kick:		● ● ● ● ● ● ● ●
🀄︎ snare:		○ ○ ○ ● ○ ○ ○ ●
🀑 hihats:		● ● ● ● ● ● ● ●
reconstruction

✻✻✻✻✻ Example V:

input:

a drum loop

Tao output:

est. tempo:	180

	one-shot sample	step vector
🀙 kick:		● ○ ○ ○ ● ○ ○ ○
🀄︎ snare:		● ● ○ ● ○ ○ ● ●
🀑 hihats:		● ● ● ● ● ○ ● ●
reconstruction

Each component in Tao handles a different subproblem, so each component is evaluated with its own metrics.

We synthesized two testing sets of drum loops from Freesound One-Shot Percussive Sounds: a random-rhythm testing set and a prior-rhythm testing set.

Random-rhythm testing set (Random): the testing set synthesized using randomly sampled step vectors following no prior rhythmic patterns.

Prior-rhythm testing set (Prior): the testing set synthesized using prior step vector collections (one collection for each percussive track), which were manually annotated by invited professional drummers and producers.

All one-shot samples in the testing sets were unseen during training.

1. Evaluation results of the drum source separation model
2. Evaluation results of the tempo estimation
3. Evaluation results of the step vector estimation
4. Evaluation results of the one-shot sample extraction

1. Evaluation results of the drum source separation model

We adopt commonly used music source separation evaluation metrics: the Signal-to-Distortion Ratio (SDR) and the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR).

Metrics (in dB)	kick	snare	hihats
SDR (Prior)	17.09	7.32	4.97
SI-SDR (Prior)	15.34	6.00	2.77
SDR (Random)	15.84	8.40	3.60
SI-SDR (Random)	14.33	7.37	0.43
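
For readers unfamiliar with the two metrics, the sketch below shows their plain energy-ratio definitions in pure Python. This page does not say which implementation produced the numbers above (toolkits such as BSS Eval define SDR with additional allowed distortions), so treat the functions `sdr` and `si_sdr` as illustrative assumptions and not as the evaluation code.

```python
import math
from typing import Sequence


def _energy(x: Sequence[float]) -> float:
    return sum(v * v for v in x)


def sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-12) -> float:
    """Plain SDR in dB: reference energy over the energy of the error."""
    error = [r - e for r, e in zip(reference, estimate)]
    return 10.0 * math.log10((_energy(reference) + eps) / (_energy(error) + eps))


def si_sdr(reference: Sequence[float], estimate: Sequence[float], eps: float = 1e-12) -> float:
    """SI-SDR in dB: the reference is first rescaled to best match the estimate."""
    dot = sum(r * e for r, e in zip(reference, estimate))
    alpha = dot / (_energy(reference) + eps)  # optimal scaling factor
    target = [alpha * r for r in reference]
    error = [e - t for e, t in zip(estimate, target)]
    return 10.0 * math.log10((_energy(target) + eps) / (_energy(error) + eps))


reference = [1.0, -1.0, 1.0, -1.0]
estimate = [1.0, -0.5, 1.0, -1.0]
print(sdr(reference, estimate), si_sdr(reference, estimate))
```

The practical difference is that SI-SDR ignores a pure gain mismatch: scaling the estimate by any nonzero constant leaves it unchanged, while SDR penalizes the gain error.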

2. Evaluation results of the tempo estimation

We adopt an accuracy metric: an estimate counts as correct if the estimated tempo is within 1% of the ground-truth tempo.

Metrics
Accuracy (Prior)	0.995
Accuracy (Random)	0.999
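
As a small illustration of the 1% rule, the sketch below counts an estimate as correct when it lies within 1% of the ground-truth tempo and reports the fraction of correct estimates. The function name `tempo_accuracy` and the example tempi are made up for this sketch.

```python
from typing import Sequence


def tempo_accuracy(
    estimated: Sequence[float],
    reference: Sequence[float],
    tolerance: float = 0.01,  # 1% of the ground-truth tempo
) -> float:
    """Fraction of loops whose estimated tempo is within the tolerance."""
    if len(estimated) != len(reference) or not reference:
        raise ValueError("need two non-empty tempo lists of equal length")
    hits = sum(
        1
        for est, ref in zip(estimated, reference)
        if abs(est - ref) <= tolerance * ref
    )
    return hits / len(reference)


# Two of these three made-up estimates fall within 1% of the ground truth.
accuracy = tempo_accuracy([120.5, 100.0, 95.0], [120.0, 100.0, 90.0])
print(accuracy)
```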

3. Evaluation results of the step vector estimation

To evaluate the estimated step vectors, we use recall, precision, and F-measure, since step vector estimation is a multi-label binary classification task.

Metrics	kick	snare	hihats
F-measure (Prior)	0.903	0.770	0.861
F-measure (Random)	0.908	0.860	0.813
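
The sketch below shows how precision, recall, and F-measure can be computed for a single stem's step vector, treating each step as one binary decision. How the scores are aggregated over loops in the tables above is not stated here, so the function `precision_recall_f` and the example vectors are illustrative assumptions only.

```python
from typing import Sequence, Tuple


def precision_recall_f(
    reference: Sequence[int], estimate: Sequence[int]
) -> Tuple[float, float, float]:
    """Per-step precision, recall, and F-measure for one binary step vector."""
    pairs = list(zip(reference, estimate))
    tp = sum(1 for r, e in pairs if r == 1 and e == 1)  # correctly found hits
    fp = sum(1 for r, e in pairs if r == 0 and e == 1)  # spurious hits
    fn = sum(1 for r, e in pairs if r == 1 and e == 0)  # missed hits
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f_measure = 2.0 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


# Ground-truth kick pattern versus an estimate with one hit misplaced.
truth = [1, 0, 0, 0, 1, 0, 0, 0]
guess = [1, 0, 0, 1, 0, 0, 0, 0]
print(precision_recall_f(truth, guess))
```

In this example one hit is found, one is spurious, and one is missed, so all three scores come out at one half.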

4. Evaluation results of the one-shot sample extraction

To evaluate the quality of the extracted one-shot sample waveforms against the ground-truth ones used for loop synthesis, we use the SDR and SI-SDR metrics.

Metrics (in dB)	kick	snare	hihats
SDR (Prior)	48.21	15.23	29.29
SI-SDR (Prior)	40.62	20.31	20.31(?)
SDR (Random)	37.80	33.35	25.36
SI-SDR (Random)	31.47	21.98	15.42