Increasing drum transcription vocabulary using data synthesis

Mark Cartwright, Juan Pablo Bello

Research output: Contribution to journal › Conference article › peer-review

14 Scopus citations


Current datasets for automatic drum transcription (ADT) are small and limited due to the tedious task of annotating onset events. While some of these datasets contain large vocabularies of percussive instrument classes (e.g. ~20 classes), many of these classes occur very infrequently in the data. This paucity of data makes it difficult to train models that support such large vocabularies. Therefore, data-driven drum transcription models often focus on a small number of percussive instrument classes (e.g. 3 classes). In this paper, we propose to support large-vocabulary drum transcription by generating a large synthetic dataset of audio examples (210,000 eight-second examples) for which we have ground-truth transcriptions. Using this synthetic dataset along with existing drum transcription datasets, we train convolutional-recurrent neural networks (CRNNs) in a multi-task framework to support large-vocabulary ADT. We find that training on both the synthetic and real music drum transcription datasets together improves performance not only on large-vocabulary ADT, but also on beat/downbeat detection and small-vocabulary ADT.
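The appeal of synthesis described in the abstract is that the onset labels are chosen before the audio is rendered, so no manual annotation is needed. The following is a minimal sketch of that idea only, not the authors' pipeline: the sample rate, the three class names, the tone frequencies, the decaying-sine "drum hit", and the function name `synthesize_example` are all assumptions made for illustration.

```python
import math
import random

SAMPLE_RATE = 8000   # assumed; kept low so the sketch runs quickly
DURATION_S = 8.0     # eight-second examples, as stated in the abstract
# Hypothetical class names and tone frequencies (Hz), for illustration only.
CLASSES = {"kick": 60.0, "snare": 200.0, "hihat": 3000.0}
HIT_S = 0.1          # assumed length of each rendered hit, in seconds


def synthesize_example(seed, hits_per_class=4):
    """Return (audio, onsets): a mono signal and its exact onset labels."""
    rng = random.Random(seed)
    n_samples = int(SAMPLE_RATE * DURATION_S)
    audio = [0.0] * n_samples
    onsets = []
    for name, freq in CLASSES.items():
        for _ in range(hits_per_class):
            # The onset time is drawn first, so the label is known exactly.
            t = rng.uniform(0.0, DURATION_S - HIT_S)
            onsets.append((t, name))
            start = int(t * SAMPLE_RATE)
            # Render a decaying sine as a crude stand-in for a drum sample.
            for i in range(int(HIT_S * SAMPLE_RATE)):
                env = math.exp(-40.0 * i / SAMPLE_RATE)
                phase = 2.0 * math.pi * freq * i / SAMPLE_RATE
                audio[start + i] += env * math.sin(phase)
    onsets.sort()
    return audio, onsets


audio, onsets = synthesize_example(seed=0)
```

Each call yields one labelled example; at the scale quoted in the abstract, 210,000 eight-second examples amount to roughly 467 hours of audio.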

Original language: English (US)
Pages (from-to): 72-79
Number of pages: 8
Journal: Proceedings of the International Conference on Digital Audio Effects, DAFx
State: Published - 2018
Externally published: Yes
Event: 21st International Conference on Digital Audio Effects, DAFx 2018 - Aveiro, Portugal
Duration: Sep 4 2018 → Sep 8 2018

All Science Journal Classification (ASJC) codes

  • Computer Science Applications
  • Signal Processing
  • Music

