TY - GEN
T1 - Scaper
T2 - 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
AU - Salamon, Justin
AU - MacConnell, Duncan
AU - Cartwright, Mark
AU - Li, Peter
AU - Bello, Juan Pablo
N1 - Publisher Copyright: © 2017 IEEE.
PY - 2017/12/7
Y1 - 2017/12/7
N2 - Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioacoustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset and source of every event. Generating such precise annotations manually is very time consuming, and as a result existing datasets for SED with strong labels are scarce and limited in size. To address this issue, we present Scaper, an open-source library for soundscape synthesis and augmentation. Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, 'specification'. To increase the variability of the output, Scaper supports the application of audio transformations such as pitch shifting and time stretching individually to every event. To illustrate the potential of the library, we generate a dataset of 10,000 soundscapes and use it to compare the performance of two state-of-the-art algorithms, including a breakdown by soundscape characteristics. We also describe how Scaper was used to generate audio stimuli for an audio labeling crowdsourcing experiment, and conclude with a discussion of Scaper's limitations and potential applications.
AB - Sound event detection (SED) in environmental recordings is a key topic of research in machine listening, with applications in noise monitoring for smart cities, self-driving cars, surveillance, bioacoustic monitoring, and indexing of large multimedia collections. Developing new solutions for SED often relies on the availability of strongly labeled audio recordings, where the annotation includes the onset, offset and source of every event. Generating such precise annotations manually is very time consuming, and as a result existing datasets for SED with strong labels are scarce and limited in size. To address this issue, we present Scaper, an open-source library for soundscape synthesis and augmentation. Given a collection of isolated sound events, Scaper acts as a high-level sequencer that can generate multiple soundscapes from a single, probabilistically defined, 'specification'. To increase the variability of the output, Scaper supports the application of audio transformations such as pitch shifting and time stretching individually to every event. To illustrate the potential of the library, we generate a dataset of 10,000 soundscapes and use it to compare the performance of two state-of-the-art algorithms, including a breakdown by soundscape characteristics. We also describe how Scaper was used to generate audio stimuli for an audio labeling crowdsourcing experiment, and conclude with a discussion of Scaper's limitations and potential applications.
KW - Soundscape
KW - sound event detection
KW - synthesis
UR - http://www.scopus.com/inward/record.url?scp=85042356698&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85042356698&partnerID=8YFLogxK
U2 - 10.1109/WASPAA.2017.8170052
DO - 10.1109/WASPAA.2017.8170052
M3 - Conference contribution
AN - SCOPUS:85042356698
T3 - IEEE Workshop on Applications of Signal Processing to Audio and Acoustics
SP - 344
EP - 348
BT - 2017 IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, WASPAA 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 15 October 2017 through 18 October 2017
ER -