TY - GEN
T1 - Crowdsourced Pairwise-Comparison for Source Separation Evaluation
AU - Cartwright, Mark
AU - Pardo, Bryan
AU - Mysore, Gautham J.
N1 - Publisher Copyright:
© 2018 IEEE.
PY - 2018/9/10
Y1 - 2018/9/10
AB - Automated objective methods of audio source separation evaluation are fast, cheap, and require little effort by the investigator. However, their output often correlates poorly with human quality assessments, and such methods typically require ground-truth (perfectly separated) signals to evaluate algorithm performance. Subjective multi-stimulus human ratings (e.g. MUSHRA) of audio quality are the gold standard for many tasks, but they are slow and require a great deal of effort to recruit participants and run listening tests. Recent work has shown that a crowdsourced multi-stimulus listening test can have results comparable to lab-based multi-stimulus tests. While these results are encouraging, MUSHRA multi-stimulus tests are limited to evaluating 12 or fewer stimuli, and they require ground-truth stimuli for reference. In this work, we evaluate a web-based pairwise-comparison listening approach that promises to speed up and simplify the process of conducting listening tests, while also addressing some of the shortcomings of multi-stimulus tests. Using audio source separation quality as our evaluation task, we compare our web-based pairwise-comparison listening test to both web-based and lab-based multi-stimulus tests. We find that pairwise-comparison listening tests perform comparably to multi-stimulus tests, but without many of their shortcomings.
KW - Audio quality evaluation
KW - Crowdsourcing
KW - Source separation
UR - http://www.scopus.com/inward/record.url?scp=85054278334&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85054278334&partnerID=8YFLogxK
U2 - 10.1109/ICASSP.2018.8462153
DO - 10.1109/ICASSP.2018.8462153
M3 - Conference contribution
AN - SCOPUS:85054278334
SN - 9781538646588
T3 - ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing - Proceedings
SP - 606
EP - 610
BT - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018 - Proceedings
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2018
Y2 - 15 April 2018 through 20 April 2018
ER -