TY - GEN
T1 - Visualization of Speech Prosody and Emotion in Captions
T2 - 2023 CHI Conference on Human Factors in Computing Systems, CHI 2023
AU - De Lacerda Pataca, Caluã
AU - Watkins, Matthew
AU - Peiris, Roshan
AU - Lee, Sooyeon
AU - Huenerfauth, Matt
N1 - Publisher Copyright:
© 2023 ACM.
PY - 2023/4/19
Y1 - 2023/4/19
AB - Speech is expressive in ways that caption text does not capture, with emotion or emphasis information not conveyed. We interviewed eight Deaf and Hard-of-Hearing (dhh) individuals to understand if and how captions' inexpressiveness impacts them in online meetings with hearing peers. Automatically captioned speech, we found, lacks affective depth, lending it a hard-to-parse ambiguity and general dullness. Interviewees regularly feel excluded, which some understand is an inherent quality of these types of meetings rather than a consequence of current caption text design. Next, we developed three novel captioning models that depicted, beyond words, features from prosody, emotions, and a mix of both. In an empirical study, 16 dhh participants compared these models with conventional captions. The emotion-based model outperformed traditional captions in depicting emotions and emphasis, with only a moderate loss in legibility, suggesting its potential as a more inclusive design for captions.
KW - Accessibility
KW - Emotion / Affective Computing
KW - Empirical study that tells us about how people use a system
KW - Individuals with Disabilities & Assistive Technologies
UR - http://www.scopus.com/inward/record.url?scp=85160009689&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85160009689&partnerID=8YFLogxK
U2 - 10.1145/3544548.3581511
DO - 10.1145/3544548.3581511
M3 - Conference contribution
AN - SCOPUS:85160009689
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2023 - Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
Y2 - 23 April 2023 through 28 April 2023
ER -