TY - GEN
T1 - Unspoken Sound
T2 - 2024 CHI Conference on Human Factors in Computing Systems, CHI 2024
AU - May, Lloyd
AU - Ohshiro, Keita
AU - Dang, Khang
AU - Sridhar, Sripathi
AU - Pai, Jhanvi
AU - Fuentes, Magdalena
AU - Lee, Sooyeon
AU - Cartwright, Mark
N1 - Publisher Copyright:
© 2024 Copyright held by the owner/author(s)
PY - 2024/5/11
Y1 - 2024/5/11
N2 - High-quality closed captioning of both speech and non-speech elements (e.g., music, sound effects, manner of speaking, and speaker identification) is essential for the accessibility of video content, especially for d/Deaf and hard-of-hearing individuals. While many regions have regulations mandating captioning for television and movies, a regulatory gap remains for the vast amount of web-based video content, including the staggering 500+ hours uploaded to YouTube every minute. Advances in automatic speech recognition have bolstered the presence of captions on YouTube. However, the technology has notable limitations, including the omission of many non-speech elements, which are often crucial for understanding content narratives. This paper examines the contemporary and historical state of non-speech information (NSI) captioning on YouTube through the creation and exploratory analysis of a dataset of over 715k videos. We identify factors that influence NSI caption practices and suggest avenues for future research to enhance the accessibility of online video content.
AB - High-quality closed captioning of both speech and non-speech elements (e.g., music, sound effects, manner of speaking, and speaker identification) is essential for the accessibility of video content, especially for d/Deaf and hard-of-hearing individuals. While many regions have regulations mandating captioning for television and movies, a regulatory gap remains for the vast amount of web-based video content, including the staggering 500+ hours uploaded to YouTube every minute. Advances in automatic speech recognition have bolstered the presence of captions on YouTube. However, the technology has notable limitations, including the omission of many non-speech elements, which are often crucial for understanding content narratives. This paper examines the contemporary and historical state of non-speech information (NSI) captioning on YouTube through the creation and exploratory analysis of a dataset of over 715k videos. We identify factors that influence NSI caption practices and suggest avenues for future research to enhance the accessibility of online video content.
KW - closed captioning
KW - datasets
KW - extra-speech information
KW - non-speech information
KW - subtitles
UR - http://www.scopus.com/inward/record.url?scp=85194862559&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85194862559&partnerID=8YFLogxK
U2 - 10.1145/3613904.3642162
DO - 10.1145/3613904.3642162
M3 - Conference contribution
AN - SCOPUS:85194862559
T3 - Conference on Human Factors in Computing Systems - Proceedings
BT - CHI 2024 - Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems
PB - Association for Computing Machinery
Y2 - 11 May 2024 through 16 May 2024
ER -