TY - JOUR
T1 - Self-attention Pooling-based Long-term Temporal Network for Action Recognition
AU - Li, Huifang
AU - Huang, Jingwei
AU - Zhou, Mengchu
AU - Shi, Qisong
AU - Fei, Qing
N1 - Publisher Copyright:
IEEE
PY - 2022
Y1 - 2022
N2 - With the development of the Internet of Things (IoT), self-driving technology has advanced considerably. Yet safe driving still faces challenges, such as pedestrians crossing roads, so sensing their movements and identifying their behaviors from video data is important. Most existing methods fail to: a) capture long-term temporal relationships well, due to their limited temporal coverage, and b) aggregate discriminative representations effectively, e.g., because little or no attention is paid to differences among representations. To address these issues, this work presents a new architecture called a Self-attention Pooling-based Long-term Temporal Network (SP-LTN), which learns long-term temporal representations and aggregates discriminative representations in an end-to-end manner. It first conducts long-term representation learning on a given video by capturing spatial information and mining temporal patterns. Next, it develops a self-attention pooling method to predict the importance scores of the obtained representations, distinguishing them from one another, and then weights them together to highlight the contributions of discriminative representations to action recognition. Finally, it gives a new loss function that combines a standard cross-entropy loss with a regularization term to further focus on discriminative representations while restraining the impact of distractive ones on activity classification. Experimental results on two datasets show that our SP-LTN, fed only Red-Green-Blue (RGB) frames, outperforms state-of-the-art methods.
AB - With the development of the Internet of Things (IoT), self-driving technology has advanced considerably. Yet safe driving still faces challenges, such as pedestrians crossing roads, so sensing their movements and identifying their behaviors from video data is important. Most existing methods fail to: a) capture long-term temporal relationships well, due to their limited temporal coverage, and b) aggregate discriminative representations effectively, e.g., because little or no attention is paid to differences among representations. To address these issues, this work presents a new architecture called a Self-attention Pooling-based Long-term Temporal Network (SP-LTN), which learns long-term temporal representations and aggregates discriminative representations in an end-to-end manner. It first conducts long-term representation learning on a given video by capturing spatial information and mining temporal patterns. Next, it develops a self-attention pooling method to predict the importance scores of the obtained representations, distinguishing them from one another, and then weights them together to highlight the contributions of discriminative representations to action recognition. Finally, it gives a new loss function that combines a standard cross-entropy loss with a regularization term to further focus on discriminative representations while restraining the impact of distractive ones on activity classification. Experimental results on two datasets show that our SP-LTN, fed only Red-Green-Blue (RGB) frames, outperforms state-of-the-art methods.
KW - Action Recognition
KW - Deep Learning
KW - Feature extraction
KW - Internet of Things
KW - Long-term Temporal Networks
KW - Predictive models
KW - Representation learning
KW - Roads
KW - Self-attention Pooling
KW - Three-dimensional displays
KW - Videos
UR - http://www.scopus.com/inward/record.url?scp=85123784254&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123784254&partnerID=8YFLogxK
U2 - 10.1109/TCDS.2022.3145839
DO - 10.1109/TCDS.2022.3145839
M3 - Article
AN - SCOPUS:85123784254
JO - IEEE Transactions on Cognitive and Developmental Systems
JF - IEEE Transactions on Cognitive and Developmental Systems
SN - 2379-8920
ER -