Generic person tracking is a basic task in visual surveillance that uses cameras as sensors. Many deep learning-based trackers have achieved outstanding performance, and among them, trackers based on Siamese networks have drawn great attention and show promise. These trackers are competitive, but their training data can be augmented more effectively to improve person tracking performance. Many other trackers use only one layer to extract semantic features, which likely hinders their discriminative learning. In this paper, we propose an enhanced discriminative model prediction method with efficient data augmentation and robust feature fusion. Specifically, we implement an effective data augmentation strategy (e.g., color jitter and motion blur) to make fuller use of the original training data. We also adopt multi-layer feature fusion to obtain a more discriminative feature map. As a result, the proposed tracker can discriminate an object in complicated scenarios in real time. We conduct extensive experiments on two datasets, VOT2018 and UAV123. Objective evaluation on VOT2018 shows that the proposed tracker, with an expected average overlap of 0.430, outperforms a state-of-the-art tracker by 4.88%. On UAV123, it outperforms that tracker by 4.5% in success rate and 4.4% in precision rate. Further experimental results show that our algorithm runs fast enough to meet the real-time tracking requirement when cameras are used as sensors.
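The two augmentations named in the abstract, color jitter and motion blur, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `color_jitter` and `motion_blur`, the jitter ranges, the kernel size, and the choice of a purely horizontal blur are all assumptions made for illustration.

```python
import numpy as np


def color_jitter(img, brightness=0.2, contrast=0.2, rng=None):
    """Randomly scale contrast and brightness of an HxWx3 uint8 image.

    The jitter ranges are illustrative assumptions, not values from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = img.astype(np.float32)
    b = 1.0 + rng.uniform(-brightness, brightness)  # brightness factor
    c = 1.0 + rng.uniform(-contrast, contrast)      # contrast factor
    mean = x.mean()
    x = ((x - mean) * c + mean) * b
    return np.clip(np.rint(x), 0, 255).astype(np.uint8)


def motion_blur(img, kernel_size=5):
    """Apply a horizontal box blur that mimics camera or target motion.

    A horizontal kernel is an assumption; real motion can have any direction.
    """
    if kernel_size % 2 == 0:
        raise ValueError("kernel_size must be odd")
    pad = kernel_size // 2
    # Replicate edge pixels so the output keeps the input width.
    padded = np.pad(img.astype(np.float32), ((0, 0), (pad, pad), (0, 0)), mode="edge")
    out = np.zeros(img.shape, dtype=np.float32)
    width = img.shape[1]
    for k in range(kernel_size):
        out += padded[:, k:k + width, :]
    out /= kernel_size
    return np.clip(np.rint(out), 0, 255).astype(np.uint8)


def augment(img, rng=None):
    """Compose both augmentations on one training image."""
    return motion_blur(color_jitter(img, rng=rng))
```

In a training pipeline, a function like `augment` would typically be applied to each sampled frame with some probability, so that the tracker sees both the original and the perturbed appearance of the target.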
All Science Journal Classification (ASJC) codes
- Electrical and Electronic Engineering
Keywords
- Model prediction
- data augmentation
- feature fusion
- person tracking
- visual sensing