Online Hybrid Lightweight Representations Learning: Its Application to Visual Tracking

Ilchae Jung, Minji Kim, Eunhyeok Park, Bohyung Han

Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 1002-1008. https://doi.org/10.24963/ijcai.2022/140

This paper presents a novel hybrid representation learning framework for streaming data, in which each image frame of a video is modeled by an ensemble of two distinct deep neural networks: a low-bit quantized network and a lightweight full-precision network. The former learns coarse primary information at low cost, while the latter conveys the residual information needed for high fidelity to the original representations. The proposed parallel architecture is effective at maintaining complementary information, since the quantized network can use fixed-point arithmetic and the lightweight model, a compact channel-pruned network, provides precise representations. We incorporate the hybrid representation technique into an online visual tracking task, where deep neural networks must handle temporal variations of target appearance in real time. Compared to state-of-the-art real-time trackers based on conventional deep neural networks, our tracking algorithm demonstrates competitive accuracy on the standard benchmarks at a small fraction of the computational cost and memory footprint.
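
The two-branch idea can be illustrated with a toy sketch for a single linear feature. This is not the authors' implementation. It assumes uniform symmetric quantization for the low-bit branch, models channel pruning as a list of kept input indices, and combines the branches by simple summation. All function names, the 4-bit default, and the summation rule are hypothetical choices made for illustration.

```python
def quantize(values, bits):
    """Uniform symmetric quantization (assumed scheme): integer codes plus a scale."""
    qmax = 2 ** (bits - 1) - 1
    peak = max(abs(v) for v in values) or 1.0  # avoid a zero scale for all-zero input
    scale = peak / qmax
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def quantized_dot(weights, inputs, bits=4):
    """Low-bit branch: integer multiply-accumulate, rescaled once at the end."""
    w_codes, w_scale = quantize(weights, bits)
    x_codes, x_scale = quantize(inputs, bits)
    acc = sum(w * x for w, x in zip(w_codes, x_codes))  # integer-only arithmetic
    return acc * w_scale * x_scale


def pruned_dot(weights, inputs, keep):
    """Lightweight full-precision branch: only the kept (unpruned) channels contribute."""
    return sum(weights[i] * inputs[i] for i in keep)


def hybrid_feature(q_weights, r_weights, inputs, keep, bits=4):
    """Hypothetical combination: coarse quantized output plus full-precision residual."""
    return quantized_dot(q_weights, inputs, bits) + pruned_dot(r_weights, inputs, keep)
```

In this sketch the quantized branch performs its accumulation entirely on small integers, which is where the low cost comes from, while the pruned branch spends full precision on only a few channels.
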
Keywords:
Computer Vision: Representation Learning
Computer Vision: Applications
Computer Vision: Machine Learning for Vision
Computer Vision: Motion and Tracking