Cross-Granularity Graph Inference for Semantic Video Object Segmentation

Huiling Wang, Tinghuai Wang, Ke Chen, Joni-Kristian Kämäräinen

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4544-4550. https://doi.org/10.24963/ijcai.2017/634

We address semantic video object segmentation via a novel cross-granularity hierarchical graphical model that integrates tracklet and object proposal reasoning with superpixel labeling. Tracklets characterize the varying spatial-temporal relations of a video object but quite often suffer from sporadic local outliers. To acquire high-quality tracklets, we propose a transductive inference model capable of calibrating short-range noisy object tracklets with respect to long-range dependencies and high-level context cues. At the center of this work lies a new paradigm of semantic video object segmentation that goes beyond modeling the appearance and motion of objects locally: the semantic label is inferred by jointly exploiting multi-scale contextual information and the spatial-temporal relations of the video object. We evaluate our method on two popular semantic video object segmentation benchmarks and demonstrate that it advances the state of the art, achieving higher accuracy than other leading methods.
Keywords:
Robotics and Vision: Vision and Perception
Robotics and Vision: Robotics and Vision