Single-Image 3D Scene Parsing Using Geometric Commonsense

Chengcheng Yu, Xiaobai Liu, Song-Chun Zhu

Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence
Main track. Pages 4655-4661. https://doi.org/10.24963/ijcai.2017/649

This paper presents a unified grammatical framework capable of reconstructing a variety of scene types (e.g., urban, campus, county, etc.) from a single input image. The key idea of our approach is a novel commonsense reasoning framework that mainly exploits two types of prior knowledge: (i) prior distributions over a single dimension of an object, e.g., that the length of a sedan is about 4.5 meters; and (ii) pairwise relationships between the dimensions of scene entities, e.g., that a sedan is shorter than a bus. Such unary and relative geometric knowledge, once extracted, is fairly stable across different types of natural scenes and is informative for understanding scenes in both 2D images and the 3D world. Methodologically, we construct a hierarchical graph as a unified representation of the input image and the related geometric knowledge. We formulate these objectives in a unified probabilistic formulation and develop a data-driven Monte Carlo method to infer the optimal solution with both bottom-up and top-down computations. Results with comparisons on public datasets show that our method clearly outperforms the alternative methods.
Keywords:
Uncertainty in AI: Bayesian Networks
Robotics and Vision: Vision and Perception
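
To make the two kinds of geometric knowledge in the abstract concrete, the following minimal sketch scores a hypothesized set of object dimensions against a unary prior and a pairwise ordering relation. It is an illustration only, not the paper's formulation: the Gaussian form, the standard deviations, the bus length, the fixed violation penalty, and all names (`UNARY_PRIORS`, `PAIRWISE_SHORTER`, `log_score`) are assumptions. Only the 4.5-meter sedan length and the sedan-shorter-than-bus relation come from the abstract.

```python
import math

# Unary priors as (mean, std) in meters. The 4.5 m sedan mean is from the
# abstract; the stds and the bus entry are illustrative assumptions.
UNARY_PRIORS = {
    ("sedan", "length"): (4.5, 0.4),
    ("bus", "length"): (12.0, 1.5),
}

# Pairwise relations: each pair (a, b) asserts that dimension a < dimension b.
PAIRWISE_SHORTER = [
    (("sedan", "length"), ("bus", "length")),
]


def gaussian_log_pdf(x, mean, std):
    """Log density of a 1D Gaussian (assumed form of the unary prior)."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))


def log_score(dims, violation_penalty=5.0):
    """Sum unary log-priors and subtract a penalty per violated relation.

    `dims` maps (object, dimension) keys to hypothesized sizes in meters.
    The fixed penalty is a hypothetical stand-in for a pairwise potential.
    """
    score = 0.0
    for key, value in dims.items():
        if key in UNARY_PRIORS:
            mean, std = UNARY_PRIORS[key]
            score += gaussian_log_pdf(value, mean, std)
    for a, b in PAIRWISE_SHORTER:
        if a in dims and b in dims and dims[a] >= dims[b]:
            score -= violation_penalty
    return score


plausible = {("sedan", "length"): 4.4, ("bus", "length"): 11.5}
implausible = {("sedan", "length"): 9.0, ("bus", "length"): 6.0}
print(log_score(plausible) > log_score(implausible))
```

In a sampling-based inference scheme such as the one the abstract mentions, a score of this kind could serve as one term when comparing candidate 3D reconstructions; how the paper actually combines these terms is not stated in the abstract.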