OTOcc: Optimal Transport for Occupancy Prediction

Pengteng Li, Ying He, F. Richard Yu, Pinhao Song, Xingchen Zhou, Guang Zhou

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 1010-1019. https://doi.org/10.24963/ijcai.2024/112

The autonomous driving community is highly interested in 3D occupancy prediction due to its strong geometric perception and object recognition capabilities. However, previous methods rely on existing semantic conversion mechanisms to address the sparse ground truth problem, incurring excessive computational demands and sub-optimal voxel representations. To tackle these limitations, we propose OTOcc, a novel 3D occupancy prediction framework that models the semantic conversion from 2D pixels to 3D voxels as an Optimal Transport (OT) problem, offering accurate semantic mapping that adapts to sparse scenarios without attention or depth estimation. Specifically, the unit transportation cost between each demander (voxel) and supplier (pixel) pair is defined as the weighted occupancy prediction loss. We then use the Sinkhorn-Knopp iteration to find the mapping matrices with minimal transportation cost. To reduce computational cost, we propose a block reading technique with multi-perspective feature representation, which also yields fine-grained scene understanding. Extensive experiments show that OTOcc not only achieves competitive prediction performance but also reduces computational overhead by more than 4.58% compared to state-of-the-art methods.
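To make the OT formulation concrete, the following is a minimal sketch of the Sinkhorn-Knopp iteration applied to an entropy-regularized transport problem between pixel "suppliers" and voxel "demanders". The random cost matrix merely stands in for the weighted occupancy prediction loss described in the abstract; all names, marginals, and hyperparameters below are illustrative assumptions, not the authors' implementation.

# Sketch: Sinkhorn-Knopp iteration for an entropy-regularized OT problem.
# The cost matrix is a placeholder for the weighted occupancy prediction
# loss between each (pixel, voxel) pair; names here are hypothetical.
import numpy as np

def sinkhorn_knopp(cost, supply, demand, eps=0.1, n_iters=100):
    """Return an approximate transport plan minimizing <plan, cost>.

    cost   : (n_pixels, n_voxels) unit transportation costs
    supply : (n_pixels,) mass available at each pixel
    demand : (n_voxels,) mass required at each voxel
    """
    K = np.exp(-cost / eps)             # Gibbs kernel of the cost matrix
    u = np.ones_like(supply)
    v = np.ones_like(demand)
    for _ in range(n_iters):            # alternate marginal projections
        u = supply / (K @ v)
        v = demand / (K.T @ u)
    return u[:, None] * K * v[None, :]  # plan = diag(u) K diag(v)

# Toy example: 6 pixels supplying semantic labels to 4 voxels.
rng = np.random.default_rng(0)
cost = rng.random((6, 4))               # placeholder for the weighted loss
supply = np.full(6, 4 / 6)              # total supply equals total demand
demand = np.ones(4)
plan = sinkhorn_knopp(cost, supply, demand)
mapping = plan.argmax(axis=0)           # hard pixel-to-voxel assignment

Taking the argmax of the plan along the pixel axis gives a hard pixel-to-voxel mapping; OTOcc's actual cost definition, marginals, and solver settings may differ from this toy setup.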
Keywords:
Computer Vision: CV: 3D computer vision
Computer Vision: CV: Applications
Computer Vision: CV: Machine learning for vision