Detecting Out-Of-Context Objects Using Graph Contextual Reasoning Network
Detecting Out-Of-Context Objects Using Graph Contextual Reasoning Network
Manoj Acharya, Anirban Roy, Kaushik Koneripalli, Susmit Jha, Christopher Kanan, Ajay Divakaran
Proceedings of the Thirty-First International Joint Conference on Artificial Intelligence
Main Track. Pages 629-635.
https://doi.org/10.24963/ijcai.2022/89
This paper presents an approach for detecting out-of-context (OOC) objects in images. Given an image with a set of objects, our goal is to determine if an object is inconsistent with the contextual relations and detect the OOC object with a bounding box. In this work, we consider common contextual relations such as co-occurrence relations, the relative size of an object with respect to other objects, and the position of the object in the scene. We posit that contextual cues are useful to determine object labels for in-context objects and inconsistent context cues are detrimental to determining object labels for out-of-context objects. To realize this hypothesis, we propose a graph contextual reasoning network (GCRN) to detect OOC objects. GCRN consists of two separate graphs to predict object labels based on the contextual cues in the image: 1) a representation graph to learn object features based on the neighboring objects and 2) a context graph to explicitly capture contextual cues from the neighboring objects. GCRN explicitly captures the contextual cues to improve the detection of in-context objects and identify objects that violate contextual relations.
In order to evaluate our approach, we create a large-scale dataset by adding OOC object instances to the COCO images. We also evaluate on recent OCD benchmark. Our results show that GCRN outperforms competitive baselines in detecting OOC objects and correctly detecting in-context objects. Code and data: https://nusci.csl.sri.com/project/trinity-ooc
Keywords:
AI Ethics, Trust, Fairness: Trustworthy AI
Computer Vision: Recognition (object detection, categorization)
Computer Vision: Scene analysis and understanding