Image Retrieval with Self-Supervised Divergence Minimization and Cross-Attention Classification

Vivek Trivedy; Longin Jan Latecki

doi:10.24963/ijcai.2024/149

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Main Track. Pages 1344-1352. https://doi.org/10.24963/ijcai.2024/149

Common approaches to image retrieval include contrastive methods and specialized loss functions such as ranking losses and entropy regularizers. We present DMCAC (Divergence Minimization with Cross-Attention Classification), a novel image retrieval method that offers a new perspective on this training paradigm. We use self-supervision with a novel divergence loss framework alongside a simple data flow adjustment that minimizes a distribution over a database directly during training. We show that jointly learning a query representation over a database is a competitive, and often better, alternative to traditional contrastive methods for image retrieval. We evaluate our method across several model configurations and four datasets, achieving state-of-the-art performance in multiple settings. We also conduct a thorough set of ablations showing that our method is robust to the choice between full and approximate retrieval and to different hyperparameter configurations.

Keywords:

Computer Vision: CV: Image and video retrieval

Computer Vision: CV: Representation learning