Cutting the Black Box: Conceptual Interpretation of a Deep Neural Net with Multi-Modal Embeddings and Multi-Criteria Decision Aid

Nicolas Atienza; Roman Bresson; Cyriaque Rousselot; Philippe Caillou; Johanne Cohen; Christophe Labreuche; Michele Sebag

doi:10.24963/ijcai.2024/406

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence

Main Track. Pages 3669-3678. https://doi.org/10.24963/ijcai.2024/406

This paper tackles the concept-based explanation of neural models in computer vision, building upon the state of the art in Multi-Criteria Decision Aid (MCDA). The novelty of the approach is to leverage multi-modal embeddings from CLIP to bridge the gap between pixel-based and concept-based representations. The proposed Cut the Black Box (CB2) approach disentangles the latent representation of a trained pixel-based neural net, referred to as the teacher model, in a three-step process. First, the pixel-based representation of the samples is mapped onto a conceptual representation using multi-modal embeddings. Second, an interpretable-by-design MCDA student model is trained by distillation from the teacher model, using the conceptual sample representation. Third, aligning the teacher and student latent representations identifies the concepts relevant to explaining the teacher model. The empirical validation of the approach with ResNet, VGG, and Vision Transformer teachers on CIFAR-10, CIFAR-100, Tiny ImageNet, and Fashion-MNIST demonstrates the effectiveness of the interpretations provided for the teacher models. The analysis reveals that decision-making predominantly relies on a small number of concepts, thereby exposing potential bias in the teacher's decisions.
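The three-step process can be illustrated with a toy sketch. It is not the CB2 implementation and rests on several assumptions: random vectors stand in for CLIP image and text embeddings, a plain linear softmax model stands in for the MCDA student, the teacher is synthetic, and the third step is reduced to reading off the student's largest weights instead of aligning latent representations. The concept names and all function names are hypothetical.

```python
import numpy as np

def concept_scores(image_emb, concept_emb):
    """Step 1 (sketch): cosine similarity of each image to each concept."""
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    con = concept_emb / np.linalg.norm(concept_emb, axis=1, keepdims=True)
    return img @ con.T  # shape (n_samples, n_concepts)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(target_probs, pred_probs):
    """Mean cross-entropy between soft targets and predictions."""
    return float(-np.mean(np.sum(target_probs * np.log(pred_probs + 1e-12), axis=1)))

def distill_linear_student(scores, teacher_probs, lr=0.5, steps=500):
    """Step 2 (sketch): fit a linear softmax student to the teacher's soft labels.

    Assumption: a linear model replaces the MCDA student of the paper.
    """
    n, n_concepts = scores.shape
    n_classes = teacher_probs.shape[1]
    W = np.zeros((n_concepts, n_classes))
    for _ in range(steps):
        student_probs = softmax(scores @ W)
        # Gradient of the mean cross-entropy with respect to W.
        grad = scores.T @ (student_probs - teacher_probs) / n
        W -= lr * grad
    return W

def top_concepts(W, concept_names, class_idx, top=2):
    """Step 3 (simplified): concepts with the largest absolute weight for a class."""
    order = np.argsort(-np.abs(W[:, class_idx]))[:top]
    return [concept_names[i] for i in order]

rng = np.random.default_rng(0)
concept_names = ["wheel", "wing", "fur", "water"]  # hypothetical concepts
image_emb = rng.normal(size=(200, 16))   # placeholder for image embeddings
concept_emb = rng.normal(size=(4, 16))   # placeholder for concept text embeddings
scores = concept_scores(image_emb, concept_emb)

# Synthetic teacher whose outputs depend only on concepts 0 and 2.
true_W = np.zeros((4, 3))
true_W[0, 0] = 8.0
true_W[2, 1] = 8.0
teacher_probs = softmax(scores @ true_W)

W = distill_linear_student(scores, teacher_probs)
loss_before = cross_entropy(teacher_probs, softmax(scores @ np.zeros_like(W)))
loss_after = cross_entropy(teacher_probs, softmax(scores @ W))
print("distillation loss:", loss_before, "->", loss_after)
print("top concepts for class 0:", top_concepts(W, concept_names, class_idx=0))
```

With real data, the placeholder embeddings would be replaced by the outputs of a multi-modal encoder and the synthetic teacher by the trained network's predicted class probabilities.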

Keywords:

Machine Learning: ML: Explainable/Interpretable machine learning

Computer Vision: CV: Interpretability and transparency

Knowledge Representation and Reasoning: KRR: Preference modelling and preference-based reasoning

Machine Learning: ML: Trustworthy machine learning