Interpretable Network Visualizations: A Human-in-the-Loop Approach for Post-hoc Explainability of CNN-based Image Classification

Matteo Bianchi, Antonio De Santis, Andrea Tocchetti, Marco Brambilla

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 3715-3723. https://doi.org/10.24963/ijcai.2024/411

Transparency and explainability in image classification are essential for establishing trust in machine learning models and for detecting biases and errors. State-of-the-art explainability methods generate saliency maps that show where a specific class is identified, without providing a detailed explanation of the model's decision process. To address this need, we introduce a post-hoc method that explains the entire feature-extraction process of a Convolutional Neural Network. These explanations include a layer-wise representation of the features the model extracts from the input. Each feature is represented as a saliency map generated by clustering and merging similar feature maps, and is assigned a weight derived by generalizing Grad-CAM to the proposed methodology. To further enhance these explanations, we attach a set of textual labels collected through a gamified crowdsourcing activity and processed using NLP techniques and Sentence-BERT. Finally, we present an approach for generating global explanations by aggregating labels across multiple images.
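To make the feature-extraction step concrete, the sketch below clusters one convolutional layer's feature maps by similarity, merges each cluster into a single saliency map, and scores it with a Grad-CAM-style gradient average. This is a minimal illustration under stated assumptions: the agglomerative clustering on L2-normalized maps, the fixed cluster count, and the mean-gradient weighting are placeholders, not the paper's exact formulation.

```python
# Illustrative sketch (not the authors' implementation): cluster a layer's
# feature maps, merge each cluster into a saliency map, and weight it with
# a Grad-CAM-style gradient average generalized to the whole cluster.
import torch
import torch.nn.functional as F
from sklearn.cluster import AgglomerativeClustering

def explain_layer(activations, gradients, n_clusters=5):
    """activations, gradients: (C, H, W) tensors captured via forward/backward
    hooks on one convolutional layer, for a single input image."""
    C = activations.shape[0]
    flat = activations.reshape(C, -1)
    # L2-normalize so Euclidean distance tracks cosine similarity between maps.
    normed = F.normalize(flat, dim=1).detach().cpu().numpy()
    labels = AgglomerativeClustering(n_clusters=n_clusters).fit_predict(normed)

    maps, weights = [], []
    for k in range(n_clusters):
        idx = torch.tensor([i for i in range(C) if labels[i] == k])
        # Merge similar feature maps into one saliency map for the cluster.
        merged = torch.relu(activations[idx].mean(dim=0))
        merged = (merged - merged.min()) / (merged.max() - merged.min() + 1e-8)
        # Grad-CAM-style weight: mean gradient over the cluster's channels
        # and spatial positions (assumed generalization for this sketch).
        weights.append(gradients[idx].mean().item())
        maps.append(merged)
    return maps, weights
```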
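Likewise, a minimal sketch of how the crowdsourced textual labels could be consolidated with Sentence-BERT: embed the free-text labels, group semantically similar ones, and keep one representative per group. The model name, similarity threshold, and use of community detection are assumptions for illustration, not the paper's pipeline.

```python
# Illustrative sketch (assumed pipeline): merge crowdsourced labels for a
# feature by embedding them with Sentence-BERT and grouping similar ones.
from collections import Counter
from sentence_transformers import SentenceTransformer, util

def aggregate_labels(labels, threshold=0.7):
    """labels: crowdsourced strings for one extracted feature, possibly
    collected across many images. Returns one representative per group."""
    model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
    embeddings = model.encode(labels, convert_to_tensor=True)
    # Group labels whose pairwise cosine similarity exceeds the threshold.
    clusters = util.community_detection(
        embeddings, threshold=threshold, min_community_size=1
    )
    # Keep the most frequent surface form in each group as its representative.
    return [
        Counter(labels[i] for i in cluster).most_common(1)[0][0]
        for cluster in clusters
    ]
```

For input such as ["wheel", "tyre", "wheels", "headlight"], this collapses the near-synonyms into a single label while keeping "headlight" separate, which is the kind of aggregation that enables the global, cross-image explanations mentioned above.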
Keywords:
Machine Learning: ML: Explainable/Interpretable machine learning
Computer Vision: CV: Interpretability and transparency
Humans and AI: HAI: Human computation and crowdsourcing