CMACE: CMAES-based Counterfactual Explanations for Black-box Models

Xudong Yin, Yao Yang

Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence
Main Track. Pages 539-547. https://doi.org/10.24963/ijcai.2024/60

Explanatory Artificial Intelligence plays a vital role in machine learning, due to its widespread application in decision-making scenarios, e.g., credit lending. Counterfactual Explanation (CFE) is a new kind of explanatory method that involves asking “what if”, i.e., what would have happened if the model inputs had changed slightly. To answer this question, Counterfactual Explanation aims to find a minimal perturbation of the model inputs that leads to a different model decision. Compared with model-agnostic approaches, model-specific CFE approaches, which are designed for only a specific type of model, usually perform better at finding optimal counterfactual perturbations, owing to their access to the inner workings of the model. To address this trade-off between generality and performance, this work proposes CMAES-based Counterfactual Explanations (CMACE): an effective model-agnostic counterfactual generating approach based on the Covariance Matrix Adaptation Evolution Strategy (CMA-ES) and a warm starting scheme that exploits prior information from training samples to provide a good initialization of the counterfactual's mean and covariance parameters for CMA-ES. CMACE significantly outperforms another state-of-the-art (SOTA) model-agnostic approach (Bayesian Counterfactual Generator, BayCon) across various experimental settings. Extensive experiments also demonstrate that CMACE is superior to a SOTA model-specific approach (Flexible Optimizable Counterfactual Explanations for Tree Ensembles, FOCUS) that is designed for tree-based models and uses gradient-based optimization.
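
The idea of a warm-started, black-box counterfactual search can be illustrated with a minimal sketch. This is not the authors' implementation and not full CMA-ES: it swaps in a simplified Gaussian evolution strategy with a diagonal covariance, and every name in it (`black_box`, `warm_start`, `loss`, `search`), the toy classifier, the penalty value, and the population settings are assumptions made purely for illustration.

```python
import math
import random


def black_box(x):
    # Hypothetical stand-in for an opaque model: only its predictions are used.
    return 1 if x[0] + x[1] > 1.0 else 0


def warm_start(train_X, train_y, target):
    # Initial mean and per-feature std taken from training samples
    # that the model already assigns to the target class.
    rows = [x for x, y in zip(train_X, train_y) if y == target]
    dim = len(rows[0])
    mean = [sum(r[j] for r in rows) / len(rows) for j in range(dim)]
    std = [
        max(1e-3, math.sqrt(sum((r[j] - mean[j]) ** 2 for r in rows) / len(rows)))
        for j in range(dim)
    ]
    return mean, std


def loss(cand, query, target, predict, penalty=10.0):
    # Distance to the query, plus a fixed penalty if the decision did not flip.
    miss = 0.0 if predict(cand) == target else penalty
    return math.dist(cand, query) + miss


def search(query, target, predict, mean, std,
           generations=40, pop=20, elite=5, seed=0):
    rng = random.Random(seed)
    dim = len(mean)
    # Keep the warm-start mean as a fallback if it is already a valid counterfactual.
    best = list(mean) if predict(mean) == target else None
    for _ in range(generations):
        samples = [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(pop)]
        samples.sort(key=lambda c: loss(c, query, target, predict))
        top = samples[:elite]
        lead = top[0]
        if predict(lead) == target and (
            best is None or math.dist(lead, query) < math.dist(best, query)
        ):
            best = lead
        new_mean = [sum(c[j] for c in top) / elite for j in range(dim)]
        # Spread is measured around the OLD mean (in the spirit of CMA-ES's
        # rank-mu update), which keeps variance along the direction of progress.
        std = [
            max(1e-3, math.sqrt(sum((c[j] - mean[j]) ** 2 for c in top) / elite))
            for j in range(dim)
        ]
        mean = new_mean
    return best


# Toy data: labels come from the black box itself.
train_X = [[0.9, 0.9], [1.0, 0.8], [0.8, 1.1], [1.2, 0.9], [0.1, 0.3], [0.4, 0.2]]
train_y = [black_box(x) for x in train_X]
query = [0.2, 0.2]  # currently classified as 0
start_mean, start_std = warm_start(train_X, train_y, target=1)
counterfactual = search(query, 1, black_box, start_mean, start_std)
```

Here the search only ever calls `predict`, so it stays model-agnostic, and the warm start places the initial distribution in a region already known to receive the target decision. A full CMA-ES would additionally adapt a complete covariance matrix and a global step size.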
Keywords:
AI Ethics, Trust, Fairness: ETF: Explainability and interpretability
AI Ethics, Trust, Fairness: ETF: Trustworthy AI
Machine Learning: ML: Optimization
Machine Learning: ML: Trustworthy machine learning