open access publication

Article, 2024

Supervised feature compression based on counterfactual analysis

European Journal of Operational Research, ISSN 1872-6860, 0377-2217, Volume 317, Issue 2, Pages 273-285, DOI 10.1016/j.ejor.2023.11.019

Contributors

  • Piccialli, Veronica 0000-0002-3357-9608 [1]
  • Morales, Dolores Romero 0000-0001-7945-1469 [2]
  • Salvatore, Cecilia 0000-0003-3758-5252 (Corresponding author) [3]

Affiliations

  1. [1] Sapienza University of Rome [NORA names: Italy; Europe, EU; OECD];
  2. [2] Copenhagen Business School [NORA names: CBS Copenhagen Business School; University; Denmark; Europe, EU; Nordic; OECD];
  3. [3] University of Rome Tor Vergata [NORA names: Italy; Europe, EU; OECD]

Abstract

Counterfactual Explanations are becoming a de facto standard in post-hoc interpretable machine learning. For a given classifier and an instance classified in an undesired class, its counterfactual explanation corresponds to small perturbations of that instance that allow the classification outcome to change. This work aims to leverage Counterfactual Explanations to detect the important decision boundaries of a pre-trained black-box model. This information is used to build a supervised discretization of the features in the dataset with a tunable granularity. Using the discretized dataset, an optimal Decision Tree can be trained that resembles the black-box model, but that is more interpretable and compact. Numerical results on real-world datasets show the effectiveness of the approach in terms of accuracy and sparsity.
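
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `black_box` model, the one-feature-at-a-time bisection search used as a stand-in for computing counterfactuals, the frequency-based selection of thresholds, and the `max_cuts` granularity parameter are all assumptions made for illustration. The sketch perturbs each instance until the prediction flips, records the feature value at which the flip occurs, keeps the most frequently crossed thresholds per feature, and maps each instance to bin indices.

```python
from bisect import bisect_left
from collections import Counter

def black_box(x):
    # Hypothetical stand-in for a pre-trained model (not from the paper):
    # class 1 iff x[0] > 0.5 and x[1] > 0.3.
    return int(x[0] > 0.5 and x[1] > 0.3)

def boundary_along_feature(model, x, j, grid, iters=30):
    """Nearest value of feature j at which the prediction flips, or None."""
    label = model(x)

    def with_value(v):
        return x[:j] + (v,) + x[j + 1:]

    flipped = [v for v in grid if model(with_value(v)) != label]
    if not flipped:
        return None
    # a keeps the original label, b the flipped one; bisect between them.
    a, b = x[j], min(flipped, key=lambda v: abs(v - x[j]))
    for _ in range(iters):
        m = (a + b) / 2
        if model(with_value(m)) == label:
            a = m
        else:
            b = m
    return round((a + b) / 2, 3)

def select_cuts(model, X, grid, max_cuts):
    """Keep, per feature, the max_cuts thresholds crossed most often."""
    n_features = len(X[0])
    counts = [Counter() for _ in range(n_features)]
    for x in X:
        for j in range(n_features):
            t = boundary_along_feature(model, x, j, grid)
            if t is not None:
                counts[j][t] += 1
    return {j: sorted(t for t, _ in counts[j].most_common(max_cuts))
            for j in range(n_features)}

def discretize(x, cuts):
    """Map each feature value to the index of the interval it falls in."""
    return tuple(bisect_left(cuts[j], v) for j, v in enumerate(x))

# Illustrative data only; instances are tuples with features in [0, 1].
X = [(0.1, 0.9), (0.2, 0.8), (0.4, 0.6), (0.6, 0.1),
     (0.7, 0.2), (0.9, 0.9), (0.8, 0.5), (0.6, 0.4)]
grid = [i / 20 for i in range(21)]
cuts = select_cuts(black_box, X, grid, max_cuts=1)
X_binned = [discretize(x, cuts) for x in X]
```

Raising `max_cuts` gives a finer discretization and lowering it a coarser one. The binned instances, paired with the black-box predictions, could then be passed to any decision tree learner; that step is omitted here.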

Keywords

accuracy, analysis, black-box, black-box models, boundaries, class, classification, classification outcomes, classifier, compression, counterfactual analysis, counterfactual explanations, dataset, de-facto standard, decision, decision boundary, decision tree, discrete datasets, discretization, effect, explanation, feature compression, features, granularity, information, learning, machine, machine learning, model, numerical results, optimal decision tree, outcomes, perturbation, real-world datasets, results, sparsity, standards, supervised discretization, trees, tunable granularity, undesirable class

Funders

  • European Commission

Data Provider: Digital Science