Permutation explainer

This notebook demonstrates how to use the Permutation explainer on some simple datasets. The Permutation explainer is model-agnostic, so it can compute Shapley values and Owen values for any model. It works by iterating over complete permutations of the features in both forward and reverse directions. Changing one feature at a time keeps the number of required model evaluations low, and it always ensures we satisfy efficiency no matter how many executions of the original model we choose to use for approximating the feature attribution values. So the SHAP values computed, while approximate, always sum up exactly to the difference between the base value of the model and the output of the model for each explained instance.
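To make the forward/reverse iteration concrete, here is a minimal sketch of a single antithetic permutation pass on a toy three-feature function. All names here (toy_model, background, phi) are illustrative, not SHAP internals, and a real run averages many such passes; the point is that each pass telescopes, so the attributions sum exactly to the difference between the model output and the baseline.

import numpy as np

# toy 3-feature model and a single background (baseline) sample
def toy_model(x):
    return 2 * x[0] + x[1] * x[2]

x = np.array([1.0, 2.0, 3.0])           # instance to explain
background = np.array([0.0, 0.0, 0.0])  # values used to mask features
phi = np.zeros(3)

perm = np.random.permutation(3)
masked = background.copy()
prev = toy_model(masked)
# forward pass: unmask features one at a time in permutation order
for i in perm:
    masked[i] = x[i]
    out = toy_model(masked)
    phi[i] += out - prev
    prev = out
# reverse (antithetic) pass: re-mask in the same order, crediting each
# feature for the drop in output when it is removed
for i in perm:
    masked[i] = background[i]
    out = toy_model(masked)
    phi[i] += prev - out
    prev = out
phi /= 2  # average the forward and reverse contributions

# by construction the attributions sum to f(x) - f(background)
assert np.isclose(phi.sum(), toy_model(x) - toy_model(background))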

Because the Permutation explainer has important performance optimizations and, unlike the Kernel explainer, does not require regularization parameter tuning, it is the default model-agnostic explainer used for tabular datasets that have more features than would be appropriate for the Exact explainer.

Below we demonstrate how to use the Permutation explainer on a simple adult income classification dataset and model.

[1]:
import xgboost

import shap

# get a dataset on income prediction
X, y = shap.datasets.adult()

# train an XGBoost model (but any other model type would also work)
model = xgboost.XGBClassifier()
model.fit(X, y);

Tabular data with independent (Shapley value) masking

[2]:
# build a Permutation explainer and explain the model predictions on the given dataset
explainer = shap.explainers.Permutation(model.predict_proba, X)
shap_values = explainer(X[:100])

# get just the explanations for the positive class
shap_values = shap_values[..., 1]
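Because each permutation pass telescopes, the returned attributions satisfy efficiency. A quick sanity check along those lines (illustrative, using the .values and .base_values attributes of the returned Explanation object):

import numpy as np

# efficiency check: base value plus attributions should reconstruct the
# model's positive-class probability for each explained instance
reconstruction = shap_values.base_values + shap_values.values.sum(axis=1)
assert np.allclose(reconstruction, model.predict_proba(X[:100])[:, 1], atol=1e-6)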

Plot a global summary

[3]:
shap.plots.bar(shap_values)
../../../_images/example_notebooks_api_examples_explainers_Permutation_5_0.png

Plot a single instance

[4]:
shap.plots.waterfall(shap_values[0])
../../../_images/example_notebooks_api_examples_explainers_Permutation_7_0.png

Tabular data with partition (Owen value) masking

While Shapley values result from treating each feature independently of the other features, it is often useful to enforce a structure on the model inputs. Enforcing such a structure produces a structured game (i.e. a game with rules about valid input feature coalitions), and when that structure is a nested set of feature groupings we get the Owen values as a recursive application of Shapley values to each group. In SHAP, we take the partitioning to the limit and build a binary hierarchical clustering tree to represent the structure of the data. This structure could be chosen in many ways, but for tabular data it is often helpful to build the structure from the redundancy of information between the input features about the output label. This is what we do below:
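As a toy illustration of what grouping changes (illustrative code, not the SHAP implementation): Owen values can be computed as the average marginal contribution over only those permutations in which each group's features stay contiguous. For a three-feature game with groups {{0, 1}, {2}} where features 0 and 1 only pay off together:

import itertools
import numpy as np

def v(coalition):
    # toy characteristic function: features 0 and 1 interact,
    # feature 2 contributes on its own
    s = set(coalition)
    return (1.0 if {0, 1} <= s else 0.0) + (2.0 if 2 in s else 0.0)

groups = [[0, 1], [2]]
perms = []
# enumerate every permutation that keeps each group contiguous:
# permute the groups, then permute the features inside each group
for group_order in itertools.permutations(range(len(groups))):
    blocks = [groups[g] for g in group_order]
    for ordered in itertools.product(*[itertools.permutations(b) for b in blocks]):
        perms.append([i for block in ordered for i in block])

owen = np.zeros(3)
for perm in perms:
    coalition = []
    for i in perm:
        owen[i] += v(coalition + [i]) - v(coalition)
        coalition.append(i)
owen /= len(perms)
print(owen)  # [0.5, 0.5, 2.0]: the interaction credit splits inside the group

The attributions still sum to the value of the full coalition (efficiency), but the interaction credit is now shared only within the group that contains it.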

[5]:
# build a clustering of the features based on shared information about y
clustering = shap.utils.hclust(X, y)
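If you want to inspect this clustering, the returned matrix follows the scipy linkage format (at least in current SHAP versions), so standard scipy tooling applies; for example, a dendrogram sketch:

import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

# visualize the feature redundancy structure; assumes `clustering` is a
# scipy-style linkage matrix, as shap.utils.hclust currently returns
sch.dendrogram(clustering, labels=list(X.columns))
plt.xticks(rotation=90)
plt.show()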
[6]:
# above we implicitly used shap.maskers.Independent by passing a raw dataframe as the masker
# now we explicitly use a Partition masker that uses the clustering we just computed
masker = shap.maskers.Partition(X, clustering=clustering)

# build a Permutation explainer and explain the model predictions on the given dataset
explainer = shap.explainers.Permutation(model.predict_proba, masker)
shap_values2 = explainer(X[:100])

# get just the explanations for the positive class
shap_values2 = shap_values2[..., 1]

Plot a global summary

Note that only the Relationship and Marital status features share more than 50% of their explanation power (as measured by R²) with each other, so all the other parts of the clustering tree are removed by the default clustering_cutoff=0.5 setting:

[7]:
shap.plots.bar(shap_values2)
../../../_images/example_notebooks_api_examples_explainers_Permutation_12_0.png
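To see more of the clustering structure, you can raise the cutoff via the clustering_cutoff argument of shap.plots.bar; for example (0.8 here is just an illustrative value):

# show clustering links that share less explanation power as well
shap.plots.bar(shap_values2, clustering=clustering, clustering_cutoff=0.8)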

Plot a single instance

Note that there is a strong similarity between the explanation from the Independent masker above and the one from the Partition masker here. In general the distinctions between these methods for tabular data are not large, though the Partition masker allows for much faster runtime and potentially more realistic manipulations of the model inputs (since groups of clustered features are masked/unmasked together).

[8]:
shap.plots.waterfall(shap_values2[0])
../../../_images/example_notebooks_api_examples_explainers_Permutation_14_0.png
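One practical knob worth knowing: the number of model evaluations per explained instance can be capped with the max_evals argument when calling the explainer, trading accuracy for speed. It must be large enough to complete at least one forward/reverse pass (2 * num_features + 1); the value below is just an illustrative choice:

# a faster, coarser run: cap the model evaluations per explained instance
shap_values_fast = explainer(X[:100], max_evals=100)
shap_values_fast = shap_values_fast[..., 1]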

Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!