shap.maskers.Partition

class shap.maskers.Partition(data, max_samples=100, clustering='correlation')

This masks out tabular features by integrating over the given background dataset.

Unlike Independent, Partition respects a hierarchical structure of the data.
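As a rough illustration of what a hierarchical structure over features can look like, the sketch below clusters the columns of a synthetic dataset with SciPy. It is not the library's internal code: the data, the variable names, and the choice of complete linkage are all assumptions made for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))

# Synthetic data (assumption): features 0 and 1 are nearly identical,
# features 2 and 3 are independent noise.
X = np.hstack([
    base,
    base + 0.05 * rng.normal(size=(200, 1)),
    rng.normal(size=(200, 2)),
])

# Distances are computed between features (columns), hence the transpose.
D = pdist(X.T, metric="correlation")

# Agglomerative clustering of the features; "complete" linkage is an
# illustrative choice, not necessarily what shap uses.
Z = linkage(D, method="complete")

# Each row of Z merges two clusters; the first row joins the closest pair.
first_merge = sorted(int(i) for i in Z[0, :2])
print(first_merge)
```

Here the first merge groups the two correlated features, which is the kind of grouping a partition-style masker can take into account when hiding features together.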

__init__(data, max_samples=100, clustering='correlation')

Build a Partition masker with the given background data and clustering.

Parameters:
data : numpy.ndarray, pandas.DataFrame

The background dataset that is used for masking.

max_samples : int

The maximum number of samples to use from the passed background data. If data has more than max_samples rows, shap.utils.sample is used to subsample the dataset. The number of samples coming out of the masker (to be integrated over) matches the number of samples in the background dataset, so a larger background dataset causes longer runtimes. Normally about 1, 10, 100, or 1000 background samples are reasonable choices.

clustering : string or numpy.ndarray

If a string, this is the distance metric used to create the clustering of the features. It can be any valid value of the metric argument of scipy.spatial.distance.pdist: braycurtis, canberra, chebyshev, cityblock, correlation, cosine, dice, euclidean, hamming, jaccard, jensenshannon, kulsinski, mahalanobis, matching, minkowski, rogerstanimoto, russellrao, seuclidean, sokalmichener, sokalsneath, sqeuclidean, yule. We suggest using 'correlation' in most cases. If an array, it is assumed to be the clustering of the features.

Methods

__init__(data[, max_samples, clustering])

Build a Partition masker with the given background data and clustering.

invariants(x)

This returns a mask of which features change when we mask them.

load(in_file[, instantiate])

Load a Tabular masker from a file stream.

save(out_file)

Write a Tabular masker to a file stream.

invariants(x)

This returns a mask of which features change when we mask them.

This optional masking method allows explainers to avoid re-evaluating the model when the features that would have been masked are all invariant.
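The idea behind this shortcut can be sketched with plain NumPy. This is an illustration of the concept only, not the library's implementation or its return convention; the arrays and names are made up for the example.

```python
import numpy as np

# Hypothetical background dataset (3 rows) and one sample to explain.
background = np.array([
    [1.0, 5.0, 0.0],
    [1.0, 7.0, 0.0],
    [2.0, 5.0, 0.0],
])
x = np.array([1.0, 5.0, 0.0])

# unchanged[i, j] is True when replacing feature j of x with the value from
# background row i would leave x as it is.
unchanged = np.isclose(x, background)

# Suppose an explainer wants to mask features 0 and 2 using background row 1.
masked_features = [0, 2]
can_skip = bool(unchanged[1, masked_features].all())
print(can_skip)  # masking changes nothing here, so the model call is redundant
```

When every feature selected for masking is unchanged for a given background row, the masked input equals the original input, so the model output is already known and need not be recomputed.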

classmethod load(in_file, instantiate=True)

Load a Tabular masker from a file stream.

save(out_file)

Write a Tabular masker to a file stream.