shap.LinearExplainer
- class shap.LinearExplainer(model, masker, link=shap.links.identity, nsamples=1000, feature_perturbation=None, **kwargs)
Computes SHAP values for a linear model, optionally accounting for inter-feature correlations.
This computes the SHAP values for a linear model and can account for the correlations among the input features. Assuming features are independent leads to interventional SHAP values which for a linear model are
coef[i] * (x[i] - X.mean(0)[i])
for the ith feature. If we instead account for correlations, we prevent any problems arising from collinearity and share credit among correlated features. Accounting for correlations can be computationally challenging, but LinearExplainer uses sampling to estimate a transform that can then be applied to explain any prediction of the model.
- Parameters:
- model: (coef, intercept) or sklearn.linear_model.*
User-supplied linear model, either as a (coef, intercept) parameter pair or as an sklearn object.
- masker: function, numpy.array, pandas.DataFrame, tuple of (mean, cov), shap.maskers.Masker
A callable Python object used to “mask” out hidden features, of the form masker(binary_mask, x). It takes a single input sample and a binary mask and returns a matrix of masked samples. These masked samples are evaluated using the model function and the outputs are then averaged. As a shortcut for the standard masking used by SHAP, you can pass a background data matrix instead of a function, and that matrix will be used for masking.
You can also provide a tuple of (mean, covariance), or pass in a masker meant for tabular data (i.e., maskers.Independent, maskers.Impute, or maskers.Partition) directly.
- data: (mean, cov), numpy.array, pandas.DataFrame, iml.DenseData or scipy.csr_matrix
The background dataset to use for computing conditional expectations. Note that only the mean and covariance of the dataset are used. This means passing a raw data matrix is just a convenient alternative to passing the mean and covariance directly.
- nsamples: int
Number of samples to use when estimating the transformation matrix used to account for feature correlations.
- feature_perturbation: “interventional” (default) or “correlation_dependent”
There are two ways we might want to compute SHAP values, either the full conditional SHAP values or the interventional SHAP values.
For interventional SHAP values we break any dependence structure between features in the model and so uncover how the model would behave if we intervened and changed some of the inputs. For the full conditional SHAP values we respect the correlations among the input features, so if the model depends on one input but that input is correlated with another input, then both get some credit for the model’s behavior. The interventional option stays “true to the model”, meaning it will only give credit to features that are actually used by the model, while the correlation option stays “true to the data” in the sense that it only considers how the model would behave when respecting the correlations in the input data. In the sparse case, only the interventional option is supported.
Note that the feature_perturbation option is deprecated and will be removed in a future release. It is recommended to use the appropriate tabular masker instead.
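To make the “true to the model” versus “true to the data” distinction concrete, the sketch below works through a two-feature toy case in plain Python. It is an illustration under stated assumptions, not how LinearExplainer is implemented: the model uses only the first feature, both features are assumed to be standard normal with correlation rho, and exact two-player Shapley values are computed from closed-form Gaussian conditional expectations rather than by sampling. All names (value_interventional, value_conditional, shapley_two_features) are hypothetical.

```python
# Toy model: f(x) = coef[0]*x[0] + coef[1]*x[1], with coef[1] = 0 so the
# model ignores the second feature. Assumed for illustration: both features
# are standard normal (mean 0, variance 1) with correlation rho.
coef = [2.0, 0.0]
rho = 0.8
x = [1.0, 1.5]  # the sample being explained


def value_interventional(subset):
    """E[f] with features outside `subset` replaced by their mean (0 here)."""
    return sum(coef[i] * x[i] for i in subset)


def value_conditional(subset):
    """E[f(X) | X_S = x_S] for the bivariate standard normal assumed above."""
    if len(subset) == 1:
        known = next(iter(subset))
        other = 1 - known
        # For a bivariate standard normal, E[X_other | X_known] = rho * x_known.
        return coef[known] * x[known] + coef[other] * rho * x[known]
    # The empty set gives the prior mean (0); the full set gives f(x) itself.
    return sum(coef[i] * x[i] for i in subset)


def shapley_two_features(value):
    """Exact Shapley values for a two-player game with value function `value`."""
    v0, v1, v2, v12 = value(set()), value({0}), value({1}), value({0, 1})
    phi0 = 0.5 * ((v1 - v0) + (v12 - v2))
    phi1 = 0.5 * ((v2 - v0) + (v12 - v1))
    return [phi0, phi1]


interventional = shapley_two_features(value_interventional)
conditional = shapley_two_features(value_conditional)

print("interventional:", interventional)  # the unused feature gets no credit
print("conditional:   ", conditional)     # the correlated feature shares credit
```

In both cases the two attributions sum to the same total, f(x) minus the prior mean; only the split between the features changes.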
Examples
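As a minimal illustration of the interventional formula coef[i] * (x[i] - X.mean(0)[i]), the following plain-Python sketch masks features with a background matrix in the masker(binary_mask, x) style described under Parameters, averages the model output over the masked samples, and compares the result with the closed form. It does not call shap; the helper names (predict, masker, masked_expectation) and the toy numbers are assumptions made for this example.

```python
# Hypothetical linear model and background data, chosen only for illustration.
coef = [1.5, -2.0, 0.5]
intercept = 0.25
background = [
    [1.0, 0.0, 2.0],
    [3.0, 1.0, 0.0],
    [2.0, 2.0, 4.0],
]
x = [4.0, 1.0, 1.0]  # the sample being explained


def predict(row):
    """Linear model output for a single row."""
    return intercept + sum(c * v for c, v in zip(coef, row))


def masker(binary_mask, x):
    """Return one masked sample per background row.

    Features with mask 1 keep the value from x; features with mask 0 are
    hidden and take the background row's value instead.
    """
    return [
        [xi if keep else bi for keep, xi, bi in zip(binary_mask, x, row)]
        for row in background
    ]


def masked_expectation(binary_mask):
    """Average model output over the masked samples."""
    outputs = [predict(row) for row in masker(binary_mask, x)]
    return sum(outputs) / len(outputs)


n = len(coef)
means = [sum(row[i] for row in background) / len(background) for i in range(n)]

# Closed form from the text: coef[i] * (x[i] - X.mean(0)[i]).
closed_form = [coef[i] * (x[i] - means[i]) for i in range(n)]

# Effect of revealing feature i alone, starting from everything hidden.
# For a linear model this gain does not depend on which other features are
# already revealed, so it coincides with the closed form above.
base_value = masked_expectation([0] * n)
by_masking = [
    masked_expectation([1 if j == i else 0 for j in range(n)]) - base_value
    for i in range(n)
]

print("closed form:", closed_form)
print("by masking: ", by_masking)
print("sum + base: ", sum(closed_form) + base_value, "vs f(x):", predict(x))
```

Because the model is linear, averaging predictions over the background rows gives the same result as predicting at the background mean, which is why only the mean matters in the interventional case.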
- __init__(model, masker, link=shap.links.identity, nsamples=1000, feature_perturbation=None, **kwargs)
Build a new explainer for the passed model.
- Parameters:
- model: object or function
User-supplied function or model object that takes a dataset of samples and computes the output of the model for those samples.
- masker: function, numpy.array, pandas.DataFrame, tokenizer, None, or a list of these for each model input
The function used to “mask” out hidden features, of the form masked_args = masker(*model_args, mask=mask). It takes input in the same form as the model, but for just a single sample with a binary mask, then returns an iterable of masked samples. These masked samples will then be evaluated using the model function and the outputs averaged. As a shortcut for the standard masking used by SHAP, you can pass a background data matrix instead of a function, and that matrix will be used for masking. Domain-specific masking functions are available in shap, such as shap.ImageMasker for images and shap.TokenMasker for text. In addition to determining how to replace hidden features, the masker can also constrain the rules of the cooperative game used to explain the model. For example, shap.TabularMasker(data, hclustering="correlation") will enforce a hierarchical clustering of coalitions for the game (in this special case the attributions are known as the Owen values).
- link: function
The link function used to map between the output units of the model and the SHAP value units. By default it is shap.links.identity, but shap.links.logit can be useful so that expectations are computed in probability units while explanations remain in the (more naturally additive) log-odds units. For more details on how link functions work see any overview of link functions for generalized linear models.
- algorithm: “auto”, “permutation”, “partition”, “tree”, or “linear”
The algorithm used to estimate the Shapley values. Many different algorithms can be used to estimate the Shapley values (and the related values for constrained games); each has its own tradeoffs and is preferable in different situations. By default, the “auto” option attempts to make the best choice given the passed model and masker, but this choice can always be overridden by passing the name of a specific algorithm. The type of algorithm used determines what type of subclass object is returned by this constructor, and you can also build those subclasses directly if you prefer or need more fine-grained control over their options.
- output_names: None or list of strings
The names of the model outputs. For example, if the model is an image classifier, then output_names would be the names of all the output classes. This parameter is optional. When output_names is None, the Explanation objects produced by this explainer will not have any output_names, which could affect downstream plots.
- seed: None or int
Seed for reproducibility.
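To illustrate why the link description calls log-odds units more naturally additive, here is a small plain-Python sketch. It does not use shap.links; the logit and sigmoid helpers, the logistic-model coefficients, and the reference point are assumptions written out for the example. Per-feature contributions are computed in log-odds units, where a logistic model is linear, and are compared with simple one-at-a-time changes in probability units, which do not add up because the sigmoid is nonlinear.

```python
import math


def logit(p):
    """Map a probability to log-odds."""
    return math.log(p / (1.0 - p))


def sigmoid(z):
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical logistic model: probability = sigmoid(intercept + coef . x).
coef = [0.8, -1.2]
intercept = -0.3
reference = [0.5, 1.0]  # assumed background mean
x = [2.0, 0.5]          # the sample being explained


def predict_proba(row):
    """Model output in probability units."""
    return sigmoid(intercept + sum(c * v for c, v in zip(coef, row)))


# In log-odds units the model is linear, so per-feature contributions
# relative to the reference add up exactly.
contrib_log_odds = [c * (xi - ri) for c, xi, ri in zip(coef, x, reference)]
base_log_odds = logit(predict_proba(reference))
total_log_odds = base_log_odds + sum(contrib_log_odds)

# In probability units, changing one feature at a time does not add up.
# These are simple one-at-a-time differences, not Shapley values.
p_ref = predict_proba(reference)
one_at_a_time = [
    predict_proba([x[0], reference[1]]) - p_ref,
    predict_proba([reference[0], x[1]]) - p_ref,
]

print("log-odds total:", total_log_odds, "logit(f(x)):", logit(predict_proba(x)))
print("probability gap:", predict_proba(x) - p_ref,
      "sum of one-at-a-time changes:", sum(one_at_a_time))
```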
Methods
- __init__(model, masker[, link, nsamples, ...]): Build a new explainer for the passed model.
- explain_row(*row_args, max_evals, ...): Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes).
- load(in_file[, model_loader, masker_loader, ...]): Load an Explainer from the given file stream.
- save(out_file[, model_saver, masker_saver]): Write the explainer to the given file stream.
- shap_values(X): Estimate the SHAP values for a set of samples.
- supports_model_with_masker(model, masker): Determines if we can parse the given model.
- explain_row(*row_args, max_evals, main_effects, error_bounds, batch_size, outputs, silent)
Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes).
- classmethod load(in_file, model_loader=Model.load, masker_loader=Masker.load, instantiate=True)
Load an Explainer from the given file stream.
- Parameters:
- in_file: The file stream to load objects from.
- save(out_file, model_saver='.save', masker_saver='.save')
Write the explainer to the given file stream.
- shap_values(X)
Estimate the SHAP values for a set of samples.
- Parameters:
- X: numpy.array, pandas.DataFrame or scipy.csr_matrix
A matrix of samples (# samples x # features) on which to explain the model’s output.
- Returns:
- array or list
For models with a single output this returns a matrix of SHAP values (# samples x # features). Each row sums to the difference between the model output for that sample and the expected value of the model output (which is stored as the expected_value attribute of the explainer).
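The row-sum property of the returned matrix can be checked by hand for the interventional case. The sketch below is plain Python rather than a call to shap_values: it builds a (# samples x # features) matrix from the formula coef[i] * (x[i] - mean[i]) and confirms that each row sums to the model output minus the expected value. The coefficients, data, and variable names are made up for the example.

```python
# Hypothetical linear model; the background rows double as the samples to explain.
coef = [0.5, -1.0]
intercept = 2.0
X = [
    [1.0, 4.0],
    [3.0, 0.0],
    [5.0, 2.0],
]

n_samples, n_features = len(X), len(coef)


def predict(row):
    """Linear model output for a single row."""
    return intercept + sum(c * v for c, v in zip(coef, row))


# Column means of the background data, i.e. X.mean(0).
means = [sum(row[j] for row in X) / n_samples for j in range(n_features)]

# For a linear model the expected output equals the output at the mean.
expected_value = predict(means)

# One row of attributions per sample, one column per feature.
values = [
    [coef[j] * (row[j] - means[j]) for j in range(n_features)]
    for row in X
]

for row, attributions in zip(X, values):
    gap = predict(row) - expected_value
    print(attributions, "sum:", sum(attributions), "f(x) - E[f]:", gap)
```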
- static supports_model_with_masker(model, masker)
Determines if we can parse the given model.