shap.DeepExplainer

class shap.DeepExplainer(model, data, session=None, learning_phase_flags=None)

Meant to approximate SHAP values for deep learning models.

This is an enhanced version of the DeepLIFT algorithm (Deep SHAP) where, similar to Kernel SHAP, we approximate the conditional expectations of SHAP values using a selection of background samples. Lundberg and Lee (NIPS 2017) showed that the per-node attribution rules in DeepLIFT (Shrikumar, Greenside, and Kundaje, arXiv 2017) can be chosen to approximate Shapley values. By integrating over many background samples, DeepExplainer estimates approximate SHAP values such that they sum to the difference between the expected model output on the passed background samples and the current model output (f(x) - E[f(x)]).

Examples

See Deep Explainer Examples
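The following is a minimal, self-contained sketch (the toy data, model, and variable names are illustrative, not taken from the documentation): build a small single-output Keras model, choose roughly 100 background samples, and compute SHAP values for a few rows:

    import numpy as np
    import tensorflow as tf
    import shap

    # Illustrative toy data and single-output model (not from the docs)
    X = np.random.randn(1000, 10).astype("float32")
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(16, activation="relu", input_shape=(10,)),
        tf.keras.layers.Dense(1),
    ])

    # ~100 random rows as the background dataset to integrate over
    background = X[np.random.choice(X.shape[0], 100, replace=False)]

    explainer = shap.DeepExplainer(model, background)

    # SHAP values for the first 5 samples; per sample they sum (approximately)
    # to f(x) - E[f(x)] over the background
    shap_values = explainer.shap_values(X[:5])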

__init__(model, data, session=None, learning_phase_flags=None)

An explainer object for a differentiable model using a given background dataset.

Note that the complexity of the method scales linearly with the number of background data samples. Passing the entire training dataset as data will give very accurate expected values, but is unreasonably expensive. The variance of the expectation estimates scales roughly as 1/sqrt(N) for N background data samples, so 100 samples will give a good estimate, and 1000 samples a very good estimate of the expected values.
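For example, a common pattern (the names below are illustrative) is to draw a random subset of the training data to use as the background:

    import numpy as np

    # Illustrative stand-in for a training set of 10,000 rows
    X_train = np.random.randn(10_000, 20)

    # Draw ~100 random rows as the background dataset; the variance of the
    # expectation estimates shrinks roughly as 1/sqrt(N) for N background samples
    background = X_train[np.random.choice(X_train.shape[0], 100, replace=False)]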

Parameters:
model : if framework == ‘tensorflow’, (input : [tf.Tensor], output : tf.Tensor)

A pair of TensorFlow tensors (or a list and a tensor) that specifies the input and output of the model to be explained. Note that SHAP values are specific to a single output value, so the output tf.Tensor should be a single dimensional output (,1).

if framework == ‘pytorch’, an nn.Module object (model), or a tuple (model, layer), where both are nn.Module objects

The model is an nn.Module object that takes as input a tensor (or list of tensors) with the shape of data and returns a single dimensional output. If a tuple is passed, the returned SHAP values will be for the input of the layer argument; layer must be a layer in the model, e.g. model.conv2 (see the PyTorch sketch after this parameter list).

data : if framework == ‘tensorflow’, [np.array] or [pandas.DataFrame]; if framework == ‘pytorch’, [torch.tensor]

The background dataset to use for integrating out features. DeepExplainer integrates over these samples. The data passed here must match the input tensors given in the first argument. Note that since these samples are integrated over for each explained sample, you should only use something like 100 or 1000 random background samples, not the whole training dataset.

session : None or tensorflow.Session

The TensorFlow session that has the model we are explaining. If None is passed, we do our best to find the right session, first looking for a Keras session and then falling back to the default TensorFlow session.

learning_phase_flags : None or list of tensors

If you have your own custom learning phase flags pass them here. When explaining a prediction we need to ensure we are not in training mode, since this changes the behavior of ops like batch norm or dropout. If None is passed then we look for tensors in the graph that look like learning phase flags (this works for Keras models). Note that we assume all the flags should have a value of False during predictions (and hence explanations).
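As a sketch of the PyTorch forms described above (the small network below is hypothetical), the explainer can be built either from the model itself or from a (model, layer) tuple, in which case the attributions are for the input of that layer:

    import torch
    import torch.nn as nn
    import shap

    # Hypothetical single-output CNN, used only for illustration
    class Net(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv1 = nn.Conv2d(1, 8, 3)
            self.conv2 = nn.Conv2d(8, 16, 3)
            self.fc = nn.Linear(16 * 24 * 24, 1)

        def forward(self, x):
            x = torch.relu(self.conv1(x))
            x = torch.relu(self.conv2(x))
            return self.fc(x.flatten(1))

    model = Net().eval()
    background = torch.randn(100, 1, 28, 28)  # ~100 background samples

    # Attributions with respect to the raw model inputs
    explainer = shap.DeepExplainer(model, background)

    # Attributions with respect to the inputs of an intermediate layer
    layer_explainer = shap.DeepExplainer((model, model.conv2), background)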

Methods

__init__(model, data[, session, ...])

An explainer object for a differentiable model using a given background dataset.

explain_row(*row_args, max_evals, ...)

Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes, main_effects).

load(in_file[, model_loader, masker_loader, ...])

Load an Explainer from the given file stream.

save(out_file[, model_saver, masker_saver])

Write the explainer to the given file stream.

shap_values(X[, ranked_outputs, ...])

Return approximate SHAP values for the model applied to the data given by X.

supports_model_with_masker(model, masker)

Determines if this explainer can handle the given model.

explain_row(*row_args, max_evals, main_effects, error_bounds, outputs, silent, **kwargs)

Explains a single row and returns the tuple (row_values, row_expected_values, row_mask_shapes, main_effects).

This is an abstract method meant to be implemented by each subclass.

Returns:
tuple

A tuple of (row_values, row_expected_values, row_mask_shapes), where row_values is an array of the attribution values for each sample, row_expected_values is an array (or single value) representing the expected value of the model for each sample (which is the same for all samples unless there are fixed inputs present, like labels when explaining the loss), and row_mask_shapes is a list of all the input shapes (since row_values is always flattened).

classmethod load(in_file, model_loader=<bound method Model.load of <class 'shap.models._model.Model'>>, masker_loader=<bound method Serializable.load of <class 'shap.maskers._masker.Masker'>>, instantiate=True)

Load an Explainer from the given file stream.

Parameters:
in_file : The file stream to load objects from.

save(out_file, model_saver='.save', masker_saver='.save')

Write the explainer to the given file stream.
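A minimal round-trip sketch, assuming an explainer constructed as in the earlier examples and a hypothetical file path:

    # Persist the explainer to a binary file stream ...
    with open("deep_explainer.bin", "wb") as f:
        explainer.save(f)

    # ... and restore it later
    with open("deep_explainer.bin", "rb") as f:
        explainer = shap.DeepExplainer.load(f)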

shap_values(X, ranked_outputs=None, output_rank_order='max', check_additivity=True)

Return approximate SHAP values for the model applied to the data given by X.

Parameters:
X : list; if framework == ‘tensorflow’, np.array or pandas.DataFrame; if framework == ‘pytorch’, torch.tensor

A tensor (or list of tensors) of samples (where X.shape[0] == # samples) on which to explain the model’s output.

ranked_outputs : None or int

If ranked_outputs is None, we explain all the outputs of a multi-output model. If ranked_outputs is a positive integer, we only explain that many of the top model outputs (where “top” is determined by output_rank_order). Note that this causes a pair of values to be returned (shap_values, indexes), where shap_values is a list of numpy arrays for each of the output ranks, and indexes is a matrix that indicates for each sample which output indexes were chosen as “top” (see the sketch after the Returns section below).

output_rank_order : “max”, “min”, or “max_abs”

How to order the model outputs when using ranked_outputs, either by maximum, minimum, or maximum absolute value.

Returns:
np.array or list

Estimated SHAP values, usually of shape (# samples x # features).

The shape of the returned array depends on the number of model outputs:

  • one input, one output: matrix of shape (#num_samples, *X.shape[1:]).

  • one input, multiple outputs: matrix of shape (#num_samples, *X.shape[1:], #num_outputs)

  • multiple inputs, one or more outputs: list of matrices, with shapes of one of the above.

If ranked_outputs is None then this list of tensors matches the number of model outputs. If ranked_outputs is a positive integer a pair is returned (shap_values, indexes), where shap_values is a list of tensors with a length of ranked_outputs, and indexes is a matrix that indicates for each sample which output indexes were chosen as “top”.

Changed in version 0.45.0: Return type for models with multiple outputs and one input changed from list to np.ndarray.
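For instance, assuming a hypothetical multi-output classifier wrapped in an explainer as in the earlier sketches, requesting only the top two outputs per sample returns a pair:

    # Explain only the 2 highest-scoring outputs for each sample
    shap_values, indexes = explainer.shap_values(
        X[:5], ranked_outputs=2, output_rank_order="max"
    )
    # shap_values: list of length 2 (one array per output rank)
    # indexes: for each sample, which output indexes were chosen as "top"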

static supports_model_with_masker(model, masker)

Determines if this explainer can handle the given model.

This is an abstract static method meant to be implemented by each subclass.