shap.maskers.Text

class shap.maskers.Text(tokenizer=None, mask_token=None, collapse_mask_token='auto', output_type='string')

Masks out tokens in a string according to the given tokenizer.

The masked variables are

output_type : “string” (default) or “token_ids”

__init__(tokenizer=None, mask_token=None, collapse_mask_token='auto', output_type='string')

Build a new Text masker given an optional passed tokenizer.

Parameters:

tokenizer : callable or None
    The tokenizer used to break apart strings during masking. The passed tokenizer must support a minimal subset of the HuggingFace Transformers PreTrainedTokenizerBase API: calling it must return a dictionary with an ‘input_ids’ entry, and it must either include an ‘offset_mapping’ entry in the same dictionary or provide a .convert_ids_to_tokens or .decode method.
mask_token : string, int, or None
    The sub-string or integer token id used to mask out portions of a string. If None, the tokenizer’s .mask_token attribute is used if it is defined; otherwise “…” is used.
collapse_mask_token : True, False, or “auto”
    If True, when several consecutive tokens are masked, a single mask token replaces the entire series of original tokens.
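
The parameters above can be illustrated without SHAP itself. The following is a minimal sketch, not SHAP's implementation: `WhitespaceTokenizer` and `mask_text` are hypothetical names introduced here, the `[MASK]` string is an arbitrary placeholder, and joining the pieces with single spaces is a simplifying assumption. The tokenizer shows the smallest return value the description asks for (a dictionary with ‘input_ids’ and ‘offset_mapping’), and `mask_text` shows how collapsing affects a run of consecutive masked tokens. Whether SHAP accepts this exact tokenizer class unchanged has not been verified here.

```python
import re


class WhitespaceTokenizer:
    """Hypothetical tokenizer: returns 'input_ids' plus 'offset_mapping'."""

    def __init__(self):
        self.vocab = {}  # token string -> integer id, grown on demand

    def __call__(self, text, return_offsets_mapping=True):
        input_ids, offsets = [], []
        for match in re.finditer(r"\S+", text):
            # Assign the next free id the first time a token is seen.
            input_ids.append(self.vocab.setdefault(match.group(), len(self.vocab)))
            # Character span (start, end) of the token in the original string.
            offsets.append((match.start(), match.end()))
        encoded = {"input_ids": input_ids}
        if return_offsets_mapping:
            encoded["offset_mapping"] = offsets
        return encoded


def mask_text(text, keep, tokenizer, mask_token="[MASK]", collapse_mask_token=True):
    """Illustrative only: replace tokens where keep[i] is False with mask_token."""
    offsets = tokenizer(text)["offset_mapping"]
    pieces = []
    previous_masked = False
    for (start, end), kept in zip(offsets, keep):
        if kept:
            pieces.append(text[start:end])
            previous_masked = False
        else:
            # When collapsing, only the first token of a masked run emits a mask.
            if not (collapse_mask_token and previous_masked):
                pieces.append(mask_token)
            previous_masked = True
    return " ".join(pieces)


tokenizer = WhitespaceTokenizer()
text = "the cat sat on the mat"
encoded = tokenizer(text)
keep = [True, False, False, True, True, False]

collapsed = mask_text(text, keep, tokenizer, collapse_mask_token=True)
expanded = mask_text(text, keep, tokenizer, collapse_mask_token=False)

print(encoded["input_ids"])  # [0, 1, 2, 3, 0, 4]
print(collapsed)             # the [MASK] on the [MASK]
print(expanded)              # the [MASK] [MASK] on the [MASK]
```

With collapsing enabled, the masked run "cat sat" becomes one mask token; with it disabled, each masked token gets its own.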

Methods

`__init__`([tokenizer, mask_token, ...])	Build a new Text masker given an optional passed tokenizer.
`clustering`(s)	Compute the clustering of tokens for the given string.
`data_transform`(s)	Called by explainers to allow us to convert data to better match masking (here this means tokenizing).
`feature_names`(s)	The names of the features for each mask position for the given input string.
`invariants`(s)	The names of the features for each mask position for the given input string.
`load`(in_file[, instantiate])	Load a Text masker from a file stream.
`mask_shapes`(s)	The shape of the masks we expect.
`save`(out_file)	Save a Text masker to a file stream.
`shape`(s)	The shape of what we return as a masker.
`token_segments`(s)	Returns the substrings associated with each token in the given string.

data_transform(s): Called by explainers to allow us to convert data to better match masking (here this means tokenizing).

feature_names(s): The names of the features for each mask position for the given input string.

invariants(s): The names of the features for each mask position for the given input string.

classmethod load(in_file, instantiate=True): Load a Text masker from a file stream.

shape(s)

The shape of what we return as a masker.

Note we only return a single sample, so there is no expectation averaging.

token_segments(s): Returns the substrings associated with each token in the given string.
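
As a rough illustration of what "substrings associated with each token" can mean, the sketch below derives one substring per token from character offsets of the kind found in an ‘offset_mapping’ entry. The function name `segments_from_offsets` is hypothetical, and extending each segment up to the start of the next token (so that whitespace between tokens is preserved) is an assumption made for this example, not a statement about how the method is implemented.

```python
def segments_from_offsets(text, offsets):
    """Return one substring per token, given (start, end) character offsets."""
    segments = []
    for i, (start, end) in enumerate(offsets):
        # Assumption: a segment runs up to the start of the next token so that
        # the characters between tokens (e.g. spaces) are not dropped.
        next_start = offsets[i + 1][0] if i + 1 < len(offsets) else end
        segments.append(text[start:max(end, next_start)])
    return segments


text = "the cat sat"
offsets = [(0, 3), (4, 7), (8, 11)]
segments = segments_from_offsets(text, offsets)

print(segments)  # ['the ', 'cat ', 'sat']
```

Under this assumption the segments concatenate back to the original string, which makes it easy to map per-token values onto the text.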