shap.maskers.Text

class shap.maskers.Text(tokenizer, mask_token='auto', collapse_mask_token=False, output_type='string')

This masks out tokens according to the given tokenizer.

The masked variables are the tokens produced by the given tokenizer. Masked inputs are returned either as strings or as token ids, controlled by output_type.

output_type : “string” (default) or “token_ids”

__init__(tokenizer, mask_token='auto', collapse_mask_token=False, output_type='string')

Initialize self. See help(type(self)) for accurate signature.
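
A minimal construction sketch: the masker wraps a tokenizer and is then paired with a model or prediction function via shap.Explainer. The tokenizer name and predict_fn below are illustrative placeholders, not part of the shap API.

>>> import shap
>>> import transformers
>>> tokenizer = transformers.AutoTokenizer.from_pretrained("distilbert-base-uncased")
>>> masker = shap.maskers.Text(tokenizer, mask_token="auto", collapse_mask_token=False)
>>> explainer = shap.Explainer(predict_fn, masker)  # predict_fn: any function mapping a list of strings to model scores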

Methods

__init__(tokenizer[, mask_token, …])

Initialize self.

clustering(s)

Compute the clustering of tokens for the given string.

data_transform(s)

Called by explainers to allow us to convert data to better match masking (here this means tokenizing).

feature_names(s)

The names of the features for each mask position for the given input string.

invariants(s)

The mask positions that are held fixed (invariant) for the given input string.

load(in_file[, instantiate])

Load a Text masker from a file stream.

mask_shapes(s)

The shape of the masks we expect.

save(out_file)

Save a Text masker to a file stream.

shape(s)

The shape of what we return as a masker.

token_segments(s)

Returns the substrings associated with each token in the given string.

tokenize(s)

Calls the underlying tokenizer on the given string.

clustering(s)

Compute the clustering of tokens for the given string.
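
A hedged sketch of inspecting this hierarchy directly, reusing the masker built above; the exact array layout is an implementation detail, but partition-based explainers consume this tree to group adjacent tokens.

>>> tree = masker.clustering("The movie was surprisingly good")
>>> tree.shape  # one merge row per internal node of the token hierarchy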

data_transform(s)

Called by explainers to allow us to convert data to better match masking (here this means tokenizing).

feature_names(s)

The names of the features for each mask position for the given input string.

invariants(s)

The mask positions that are held fixed (invariant) for the given input string.

classmethod load(in_file, instantiate=True)

Load a Text masker from a file stream.

mask_shapes(s)

The shape of the masks we expect.

save(out_file)

Save a Text masker to a file stream.
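
A hedged round-trip sketch combining save and load; it assumes both operate on binary file streams, as the descriptions above suggest, and the file name is arbitrary.

>>> with open("text_masker.bin", "wb") as out_file:
...     masker.save(out_file)
>>> with open("text_masker.bin", "rb") as in_file:
...     restored = shap.maskers.Text.load(in_file)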

shape(s)

The shape of what we return as a masker.

Note we only return a single sample, so there is no expectation averaging.

token_segments(s)

Returns the substrings associated with each token in the given string.

tokenize(s)

Calls the underlying tokenizer on the given string.
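
As a hedged sketch, the per-string helpers documented above can be probed directly on a sample sentence; the exact return structures depend on the tokenizer and the shap version.

>>> s = "The movie was surprisingly good"
>>> masker.feature_names(s)   # per-token feature names for this string
>>> masker.mask_shapes(s)     # shapes of the boolean masks an explainer will pass in
>>> masker.invariants(s)      # which positions are treated as fixed for this input
>>> masker.token_segments(s)  # substrings associated with each token
>>> masker.tokenize(s)        # raw output of the underlying tokenizer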