Using custom functions and tokenizers

This notebook demonstrates how to use the Partition explainer for a multiclass text classification scenario where we are using a custom Python function as our model.

[1]:
import datasets
import numpy as np
import pandas as pd
import scipy as sp
import torch
import transformers

import shap

# load the emotion dataset
dataset = datasets.load_dataset("emotion", split="train")
data = pd.DataFrame({"text": dataset["text"], "emotion": dataset["label"]})

Define our model

While we use the transformers package here, any Python function that takes a list of strings and returns a score for each class will work.

[2]:
# load the model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("nateraw/bert-base-uncased-emotion", use_fast=True)
model = transformers.AutoModelForSequenceClassification.from_pretrained("nateraw/bert-base-uncased-emotion").cuda()
labels = sorted(model.config.label2id, key=model.config.label2id.get)  # class names ordered by label id


# this defines an explicit python function that takes a list of strings and outputs scores for each class
def f(x):
    tv = torch.tensor([tokenizer.encode(v, padding="max_length", max_length=128, truncation=True) for v in x]).cuda()
    attention_mask = (tv != 0).type(torch.int64).cuda()
    outputs = model(tv, attention_mask=attention_mask)[0].detach().cpu().numpy()
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T  # softmax over the output classes
    val = sp.special.logit(scores)  # convert probabilities to log-odds
    return val
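
As a quick sanity check, calling f on a single string should return a 2D array of per-class log-odds scores, with one row per input and one column per label (six for this emotion model). A minimal illustrative check:

# illustrative check: one row per input string, one column per emotion label
print(labels)
print(f(["i didnt feel humiliated"]).shape)  # expected (1, 6) for this six-class model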

Create an explainer

In order to build an Explainer we need both a model and a masker (the masker specifies how to hide portions of the input). Since we are using a custom function as our model, SHAP cannot infer a masker automatically, so we need to provide one ourselves: either implicitly, by passing a transformers tokenizer, or explicitly, by building a shap.maskers.Text object.

[3]:
method = "custom tokenizer"

# build an explainer by passing a transformers tokenizer
if method == "transformers tokenizer":
    explainer = shap.Explainer(f, tokenizer, output_names=labels)

# build an explainer by explicitly creating a masker
elif method == "default masker":
    masker = shap.maskers.Text(r"\W")  # this creates a simple tokenizer that splits on non-word characters
    explainer = shap.Explainer(f, masker, output_names=labels)

# build a fully custom tokenizer
elif method == "custom tokenizer":
    import re

    def custom_tokenizer(s, return_offsets_mapping=True):
        """Custom tokenizers conform to a subset of the transformers API."""
        pos = 0
        offset_ranges = []
        input_ids = []
        for m in re.finditer(r"\W", s):
            start, end = m.span(0)
            offset_ranges.append((pos, start))
            input_ids.append(s[pos:start])
            pos = end
        if pos != len(s):
            offset_ranges.append((pos, len(s)))
            input_ids.append(s[pos:])
        out = {}
        out["input_ids"] = input_ids
        if return_offsets_mapping:
            out["offset_mapping"] = offset_ranges
        return out

    masker = shap.maskers.Text(custom_tokenizer)
    explainer = shap.Explainer(f, masker, output_names=labels)
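
Since method is set to "custom tokenizer" above, you can call custom_tokenizer directly to see the token strings and character offsets it produces; a minimal illustrative check:

# illustrative check: tokens and their character offsets in the original string
print(custom_tokenizer("i didnt feel humiliated"))
# {'input_ids': ['i', 'didnt', 'feel', 'humiliated'],
#  'offset_mapping': [(0, 1), (2, 7), (8, 12), (13, 23)]}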

Compute SHAP values

Explainers have the same call signature as the models they explain, so we just pass a list of strings whose classifications we want to explain.

[4]:
shap_values = explainer(data["text"][:3])
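
The call returns a shap.Explanation object that stores, for each example, the tokens, their attributions for every output class, and the per-class base values. A minimal sketch of inspecting it (attribute names as assumed from the shap API):

# the Explanation object holds tokens, per-class attributions, and base values
print(shap_values.output_names)     # ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
print(shap_values[0].data)          # tokens of the first example
print(shap_values[0].values.shape)  # (# tokens, # classes) for that example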

Visualize the impact on all the output classes

In the plots below, hovering over an output class shows the explanation for that class. Clicking an output class name keeps that class as the focus of the explanation visualization until you click another class.

The base value is what the model outputs when the entire input text is masked, while \(f_{\text{output class}}(\text{inputs})\) is the output of the model for the full original input. The SHAP values explain additively how unmasking each word moves the model output from the base value (where the entire input is masked) to the final prediction value.
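
Because the attributions are additive, the per-class base value plus the sum of that class's token attributions should reproduce the model output for the full input; a minimal sketch of this check (up to numerical error):

# additivity check: base value + sum of token attributions ~= f(full input)
pred = f([data["text"][0]])[0]                                         # per-class logits for example 0
recon = shap_values.base_values[0] + shap_values.values[0].sum(axis=0)
print(np.abs(pred - recon).max())                                      # should be close to 0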

[5]:
shap.plots.text(shap_values)


[Interactive text plots. Each example lists the output classes (sadness, joy, love, anger, fear, surprise) across the top; hovering or clicking a class name switches the explanation to that class. For the first example, the sadness output rises from a base value of -1.84 to 5.63, driven mainly by "humiliated" (+6.54) and "feel" (+1.22). For the second example, the sadness output rises from -1.84 to 5.35, driven mainly by "hopeless" (+8.14) and "feeling" (+1.90), while "hopeful" (-1.98) pushes it down.]