Emotion classification multiclass example

This notebook demonstrates how to use the Partition explainer for a multiclass text classification scenario. Once the SHAP values are computed for a set of sentences, we visualize the feature attributions toward the individual classes. The text classification model we use is BERT fine-tuned on an emotion dataset to classify a sentence among six classes: joy, sadness, anger, fear, love and surprise.

[1]:
import datasets
import pandas as pd
import transformers

import shap

# load the emotion dataset
dataset = datasets.load_dataset("emotion", split="train")
data = pd.DataFrame({"text": dataset["text"], "emotion": dataset["label"]})
Using custom data configuration default
Reusing dataset emotion (/home/slundberg/.cache/huggingface/datasets/emotion/default/0.0.0/aa34462255cd487d04be8387a2d572588f6ceee23f784f37365aa714afeb8fe6)

Build a transformers pipeline

Note that we have set return_all_scores=True for the pipeline so that we can observe the model’s behavior for all classes, not just the top output (a quick check of this output format is sketched after the cell below).

[2]:
# load the model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(
    "nateraw/bert-base-uncased-emotion", use_fast=True
)
model = transformers.AutoModelForSequenceClassification.from_pretrained(
    "nateraw/bert-base-uncased-emotion"
).cuda()

# build a pipeline object to do predictions
pred = transformers.pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,
    return_all_scores=True,
)
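Before building an explainer it can help to confirm the output format the explainer will see. This is a sketch, not a cell from the original notebook, and assumes the model and dataset loaded above:

# a sketch: with return_all_scores=True the pipeline returns, for each input
# sentence, a list of {"label", "score"} dicts covering all six emotion classes
print(pred([data["text"][0]]))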

Create an explainer for the pipeline

A transformers pipeline object can be passed directly to shap.Explainer, which will then wrap the pipeline model as a shap.models.TransformersPipeline model and the pipeline tokenizer as a shap.maskers.Text masker.

[3]:
explainer = shap.Explainer(pred)
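For reference, the wrapping described above can also be written out explicitly. This is a sketch of a roughly equivalent construction, not what the notebook originally ran:

# roughly equivalent explicit construction (sketch): wrap the pipeline as a
# SHAP model and build a Text masker from the same tokenizer
wrapped_model = shap.models.TransformersPipeline(pred)
masker = shap.maskers.Text(tokenizer)
explainer = shap.Explainer(wrapped_model, masker)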

Compute SHAP values

Explainers have the same call signature as the models they explain, so we simply pass a list of strings whose classifications we want to explain.

[4]:
shap_values = explainer(data["text"][:3])
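The result is a multi-output Explanation object. A quick way to see how it is organized (a sketch, assuming the usual Explanation attributes output_names and data):

# the Explanation is indexed as [sentence, token, output class]
print(shap_values.output_names)  # the six emotion labels
print(shap_values[0].data)  # the tokens of the first sentence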

Visualize the impact on all the output classes

In the plots below, hovering your mouse over an output class shows the explanation for that class. Clicking an output class name keeps that class as the focus of the explanation visualization until you click another class.

The base value is what the model outputs when the entire input text is masked, while \(f_{\text{output class}}(\text{inputs})\) is the output of the model for the full original input. The SHAP values explain, in an additive way, how unmasking each word moves the model output from the base value (where the entire input is masked) to the final prediction value.
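These quantities are also stored directly on the Explanation object, so they can be inspected without the plot (a sketch, assuming the usual base_values attribute):

# base_values holds, for each sentence, the model output for every class when
# the whole input is masked; the per-token contributions are in shap_values.values
print(shap_values.base_values)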

[5]:
shap.plots.text(shap_values)


[Interactive text plots for the three sentences, with one tab per output class (sadness, joy, love, anger, fear, surprise), showing the base value, \(f_{\text{sadness}}(\text{inputs})\) for the selected class, and each word’s contribution. “humiliated” and “hopeless” dominate the sadness attributions for the first two sentences, while the third sentence receives a very low sadness score.]

Visualize the impact on a single class

Since Explanation objects are sliceable, we can slice out a single output class and visualize the model output toward that class alone (a sketch of slicing out a single sentence follows the plots below).

[11]:
shap.plots.text(shap_values[:, :, "anger"])


[Text plots of per-token SHAP values toward the “anger” class for the three sentences. “greedy”, “wrong” and “grabbing” drive a high anger output for the third sentence, while the first two sentences receive very low anger scores.]
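The same slicing also works per sentence. A sketch of pulling out the raw “anger” attributions for just the first sentence, assuming the shap_values computed above and the usual values and data attributes:

# slice one sentence and one class to get a one-dimensional explanation
anger_first = shap_values[0, :, "anger"]
print(anger_first.values)  # per-token attributions toward "anger"
print(anger_first.data)  # the corresponding tokens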

Plotting the top words impacting a specific class

In addition to slicing, Explanation objects also support a set of reducing methods. Here we use .mean(0) to take the average impact of all words toward the “joy” class. Note that we are only averaging over three examples here; to get a more representative summary you would want to use a larger portion of the dataset. (A variant that averages absolute impacts is sketched after these plots.)

[12]:
shap.plots.bar(shap_values[:, :, "joy"].mean(0))
[Bar plot of the mean SHAP value per word toward the “joy” class.]
[13]:
# we can sort the bar chart in descending order
shap.plots.bar(shap_values[:, :, "joy"].mean(0), order=shap.Explanation.argsort)
[The same bar plot sorted in descending order.]
[14]:
# ...or ascending order
shap.plots.bar(shap_values[:, :, "joy"].mean(0), order=shap.Explanation.argsort.flip)
[The same bar plot sorted in ascending order.]
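Because .mean(0) averages signed impacts, positive and negative contributions toward “joy” can cancel out. A sketch of ranking words by the magnitude of their impact instead, assuming the usual .abs property on Explanation objects:

# take absolute values before averaging so opposing contributions do not cancel
shap.plots.bar(shap_values[:, :, "joy"].abs.mean(0))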

Explain the log odds instead of the probabilities

In the examples above we explained the direct output of the pipeline object, which is a set of class probabilities. Sometimes it makes more sense to work in a log-odds space, where it is natural to add and subtract effects (addition and subtraction correspond to adding or removing bits of evidence). To work with logits we can use the rescale_to_logits parameter of the shap.models.TransformersPipeline object:
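For intuition, assuming rescale_to_logits applies the standard log-odds transform \(\log(p / (1 - p))\) to each class probability (an assumption used here only for illustration), a probability of 0.5 maps to 0 and a probability of 0.9 maps to roughly 2.2:

import numpy as np

# the standard log-odds (logit) transform of a probability
p = 0.9
print(np.log(p / (1 - p)))  # ~2.197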

[15]:
logit_explainer = shap.Explainer(
    shap.models.TransformersPipeline(pred, rescale_to_logits=True)
)

logit_shap_values = logit_explainer(data["text"][:3])
shap.plots.text(logit_shap_values)


[Interactive text plots of per-token SHAP values in log-odds units for the three sentences, again with one tab per output class. The attributions for “humiliated” and “hopeless” are now much larger in magnitude, and words like “hopeful” and “greedy” contribute negative log-odds toward sadness.]

Have an idea for more helpful examples? Pull requests that add to this documentation notebook are encouraged!