Text to Multiclass Explanation: Emotion Classification Example

This notebook demonstrates how to use the partition explainer in a multiclass scenario with text data and how to visualize feature attributions towards individual classes. To compute SHAP values in the multiclass setting, the partition explainer is run over the text data, and the attribution of a feature towards a given class is computed from its marginal contribution to the difference between the one-vs-all logit for that class and its base value.

Below we walk through an example explaining a text classification model (https://huggingface.co/nateraw/bert-base-uncased-emotion), in this case BERT fine-tuned on the emotion dataset (https://huggingface.co/datasets/emotion) provided by Hugging Face, which classifies a sentence among six emotion classes: joy, sadness, anger, fear, love, and surprise.

[1]:
import copy
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import shap
import scipy as sp
from datasets import load_dataset
import torch

Load data

[2]:
dataset = load_dataset("emotion", split="train")
data = pd.DataFrame({"text": dataset["text"], "emotion": dataset["label"]})
Using custom data configuration default
Reusing dataset emotion (/home/slundberg/.cache/huggingface/datasets/emotion/default/0.0.0/aa34462255cd487d04be8387a2d572588f6ceee23f784f37365aa714afeb8fe6)
[3]:
data.head()
[3]:
text emotion
0 i didnt feel humiliated 0
1 i can go from feeling so hopeless to so damned... 0
2 im grabbing a minute to post i feel greedy wrong 3
3 i am ever feeling nostalgic about the fireplac... 2
4 i am feeling grouchy 3

Load model and tokenizer

[5]:
tokenizer = AutoTokenizer.from_pretrained("nateraw/bert-base-uncased-emotion",use_fast=True)
model = AutoModelForSequenceClassification.from_pretrained("nateraw/bert-base-uncased-emotion").cuda()
[6]:
# set mapping between label and id
id2label = model.config.id2label
label2id = model.config.label2id
labels = sorted(label2id, key=label2id.get)
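The sorting trick above can be illustrated with a hand-written mapping (the dictionary below is a hypothetical stand-in for `model.config.label2id`):

```python
# Hypothetical label2id mapping, mirroring the structure of the model config
label2id = {"sadness": 0, "joy": 1, "love": 2, "anger": 3, "fear": 4, "surprise": 5}

# Sorting the label names by their id recovers the order of the model's output columns
labels = sorted(label2id, key=label2id.get)
print(labels)  # ['sadness', 'joy', 'love', 'anger', 'fear', 'surprise']
```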

Distribution of emotion labels

[7]:
ax = data.emotion.map(id2label).value_counts().plot.bar()
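The counting step can be sketched with toy data (the Series and mapping below are hypothetical stand-ins for `data.emotion` and the model's `id2label`):

```python
import pandas as pd

# Toy integer labels and a small id -> name mapping (hypothetical)
id2label = {0: "sadness", 1: "joy", 3: "anger"}
emotion = pd.Series([0, 0, 1, 3, 0])

# Map ids to names, then count occurrences per class (sorted descending by default)
counts = emotion.map(id2label).value_counts()
print(counts)
```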

Define function

[8]:
def f(x):
    # tokenize each sentence to a fixed-length tensor of token ids
    tv = torch.tensor([tokenizer.encode(v, padding="max_length", max_length=128, truncation=True) for v in x]).cuda()
    # attend only to non-padding tokens (padding id is 0)
    attention_mask = (tv != 0).type(torch.int64).cuda()
    outputs = model(tv, attention_mask=attention_mask)[0].detach().cpu().numpy()
    # softmax over the six classes, then logit to get the one-vs-all log-odds
    scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T
    val = sp.special.logit(scores)
    return val
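The softmax-then-logit transform inside `f` can be checked in isolation with plain NumPy/SciPy: applying `logit` to a softmax probability yields the one-vs-all log-odds for that class. A minimal sketch with made-up model outputs:

```python
import numpy as np
from scipy.special import logit

# One row of hypothetical raw model outputs for six classes
outputs = np.array([[2.0, 0.5, -1.0, 0.1, -0.3, -2.0]])

# Softmax over the last axis, written exactly as in f above
scores = (np.exp(outputs).T / np.exp(outputs).sum(-1)).T

# logit(p) = log(p / (1 - p)): the one-vs-all log-odds per class
val = logit(scores)

# Sanity checks: probabilities sum to 1, and logit matches the direct log-odds
assert np.allclose(scores.sum(-1), 1.0)
assert np.allclose(val, np.log(scores / (1 - scores)))
```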

Create an explainer object

[9]:
explainer = shap.Explainer(f, tokenizer, output_names=labels)

Compute SHAP values

[10]:
shap_values = explainer(data['text'][0:50])

Top words contributing to emotion class: joy

[11]:
shap.plots.bar(shap_values[:,:,"joy"].mean(0))

Top words positively contributing to emotion class: joy

[12]:
shap.plots.bar(shap_values[:,:,"joy"].mean(0), order=shap.Explanation.argsort.flip)

Top words negatively contributing to emotion class: joy

[13]:
shap.plots.bar(shap_values[:,:,"joy"].mean(0), order=shap.Explanation.argsort)
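The two orderings used above can be sketched with plain NumPy on hypothetical mean SHAP values: `shap.Explanation.argsort` orders features ascending (most negative first), and `.flip` reverses that to most positive first:

```python
import numpy as np

# Hypothetical mean SHAP values for four tokens
mean_shap = np.array([0.4, -0.2, 0.1, -0.5])
tokens = np.array(["happy", "not", "feel", "sad"])

ascending = np.argsort(mean_shap)   # most negative contribution first
descending = ascending[::-1]        # most positive contribution first

print(tokens[ascending])   # ['sad' 'not' 'feel' 'happy']
print(tokens[descending])  # ['happy' 'feel' 'not' 'sad']
```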

Top words positively contributing to emotion class: surprise

[14]:
shap.plots.bar(shap_values[:,:,"surprise"].mean(0), order=shap.Explanation.argsort.flip)

Top words negatively contributing to emotion class: surprise

[15]:
shap.plots.bar(shap_values[:,:,"surprise"].mean(0), order=shap.Explanation.argsort)

Top words positively contributing to emotion class: anger

[16]:
shap.plots.bar(shap_values[:,:,"anger"].mean(0), order=shap.Explanation.argsort.flip)

Top words negatively contributing to emotion class: anger

[17]:
shap.plots.bar(shap_values[:,:,"anger"].mean(0), order=shap.Explanation.argsort)

Visualizing feature attributions towards a given class with text plots

[22]:
shap.plots.text(shap_values[:5])

[Interactive text plots for the first five instances. Each plot shows an input/output heatmap highlighting the input tokens by their SHAP attributions towards each output class: sadness, joy, love, anger, fear, surprise.]
[ ]: