Positive vs. Negative Sentiment Classification

Here we demonstrate how to explain a sentiment classification model for movie reviews. positive vs. negative sentim

[47]:
import transformers
import datasets
import shap
import numpy as np

Load the IMDB movie review dataset

[2]:
dataset = datasets.load_dataset("imdb", split="test")

# shorten the strings to fit into the pipeline model
short_data = [v[:500] for v in dataset["text"][:20]]
Reusing dataset imdb (/home/slundberg/.cache/huggingface/datasets/imdb/plain_text/1.0.0/90099cb476936b753383ba2ae6ab2eae419b2e87f71cd5189cb9c8e5814d12a3)

Load and run a sentiment analysis pipeline

[3]:
classifier = transformers.pipeline('sentiment-analysis', return_all_scores=True)
classifier(short_data[:2])
[3]:
[[{'label': 'NEGATIVE', 'score': 0.0012035118415951729},
  {'label': 'POSITIVE', 'score': 0.9987965226173401}],
 [{'label': 'NEGATIVE', 'score': 0.002218781039118767},
  {'label': 'POSITIVE', 'score': 0.9977812170982361}]]

Explain the sentiment analysis pipeline

[4]:
# define the explainer
explainer = shap.Explainer(classifier)
[5]:
# explain the predictions of the pipeline on the first two samples
shap_values = explainer(short_data[:2])
[6]:
shap.plots.text(shap_values[:,:,"POSITIVE"])

0th instance:
-0.187151base value-2.0362201.6619183.5109875.3600567.2091256.7213366.721336f(x)4.179 the sign of a good movie is that it can toy with our emotions . this one did exactly that . 3.398 the entire theater ( which was sold out ) was overcome by laughter during the 1.27 i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism . 0.0 -1.939 i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy . -0.0
0.0
-1.939 / 49
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
1.27 / 28
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
4.179 / 21
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
3.398 / 15
the entire theater ( which was sold out ) was overcome by laughter during the
-0.0

1st instance:
-0.204055base value-2.0531241.6450143.4940835.3431527.1922226.1085756.108575f(x)1.885 actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s . 1.733 but some how this film was enthralli 1.612 idol in what was dubbed as " the greatest game ever played . 1.359 open where a young american caddy rises from his humble background to play against his bristish 0.0 -0.275 " i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) , -0.0
-0.0
1.885 / 33
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
1.359 / 19
open where a young american caddy rises from his humble background to play against his bristish
1.612 / 13
idol in what was dubbed as " the greatest game ever played .
-0.275 / 41
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
1.733 / 10
but some how this film was enthralli
0.0

Wrap the pipeline manually

SHAP requires tensor outputs from the classifier, and explanations works best in additive spaces so we transform the probabilities into logit values (information values instead of probabilites).

Create a TransformersPipeline wrapper

[7]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=False)
[8]:
pmodel(short_data[:2])
[8]:
array([[0.00120351, 0.99879652],
       [0.00221878, 0.99778122]])
[9]:
pmodel = shap.models.TransformersPipeline(classifier, rescale_to_logits=True)
pmodel(short_data[:2])
[9]:
array([[-6.72130722,  6.72133589],
       [-6.10857607,  6.10857523]])
[13]:
explainer2 = shap.Explainer(pmodel)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:,:,1])

0th instance:
-0.187151base value-2.0362201.6619183.5109875.3600567.2091256.7213366.721336f(x)4.179 the sign of a good movie is that it can toy with our emotions . this one did exactly that . 3.398 the entire theater ( which was sold out ) was overcome by laughter during the 1.27 i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism . 0.0 -1.939 i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy . -0.0
0.0
-1.939 / 49
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
1.27 / 28
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
4.179 / 21
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
3.398 / 15
the entire theater ( which was sold out ) was overcome by laughter during the
-0.0

1st instance:
-0.204055base value-2.0531241.6450143.4940835.3431527.1922226.1085756.108575f(x)1.885 actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s . 1.733 but some how this film was enthralli 1.612 idol in what was dubbed as " the greatest game ever played . 1.359 open where a young american caddy rises from his humble background to play against his bristish 0.0 -0.275 " i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) , -0.0
-0.0
1.885 / 33
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
1.359 / 19
open where a young american caddy rises from his humble background to play against his bristish
1.612 / 13
idol in what was dubbed as " the greatest game ever played .
-0.275 / 41
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
1.733 / 10
but some how this film was enthralli
0.0

Pass a tokenizer as the masker object

[15]:
explainer2 = shap.Explainer(pmodel, classifier.tokenizer)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:,:,1])

0th instance:
-0.187151base value-2.0362201.6619183.5109875.3600567.2091256.7213366.721336f(x)4.179 the sign of a good movie is that it can toy with our emotions . this one did exactly that . 3.398 the entire theater ( which was sold out ) was overcome by laughter during the 1.27 i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism . 0.0 -1.939 i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy . -0.0
0.0
-1.939 / 49
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
1.27 / 28
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
4.179 / 21
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
3.398 / 15
the entire theater ( which was sold out ) was overcome by laughter during the
-0.0

1st instance:
-0.204055base value-2.0531241.6450143.4940835.3431527.1922226.1085756.108575f(x)1.885 actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s . 1.733 but some how this film was enthralli 1.612 idol in what was dubbed as " the greatest game ever played . 1.359 open where a young american caddy rises from his humble background to play against his bristish 0.0 -0.275 " i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) , -0.0
-0.0
1.885 / 33
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
1.359 / 19
open where a young american caddy rises from his humble background to play against his bristish
1.612 / 13
idol in what was dubbed as " the greatest game ever played .
-0.275 / 41
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
1.733 / 10
but some how this film was enthralli
0.0

Build a Text masker explicitly

[35]:
masker = shap.maskers.Text(classifier.tokenizer)
explainer2 = shap.Explainer(pmodel, masker)
shap_values2 = explainer2(short_data[:2])
shap.plots.text(shap_values2[:,:,1])

0th instance:
-0.187151base value-2.0362201.6619183.5109875.3600567.2091256.7213366.721336f(x)4.179 the sign of a good movie is that it can toy with our emotions . this one did exactly that . 3.398 the entire theater ( which was sold out ) was overcome by laughter during the 1.27 i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism . 0.0 -1.939 i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy . -0.0
0.0
-1.939 / 49
i went and saw this movie last night after being coaxed to by a few friends of mine . i ' ll admit that i was reluctant to see it because from what i knew of ashton kutcher he was only able to do comedy .
1.27 / 28
i was wrong . kutcher played the character of jake fischer very well , and kevin costner played ben randall with such professionalism .
4.179 / 21
the sign of a good movie is that it can toy with our emotions . this one did exactly that .
3.398 / 15
the entire theater ( which was sold out ) was overcome by laughter during the
-0.0

1st instance:
-0.204055base value-2.0531241.6450143.4940835.3431527.1922226.1085756.108575f(x)1.885 actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s . 1.733 but some how this film was enthralli 1.612 idol in what was dubbed as " the greatest game ever played . 1.359 open where a young american caddy rises from his humble background to play against his bristish 0.0 -0.275 " i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) , -0.0
-0.0
1.885 / 33
actor turned director bill paxton follows up his promising debut , the gothic - horror " frailty " , with this family friendly sports drama about the 1913 u . s .
1.359 / 19
open where a young american caddy rises from his humble background to play against his bristish
1.612 / 13
idol in what was dubbed as " the greatest game ever played .
-0.275 / 41
" i ' m no fan of golf , and these scrappy underdog sports flicks are a dime a dozen ( most recently done to grand effect with " miracle " and " cinderella man " ) ,
1.733 / 10
but some how this film was enthralli
0.0

Explore how the Text masker works

[42]:
masker.shape("I like this movie.")
[42]:
(1, 7)
[48]:
model_args = masker(np.array([True, True, True, True, True, True, True]), "I like this movie.")
model_args
[48]:
(array(['i like this movie .'], dtype='<U19'),)
[49]:
pmodel(*model_args)
[49]:
array([[-8.90780458,  8.90742142]])
[50]:
model_args = masker(np.array([True, True, False, False, True, True, True]), "I like this movie.")
model_args
[50]:
(array(['i [MASK] [MASK] movie .'], dtype='<U23'),)
[51]:
pmodel(*model_args)
[51]:
array([[-3.72092204,  3.72092316]])
[52]:
masker2 = shap.maskers.Text(classifier.tokenizer, mask_token="...", collapse_mask_token=True)
[53]:
model_args2 = masker2(np.array([True, True, False, False, True, True, True]), "I like this movie.")
model_args2
[53]:
(array(['i . . . movie .'], dtype='<U15'),)
[54]:
pmodel(*model_args2)
[54]:
array([[-3.20818664,  3.20818753]])

Plot summary statistics and bar charts

[55]:
# explain the predictions of the pipeline on the first two samples
shap_values = explainer(short_data[:20])
Partition explainer: 21it [00:11,  1.76it/s]
[56]:
shap.plots.bar(shap_values[0,:,"POSITIVE"])
../../../_images/example_notebooks_text_examples_sentiment_analysis_Positive_vs._Negative_Sentiment_Classification_31_0.png
[57]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0))
../../../_images/example_notebooks_text_examples_sentiment_analysis_Positive_vs._Negative_Sentiment_Classification_32_0.png
[59]:
shap.plots.bar(shap_values[:, :, "POSITIVE"].mean(0), order=shap.Explanation.argsort)
../../../_images/example_notebooks_text_examples_sentiment_analysis_Positive_vs._Negative_Sentiment_Classification_33_0.png
[ ]: