This notebook demonstrates how to use the Partition explainer for a multiclass text classification scenario. Once the SHAP values are computed for a set of sentences, we visualize the feature attributions toward the individual classes. The text classification model we use is BERT fine-tuned on an emotion dataset to classify a sentence among six classes: joy, sadness, anger, fear, love and surprise.
[1]:
import pandas as pd
import numpy as np
import scipy as sp
import matplotlib.pyplot as plt
import torch
import transformers
import datasets

import shap

# load the emotion dataset
dataset = datasets.load_dataset("emotion", split="train")
data = pd.DataFrame({'text': dataset['text'], 'emotion': dataset['label']})
Using custom data configuration default
Reusing dataset emotion (/home/slundberg/.cache/huggingface/datasets/emotion/default/0.0.0/aa34462255cd487d04be8387a2d572588f6ceee23f784f37365aa714afeb8fe6)
Note that we have set return_all_scores=True for the pipeline so we can observe the model’s behavior for all classes, not just the top output.
[2]:
# load the model and tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("nateraw/bert-base-uncased-emotion", use_fast=True)
model = transformers.AutoModelForSequenceClassification.from_pretrained("nateraw/bert-base-uncased-emotion").cuda()

# build a pipeline object to do predictions
pred = transformers.pipeline(
    "text-classification",
    model=model,
    tokenizer=tokenizer,
    device=0,
    return_all_scores=True,
)
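To confirm that the pipeline really does return a score for every emotion class, you can call it on an arbitrary sentence; a minimal check might look like the following (the example sentence is made up, and the exact scores will depend on the model):

[ ]:
# with return_all_scores=True the pipeline returns one score per class for each input
pred("i didn't feel humiliated")
# -> [[{'label': 'sadness', 'score': ...}, {'label': 'joy', 'score': ...}, ...]]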
A transformers pipeline object can be passed directly to shap.Explainer, which will then wrap the pipeline model as a shap.models.TransformersPipeline model and the pipeline tokenizer as a shap.maskers.Text masker.
Explainers have the same method signature as the models they are explaining, so we just pass a list of strings for which to explain the classifications.
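A sketch of how this can look, assuming we explain the first few sentences of the training data (the slice size is arbitrary):

[ ]:
# wrap the pipeline in an explainer; SHAP picks a Text masker from the pipeline tokenizer
explainer = shap.Explainer(pred)

# compute SHAP values for a small batch of sentences
shap_values = explainer(data['text'][:3].tolist())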
In the plots below, when you hover your mouse over an output class you get the explanation for that output class. When you click an output class name then that class remains the focus of the explanation visualization until you click another class.
The base value is what the model outputs when the entire input text is masked, while \(f_{\text{output class}}(\text{inputs})\) is the output of the model for the full original input. The SHAP values explain, in an additive way, how unmasking each word moves the model output from the base value (where the entire input is masked) to the final prediction value.
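The interactive plots are produced with shap.plots.text. As a minimal sketch, the snippet below renders the plot and then checks the additivity property for one sentence and one class; it assumes "joy" appears in shap_values.output_names, as it does for this emotion model:

[ ]:
# render the interactive text plot (hover or click a class name to focus its explanation)
shap.plots.text(shap_values)

# additivity check: base value + sum of SHAP values should equal f_joy(inputs)
i = 0
j = shap_values.output_names.index("joy")
print(shap_values.base_values[i, j] + shap_values.values[i][:, j].sum())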