SHAP Values for Multi-Output Regression Models

Author: coryroyce

Date updated: 3/4/2021

Create Multi-Output Regression Model

Create Data

Import required packages

import pandas as pd
from sklearn.datasets import make_regression
from keras.models import Sequential
from keras.layers import Dense

Create a custom function that generates the multi-output regression data. Note: This example creates 5 outputs/targets/labels, but the method extends easily to any number of outputs.

def get_dataset():
  # Create sample data with sklearn make_regression function
  X, y = make_regression(n_samples=1000, n_features=10, n_informative=7, n_targets=5, random_state=0)

  # Convert the data into Pandas DataFrames for easier manipulation and to keep the column names
  # Create feature column names
  feature_cols = ['feature_01', 'feature_02', 'feature_03', 'feature_04',
                  'feature_05', 'feature_06', 'feature_07', 'feature_08',
                  'feature_09', 'feature_10']

  df_features = pd.DataFrame(data = X, columns = feature_cols)

  # Create label column names and dataframe
  label_cols = ['labels_01', 'labels_02', 'labels_03', 'labels_04', 'labels_05']

  df_labels = pd.DataFrame(data = y, columns = label_cols)

  return df_features, df_labels
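
Since the method extends to any number of outputs, the hard-coded column lists could also be generated. The following is a minimal sketch, assuming the same zero-padded naming pattern; make_column_names is a hypothetical helper and not part of the original notebook.

```python
# Hypothetical helper (not in the original notebook): build zero-padded
# column names such as 'feature_01' ... 'feature_10'.
def make_column_names(prefix, count):
    return [f"{prefix}_{i:02d}" for i in range(1, count + 1)]

feature_cols = make_column_names("feature", 10)
label_cols = make_column_names("labels", 5)

print(feature_cols[0], feature_cols[-1])
print(label_cols)
```

Changing the counts here, together with n_features and n_targets in make_regression, would be enough to resize the example.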

Create Model

Create a TensorFlow/Keras Sequential model.

def get_model(n_inputs, n_outputs):
    model = Sequential()
    model.add(Dense(32, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
    model.add(Dense(n_outputs, kernel_initializer='he_uniform'))
    model.compile(loss='mae', optimizer='adam')
    return model

Train Model

Create the dataset with the function defined above.

# Create the datasets
X, y = get_dataset()

# Get the number of inputs and outputs from the dataset
n_inputs, n_outputs = X.shape[1], y.shape[1]

Create the model with the function defined above.

model = get_model(n_inputs, n_outputs)

Train the model

model.fit(X, y, verbose=0, epochs=100)
<tensorflow.python.keras.callbacks.History at 0x7f08e9a7e490>

Get model evaluation metrics to confirm training went well.

model.evaluate(x = X, y = y)
32/32 [==============================] - 0s 826us/step - loss: 15.8952
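
With the 'mae' loss, the single number reported above is an average over both samples and outputs, so it can hide one poorly fitted target. The sketch below uses small made-up arrays (not the notebook's data) to show how a per-output breakdown could be computed with NumPy.

```python
import numpy as np

# Made-up targets and predictions for illustration: 2 samples, 2 outputs.
y_true = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
y_pred = np.array([[1.5, 2.0],
                   [2.0, 6.0]])

abs_err = np.abs(y_true - y_pred)

# One mean absolute error per output column.
mae_per_output = abs_err.mean(axis=0)

# Average over all samples and outputs, comparable to the single 'mae' loss.
mae_overall = abs_err.mean()

print(mae_per_output, mae_overall)
```

In the notebook itself, the same idea would presumably be applied to y.values and model.predict(X).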

Model Prediction

Manual data can be entered, but in this case just use an arbitrary index from the feature/X data.

array([[ -15.026388,  -64.4412  ,  -75.39472 ,  -70.4628  , -126.55638 ]],

Get SHAP Values and Plots

Apply Shapley values to the model.

!pip install shap
import shap

# print the JS visualization code to the notebook
shap.initjs()
Successfully installed shap-0.39.0 slicer-0.0.7

Here we take the Keras model trained above and explain why it makes different predictions on individual samples.

Set the explainer using the Kernel Explainer (the model-agnostic explainer method from SHAP).

explainer = shap.KernelExplainer(model = model.predict, data = X.head(50), link = "identity")
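
The background data passed to the explainer determines the base values: with the identity link, explainer.expected_value should hold one entry per output, equal to the average model prediction over the background rows. The sketch below illustrates that idea with a hypothetical two-output toy_model standing in for model.predict; it does not call shap.

```python
import numpy as np

def toy_model(X):
    # Hypothetical stand-in for model.predict with two outputs.
    X = np.asarray(X, dtype=float)
    return np.column_stack([X[:, 0] + 2.0 * X[:, 1],
                            X[:, 0] * X[:, 1]])

# Made-up background rows, playing the role of X.head(50).
background = np.array([[0.0, 1.0],
                       [2.0, 3.0],
                       [4.0, 5.0]])

# One base value per output: the mean prediction over the background data.
base_values = toy_model(background).mean(axis=0)
print(base_values)
```

A larger background sample gives a more representative base value, but makes each explanation slower to compute.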

Get the Shapley value for a single example.

# Set the index of the specific example to explain
X_idx = 0

shap_value_single = explainer.shap_values(X = X.iloc[X_idx:X_idx+1,:], nsamples = 100)
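
To make the returned numbers easier to interpret, the sketch below computes exact Shapley values by brute force for a tiny, hypothetical two-output function. It is not how KernelExplainer works internally (that method samples coalitions and averages over the background data); it assumes a single baseline row and only illustrates the quantity being approximated, with one set of attributions per output.

```python
from itertools import combinations
from math import factorial

def toy_model(x):
    # Hypothetical two-output model used only for illustration.
    return [2.0 * x[0] + x[1], x[0] * x[2]]

def exact_shapley(model, x, baseline):
    """Return phi[k][i]: contribution of feature i to output k."""
    n = len(x)
    n_out = len(model(x))
    phi = [[0.0] * n for _ in range(n_out)]

    def value(subset):
        # Features in the subset take their real value, the rest the baseline.
        mixed = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(mixed)

    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                without_i = value(set(subset))
                with_i = value(set(subset) | {i})
                for k in range(n_out):
                    phi[k][i] += weight * (with_i[k] - without_i[k])
    return phi

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, x, baseline)

# Additivity: for each output, the contributions sum to f(x) - f(baseline).
for k, contributions in enumerate(phi):
    print(k, contributions, sum(contributions))
```

The same additivity check, base value plus the sum of the SHAP values compared against the prediction, is a useful sanity test on the output of explainer.shap_values.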

Display the details of the single example

   feature_01  feature_02  feature_03  feature_04  feature_05  feature_06  feature_07  feature_08  feature_09  feature_10
0   -0.093555    0.417854   -1.655827   -2.048833   -0.258209   -0.989744   -0.154596   -0.338294    1.503827   -0.514878

Choose the label/output/target to run individual explanations on:

Note: The dropdown menu can easily be replaced by manually setting the index on the label to explain.
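
One way to set the label manually, sketched below, is to use a small stand-in object that exposes the same .value attribute as the dropdown, so any code that reads current_label.value keeps working unchanged. The name label_to_explain and the use of SimpleNamespace are assumptions made for this illustration.

```python
from types import SimpleNamespace

# Same label names as the y DataFrame columns in this example.
list_of_labels = ['labels_01', 'labels_02', 'labels_03', 'labels_04', 'labels_05']

# Hypothetical manual choice replacing the dropdown selection.
label_to_explain = 'labels_03'

# Stand-in exposing '.value' (the label index), like the widget does.
current_label = SimpleNamespace(value=list_of_labels.index(label_to_explain))

print(current_label.value, list_of_labels[current_label.value])
```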

import ipywidgets as widgets
# Create the list of all labels for the drop down list
list_of_labels = y.columns.to_list()

# Create a list of tuples so that the index of the label is what is returned
tuple_of_labels = list(zip(list_of_labels, range(len(list_of_labels))))

# Create a widget for the labels and then display the widget
current_label = widgets.Dropdown(options=tuple_of_labels,
                              description='Select Label:')

# Display the dropdown list (Note: access index value with 'current_label.value')
current_label

Plot the force plot for a single example and a single label/output/target

# print the JS visualization code to the notebook
shap.initjs()

print(f'Current label Shown: {list_of_labels[current_label.value]}')

shap.force_plot(base_value = explainer.expected_value[current_label.value],
                shap_values = shap_value_single[current_label.value],
                features = X.iloc[X_idx:X_idx+1,:])
Current label Shown: labels_01

Create the summary plot for a specific output/label/target.

# Note: We limit this to the first 50 training examples, since computing SHAP values for the full set of samples takes time
shap_values = explainer.shap_values(X = X.iloc[0:50,:], nsamples = 100)

# print the JS visualization code to the notebook
shap.initjs()

print(f'Current Label Shown: {list_of_labels[current_label.value]}\n')

shap.summary_plot(shap_values = shap_values[current_label.value],
                  features = X.iloc[0:50,:])
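
The indexing above relies on shap_values being a list with one (samples x features) array per output, which matches the shap 0.39.0 release installed here. Some newer shap releases are reported to return a single array with the outputs on the last axis instead, so treat that as an assumption to verify against the installed version. The sketch below shows two hypothetical helpers, run on made-up numbers: one that accepts either layout, and one that ranks features by mean absolute SHAP value, which is roughly the ordering the summary plot displays.

```python
import numpy as np

def shap_values_for_output(shap_values, output_idx):
    # Hypothetical helper: return the (samples x features) array for one output.
    if isinstance(shap_values, list):
        return np.asarray(shap_values[output_idx])
    arr = np.asarray(shap_values)
    if arr.ndim == 3:
        # Assumed layout: (samples, features, outputs).
        return arr[:, :, output_idx]
    raise ValueError("Unexpected SHAP value layout")

def rank_features(values, feature_names):
    # Order features by mean absolute SHAP value, largest first.
    importance = np.abs(values).mean(axis=0)
    order = np.argsort(importance)[::-1]
    return [(feature_names[i], float(importance[i])) for i in order]

# Made-up SHAP values: 2 outputs, 3 samples, 2 features.
out0 = np.array([[1.0, -2.0], [-1.0, 2.0], [1.0, -2.0]])
out1 = np.array([[3.0, 0.0], [-3.0, 1.0], [3.0, -1.0]])
as_list = [out0, out1]
as_array = np.stack(as_list, axis=-1)

names = ['feature_01', 'feature_02']
print(rank_features(shap_values_for_output(as_list, 0), names))
print(rank_features(shap_values_for_output(as_array, 1), names))
```

In the notebook, shap_values_for_output(shap_values, current_label.value) could then be passed to shap.summary_plot in place of the direct list indexing.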