Explaining the Loss of a Tree Model

Explaining the loss of a model can be very useful for debugging and model monitoring, since it attributes each sample's loss to the individual input features. This notebook gives a simple example of how this works. Note that explaining the loss of a model requires passing the labels, and is only supported for the feature_dependence="independent" option of TreeExplainer.

This notebook will be fleshed out once we post a full write-up of this method.

[1]:
import numpy as np
import xgboost

import shap

Train an XGBoost Classifier

[2]:
X, y = shap.datasets.adult()

model = xgboost.XGBClassifier()
model.fit(X, y)

# compute the per-sample logistic log-loss: -[y*log(p1) + (1-y)*log(p0)]
proba = model.predict_proba(X)
model_loss = -(y * np.log(proba[:, 1]) + (1 - y) * np.log(proba[:, 0]))

model_loss[:10]
[2]:
array([0.08443378, 0.45300266, 0.03874125, 0.11340553, 0.67350864,
       1.41265261, 0.00916297, 0.91732287, 0.01906859, 0.07444511])
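The per-sample loss above is the standard logistic log-loss, -[y*log(p1) + (1-y)*log(p0)]. As a minimal sketch of the formula on its own, with hypothetical toy probabilities rather than the model's actual predictions:

```python
import numpy as np

# toy predicted probabilities: columns are P(class 0), P(class 1)
# (hypothetical values, not produced by the model above)
proba = np.array(
    [
        [0.2, 0.8],
        [0.9, 0.1],
        [0.4, 0.6],
    ]
)
y = np.array([1, 0, 1])

# per-sample logistic log-loss
loss = -(y * np.log(proba[:, 1]) + (1 - y) * np.log(proba[:, 0]))
print(loss)  # first entry is -log(0.8), since sample 0 has label 1
```

Each sample with label 1 contributes -log(p1), and each sample with label 0 contributes -log(p0), so confident wrong predictions produce large losses.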

Explain the Log-Loss of the Model with TreeExplainer

Note that the expected_value of the model’s loss depends on the label and so it is now a function instead of a single number.

[3]:
explainer = shap.TreeExplainer(
    model, X, feature_dependence="independent", model_output="logloss"
)
explainer.shap_values(X.iloc[:10, :], y[:10]).sum(1) + np.array(
    [explainer.expected_value(v) for v in y[:10]]
)
[3]:
array([0.08443378, 0.45300268, 0.03874123, 0.11340551, 0.67350869,
       1.41265219, 0.00916299, 0.9173229 , 0.01906863, 0.07444508])
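The reconstruction in cell [3] (SHAP values summed per sample, plus the label-dependent expected value) should recover the directly computed losses from cell [2] up to floating-point error. Comparing the two printed arrays confirms this:

```python
import numpy as np

# per-sample losses computed directly from predict_proba (output of cell [2])
direct = np.array(
    [0.08443378, 0.45300266, 0.03874125, 0.11340553, 0.67350864,
     1.41265261, 0.00916297, 0.91732287, 0.01906859, 0.07444511]
)

# losses reconstructed from SHAP values + expected value (output of cell [3])
reconstructed = np.array(
    [0.08443378, 0.45300268, 0.03874123, 0.11340551, 0.67350869,
     1.41265219, 0.00916299, 0.9173229, 0.01906863, 0.07444508]
)

# the additivity property holds to floating-point tolerance
print(np.allclose(direct, reconstructed, atol=1e-5))  # True
```

This is the local-accuracy (additivity) property of SHAP applied to the loss: the attributions for each sample sum exactly to that sample's log-loss minus the expected loss for its label.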