Explaining the Loss of a Tree Model

Explaining the loss of a model can be very useful for debugging and model monitoring, since it attributes each sample's loss to the individual input features. This notebook gives a simple example of how this works. Note that explaining the loss of a model requires passing the labels, and is only supported for the feature_dependence="independent" option of TreeExplainer.

This notebook will be fleshed out once we post a full write-up of this method.

[1]:
import numpy as np
import xgboost

import shap

Train an XGBoost Classifier

[2]:
X, y = shap.datasets.adult()

model = xgboost.XGBClassifier()
model.fit(X, y)

# compute the per-sample logistic log-loss: -[y*log(p1) + (1-y)*log(p0)]
proba = model.predict_proba(X)
model_loss = -(y * np.log(proba[:, 1]) + (1 - y) * np.log(proba[:, 0]))

model_loss[:10]
[2]:
array([0.08443378, 0.45300266, 0.03874125, 0.11340553, 0.67350864,
       1.41265261, 0.00916297, 0.91732287, 0.01906859, 0.07444511])
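The per-sample loss above is the standard logistic log-loss, -[y*log(p1) + (1-y)*log(p0)]. As a minimal sketch of the formula on its own, with hypothetical toy probabilities rather than the model's actual predictions:

```python
import numpy as np

# toy predicted probabilities: columns are P(class 0), P(class 1)
# (hypothetical values, not produced by the model above)
proba = np.array(
    [
        [0.2, 0.8],
        [0.9, 0.1],
        [0.4, 0.6],
    ]
)
y = np.array([1, 0, 1])

# per-sample logistic log-loss
loss = -(y * np.log(proba[:, 1]) + (1 - y) * np.log(proba[:, 0]))
print(loss)  # first entry is -log(0.8), since sample 0 has label 1
```

Each sample with label 1 contributes -log(p1), and each sample with label 0 contributes -log(p0), so confident wrong predictions produce large losses.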

Explain the Log-Loss of the Model with TreeExplainer

Note that the expected_value of the model’s loss depends on the label and so it is now a function instead of a single number.

[3]:
explainer = shap.TreeExplainer(
    model, X, feature_dependence="independent", model_output="logloss"
)
explainer.shap_values(X.iloc[:10, :], y[:10]).sum(1) + np.array(
    [explainer.expected_value(v) for v in y[:10]]
)
[3]:
array([0.08443378, 0.45300268, 0.03874123, 0.11340551, 0.67350869,
       1.41265219, 0.00916299, 0.9173229 , 0.01906863, 0.07444508])
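The reconstruction in cell [3] (SHAP values summed per sample, plus the label-dependent expected value) should recover the directly computed losses from cell [2] up to floating-point error. Comparing the two printed arrays confirms this:

```python
import numpy as np

# per-sample losses computed directly from predict_proba (output of cell [2])
direct = np.array(
    [0.08443378, 0.45300266, 0.03874125, 0.11340553, 0.67350864,
     1.41265261, 0.00916297, 0.91732287, 0.01906859, 0.07444511]
)

# losses reconstructed from SHAP values + expected value (output of cell [3])
reconstructed = np.array(
    [0.08443378, 0.45300268, 0.03874123, 0.11340551, 0.67350869,
     1.41265219, 0.00916299, 0.9173229, 0.01906863, 0.07444508]
)

# the additivity property holds to floating-point tolerance
print(np.allclose(direct, reconstructed, atol=1e-5))  # True
```

This is the local-accuracy (additivity) property of SHAP applied to the loss: the attributions for each sample sum exactly to that sample's log-loss minus the expected loss for its label.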