shap.utils.hclust

shap.utils.hclust(X, y=None, linkage='single', metric='auto', random_state=0)

Fit a hierarchical clustering model for features X relative to target variable y.

For more information on clustering methods, see: https://docs.scipy.org/doc/scipy/reference/generated/scipy.cluster.hierarchy.linkage.html

Parameters:
X: np.array

Features to cluster

y: np.array | None

Target variable

linkage: str

Defines the method to calculate the distance between clusters. Must be one of “single”, “complete” or “average”.
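The three linkage options differ only in how the distance between two clusters is derived from the pairwise distances of their members. The following is a minimal sketch using SciPy directly rather than shap.utils.hclust; the three-item distance vector is made up for illustration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage

# Hypothetical condensed pairwise distances for three items,
# ordered as the pairs (0,1), (0,2), (1,2).
condensed = np.array([1.0, 4.0, 2.0])

# Items 0 and 1 always merge first (distance 1.0). The height of the
# final merge with item 2 then depends on the linkage method:
# minimum (single), maximum (complete) or mean (average) of 4.0 and 2.0.
final_merge_height = {
    method: float(linkage(condensed, method=method)[-1, 2])
    for method in ("single", "complete", "average")
}
```

With these distances, the final merge happens at 2.0 for single linkage, 4.0 for complete linkage and 3.0 for average linkage.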

metric: str

A SciPy distance metric name, “auto”, or “xgboost_distances_r2”.

  • If “xgboost_distances_r2”, estimate redundancy distances between features X with respect to target variable y using shap.utils.xgboost_distances_r2().

  • Otherwise, calculate distances between features using the given distance metric.

  • If “auto” (default), use “xgboost_distances_r2” if a target variable is provided; otherwise use the cosine distance metric.
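When no target variable is available, the clustering depends only on distances between feature columns. The sketch below approximates that idea with plain SciPy calls; it is an illustration under stated assumptions (toy data, cosine metric, single linkage), not the library's actual implementation, which may preprocess the data differently.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.default_rng(0)

# Toy data (assumed for illustration): 100 samples, 4 features.
# Feature 1 is nearly a rescaled copy of feature 0.
X = rng.normal(size=(100, 4))
X[:, 1] = 2.0 * X[:, 0] + 0.01 * rng.normal(size=100)

# Transpose so that each row is one feature; pdist then compares
# features with each other rather than samples.
feature_distances = pdist(X.T, metric="cosine")

# Build the hierarchy from the condensed distance vector.
clustering = linkage(feature_distances, method="single")
```

Because features 0 and 1 point in almost the same direction, their cosine distance is close to zero and they form the first merge in the resulting hierarchy.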

random_state: int

Seed for the NumPy random state.

Returns:
clustering: np.array

The hierarchical clustering encoded as a linkage matrix.
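A linkage matrix in SciPy's format has one row per merge and can be passed to the functions in scipy.cluster.hierarchy. The sketch below shows one way to read such a matrix; the matrix here is a stand-in built from hand-picked distances for four hypothetical features, not output from shap.utils.hclust.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Stand-in linkage matrix for 4 features, built from made-up condensed
# distances ordered (0,1), (0,2), (0,3), (1,2), (1,3), (2,3).
condensed = np.array([0.1, 0.9, 0.8, 0.9, 0.8, 0.2])
clustering = linkage(condensed, method="single")

# Each row describes one merge:
# [cluster index a, cluster index b, merge distance, size of new cluster].
# Indices below n_features are original features; index n_features + i
# refers to the cluster created by row i.
n_features = clustering.shape[0] + 1

# Cut the tree at distance 0.5 to get one flat cluster label per feature.
labels = fcluster(clustering, t=0.5, criterion="distance")
```

Here the cut at 0.5 separates the features into two groups, {0, 1} and {2, 3}, because those pairs merge below the threshold while the two groups only join at distance 0.8.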