shap.utils.hclust

shap.utils.hclust(X: _ArrayLike, y: _ArrayLike | None = None, linkage: Literal['single', 'complete', 'average'] = 'single', metric: str = 'auto', random_state: int | np.random.RandomState = 0) → np.ndarray

Fit a hierarchical clustering model for features X relative to target variable y.

For more information on clustering methods, see scipy.cluster.hierarchy.linkage().

For more information on scipy distance metrics, see scipy.spatial.distance.pdist().

Parameters:
X: 2d-array-like

Features to cluster, with one column per feature.

y: array-like or None

Target variable

linkage: str

Defines the method to calculate the distance between clusters. Must be one of “single”, “complete” or “average”.

metric: str

Scipy distance metric, “xgboost_distances_r2”, or “auto”.

  • If xgboost_distances_r2, estimate redundancy distances between features X with respect to target variable y using shap.utils.xgboost_distances_r2().

  • Otherwise, calculate distances between features using the given distance metric.

  • If auto (default), use xgboost_distances_r2 if a target variable is provided; otherwise use the cosine distance metric.
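The metric rules above can be sketched with scipy alone. This is a minimal illustration under stated assumptions, not the library's implementation: `resolve_metric` and `cluster_features` are hypothetical helper names, the sketch covers only the case without a target variable, and it does not call shap.utils.xgboost_distances_r2().

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist


def resolve_metric(metric, y):
    """Hypothetical helper mirroring the documented "auto" rule."""
    if metric == "auto":
        return "xgboost_distances_r2" if y is not None else "cosine"
    return metric


def cluster_features(X, method="single", metric="auto"):
    """Hypothetical helper: cluster the columns of X with a scipy metric (no target)."""
    metric = resolve_metric(metric, y=None)
    # pdist compares rows, so transpose to compare features (columns).
    distances = pdist(np.asarray(X, dtype=float).T, metric=metric)
    return linkage(distances, method=method)


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 1))
# Columns 0 and 1 are nearly identical; column 2 is independent noise.
X = np.hstack([base, base + 0.01 * rng.normal(size=(200, 1)), rng.normal(size=(200, 1))])

Z = cluster_features(X, method="average")
print(Z.shape)  # one row per merge, four columns
```

With this toy data, the two near-duplicate columns should be merged first, since their cosine distance is close to zero.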

random_state: int or np.random.RandomState

Integer seed or NumPy random state. Defaults to 0.

Returns:
clustering: np.ndarray

The hierarchical clustering encoded as a linkage matrix.
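To show how a linkage matrix can be read, the sketch below builds one directly with scipy from a small, made-up distance matrix (the values in `D` are hypothetical and not produced by shap). It assumes the returned array follows the layout documented for scipy.cluster.hierarchy.linkage().

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, leaves_list, linkage
from scipy.spatial.distance import squareform

# Hypothetical pairwise distances between four features.
D = np.array([
    [0.0, 0.1, 0.9, 0.8],
    [0.1, 0.0, 0.7, 0.85],
    [0.9, 0.7, 0.0, 0.2],
    [0.8, 0.85, 0.2, 0.0],
])
# squareform converts the square matrix to the condensed form linkage expects.
Z = linkage(squareform(D), method="single")

# Each row is one merge: [cluster_a, cluster_b, distance, n_leaves].
# Indices 0..3 are the original features; 4 and above are merged clusters.
for a, b, dist, size in Z:
    print(int(a), int(b), round(float(dist), 2), int(size))

order = leaves_list(Z)  # feature order along the dendrogram leaves
labels = fcluster(Z, t=0.5, criterion="distance")  # cut the tree at distance 0.5
print(order, labels)
```

Cutting at 0.5 groups features 0 and 1 together and features 2 and 3 together, because the final merge between those two groups happens at distance 0.7.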