shap.utils.hclust
- shap.utils.hclust(X: _ArrayLike, y: _ArrayLike | None = None, linkage: Literal['single', 'complete', 'average'] = 'single', metric: str = 'auto', random_state: int | np.random.RandomState = 0) -> npt.NDArray[Any]
Fit a hierarchical clustering model for features X relative to target variable y.
For more information on clustering methods, see scipy.cluster.hierarchy.linkage().
For more information on scipy distance metrics, see scipy.spatial.distance.pdist().
- Parameters:
- X: 2d-array-like
Features to cluster
- y: array-like or None
Target variable
- linkage: str
Defines the method to calculate the distance between clusters. Must be one of “single”, “complete” or “average”.
- metric: str
Scipy distance metric or “xgboost_distances_r2”.
If xgboost_distances_r2, estimate redundancy distances between features X with respect to target variable y using shap.utils.xgboost_distances_r2(). Otherwise, calculate distances between features using the given distance metric.
If auto (default), use xgboost_distances_r2 if a target variable is provided, or else the cosine distance metric.
- random_state: int or np.random.RandomState
Numpy random state, defaults to 0.
- Returns:
- clustering: np.ndarray
The hierarchical clustering encoded as a linkage matrix.
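To make the "linkage matrix" return value concrete, here is a minimal sketch that uses only NumPy and SciPy. It is not shap's implementation. It assumes the case where no target is given, so distances between features come from a plain scipy metric (cosine here), and that features are the columns of X. The synthetic data and variable names are illustrative.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

# Synthetic data (illustrative only): 200 samples, 4 features.
rng = np.random.default_rng(0)
n_samples, n_features = 200, 4
X = rng.normal(size=(n_samples, n_features))

# Make feature 1 nearly a copy of feature 0, so the two are redundant.
X[:, 1] = X[:, 0] + 0.01 * rng.normal(size=n_samples)

# Cluster features, not samples: transpose so each row is one feature.
feature_dist = pdist(X.T, metric="cosine")

# Single linkage, matching the documented default for `linkage`.
clustering = linkage(feature_dist, method="single")

# A linkage matrix has one row per merge: (n_features - 1) rows, 4 columns:
# [cluster index a, cluster index b, merge distance, size of new cluster].
print(clustering.shape)
print(clustering[0])  # the first merge joins the two redundant features
```

With shap installed, the analogous call would presumably be `shap.utils.hclust(X, metric="cosine")`; the library may preprocess the data differently, so the distances need not match this sketch exactly.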