shap.utils.hclust
- shap.utils.hclust(X: _ArrayLike, y: _ArrayLike | None = None, linkage: Literal['single', 'complete', 'average'] = 'single', metric: str = 'auto', random_state: int | np.random.RandomState = 0) -> np.ndarray
Fit a hierarchical clustering model for features X relative to target variable y.
For more information on clustering methods, see scipy.cluster.hierarchy.linkage().
For more information on scipy distance metrics, see scipy.spatial.distance.pdist().
- Parameters:
- X: 2d-array-like
Features to cluster
- y: array-like or None
Target variable
- linkage: str
Defines the method to calculate the distance between clusters. Must be one of “single”, “complete” or “average”.
- metric: str
Scipy distance metric, “xgboost_distances_r2”, or “auto”.
If xgboost_distances_r2, estimate redundancy distances between features X with respect to target variable y using shap.utils.xgboost_distances_r2(). Otherwise, calculate distances between features using the given distance metric.
If auto (default), use xgboost_distances_r2 if a target variable is provided, or else the cosine distance metric.
- random_state: int or np.random.RandomState
Numpy random state, defaults to 0.
- Returns:
- clustering: np.ndarray
The hierarchical clustering encoded as a linkage matrix.
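The following is a minimal sketch of what the description above implies when no target variable is given: features (the columns of X) are compared with the cosine distance metric and the result is encoded as a SciPy linkage matrix. It uses only NumPy and SciPy, and it is an illustration under those assumptions rather than the library's actual implementation; the toy data and variable names are made up for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage
from scipy.spatial.distance import pdist

rng = np.random.RandomState(0)

# Toy data (hypothetical): 100 samples, 4 features.
# Feature 1 is made nearly identical to feature 0, so the two are redundant.
X = rng.randn(100, 4)
X[:, 1] = X[:, 0] + 0.01 * rng.randn(100)

# The features are the items being clustered, so pairwise distances
# are computed between columns (rows of X.T), not between samples.
dist = pdist(X.T, metric="cosine")

# Build the hierarchy with single linkage and encode it as a linkage matrix.
clustering = linkage(dist, method="single")

# A linkage matrix has one row per merge: (n_features - 1, 4).
print(clustering.shape)  # (3, 4)

# Each row holds the two merged cluster indices, their distance,
# and the number of original features in the new cluster.
print(clustering[0])
```

In this sketch the first row merges features 0 and 1, since they are almost collinear. With shap installed, calling `shap.utils.hclust(X, metric="cosine")` is documented to return a matrix of the same form, which can be passed to SciPy utilities such as `scipy.cluster.hierarchy.dendrogram()`.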