shap.datasets.diabetes

shap.datasets.diabetes(n_points: int | None = None) → tuple[DataFrame, ndarray]

Return the diabetes dataset as a feature DataFrame and a target array.

The dataset is used in predictive regression tasks.

Parameters:
n_points : int, optional

Number of data points to sample. If provided, that many points are sampled at random; if None (the default), the full dataset is returned.

Returns:
X : pd.DataFrame

The feature data.

y : np.ndarray

The target variable.

Notes

Feature Columns in X:

  • age (float): Age in years

  • sex (float): Sex

  • bmi (float): Body mass index

  • bp (float): Average blood pressure

  • s1 (float): Total serum cholesterol

  • s2 (float): Low-density lipoproteins (LDL cholesterol)

  • s3 (float): High-density lipoproteins (HDL cholesterol)

  • s4 (float): Total cholesterol / HDL cholesterol ratio

  • s5 (float): Log of serum triglycerides level

  • s6 (float): Blood sugar level

Target y:

  • Progression of diabetes one year after baseline (float)

The diabetes dataset is a subset of the larger diabetes dataset from scikit-learn. For more details, see sklearn.datasets.load_diabetes().

Examples

To get the feature data and the target values:

import shap
data, target = shap.datasets.diabetes()