shap.datasets.diabetes

shap.datasets.diabetes(n_points=None)

Return the diabetes dataset as a feature DataFrame and a target array.

Parameters:
n_points : int, optional

Number of data points to sample. If None, the entire dataset is used.

Returns:
A tuple of a pandas DataFrame containing the features and a NumPy array containing the target.

Feature Columns:

  • age (float): Age in years

  • sex (float): Sex

  • bmi (float): Body mass index

  • bp (float): Average blood pressure

  • s1 (float): Total serum cholesterol

  • s2 (float): Low-density lipoproteins (LDL cholesterol)

  • s3 (float): High-density lipoproteins (HDL cholesterol)

  • s4 (float): Total cholesterol / HDL cholesterol ratio

  • s5 (float): Log of serum triglycerides level

  • s6 (float): Blood sugar level

Target:

  • Progression of diabetes one year after baseline (float)

Notes

The diabetes dataset is a subset of the larger diabetes dataset from scikit-learn. More details: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_diabetes.html

Examples

To get the feature data and target values:

import shap

data, target = shap.datasets.diabetes()