shap.datasets.california

shap.datasets.california(n_points: int | None = None) → tuple[DataFrame, ndarray]

Return the California housing data in a tabular format.

Typically used for regression tasks.

Parameters:
n_points : int, optional

Number of data points to sample at random. If None (the default), the full dataset is returned.

Returns:
X : pd.DataFrame

The feature data.

y : np.ndarray

The target variable.

Notes

The returned feature matrix X includes the following features:

  • MedInc (float): Median income in block

  • HouseAge (float): Median house age in block

  • AveRooms (float): Average rooms in dwelling

  • AveBedrms (float): Average bedrooms in dwelling

  • Population (float): Block population

  • AveOccup (float): Average house occupancy

  • Latitude (float): House block latitude

  • Longitude (float): House block longitude

The target column represents the median house value for California districts.

References

California housing dataset: sklearn.datasets.fetch_california_housing()

Examples

To load the feature data and target values:

import shap
data, target = shap.datasets.california()
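
To load a random subsample and confirm that it has the layout described in the Notes section, something like the following sketch may help. The names check_features and load_california_sample are hypothetical helpers introduced here for illustration and are not part of shap; only shap.datasets.california and its n_points parameter come from the documentation above.

```python
import numpy as np
import pandas as pd

# Feature names as listed in the Notes section.
EXPECTED_FEATURES = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def check_features(X, y):
    """Hypothetical helper: raise ValueError if X or y has an unexpected layout."""
    missing = [name for name in EXPECTED_FEATURES if name not in X.columns]
    if missing:
        raise ValueError(f"missing feature columns: {missing}")
    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} values")
    return X.shape


def load_california_sample(n_points=100):
    """Hypothetical wrapper: load a random subsample and validate it."""
    import shap  # imported here so check_features works without shap installed

    X, y = shap.datasets.california(n_points=n_points)
    check_features(X, y)
    return X, y
```

Calling load_california_sample(100) should return a DataFrame with the eight feature columns and an array of matching length, assuming shap is installed and the dataset can be loaded.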