shap.datasets.california

shap.datasets.california(n_points=None)

Return the California housing data in a structured format.

Parameters:
n_points : int, optional

Number of data points to sample. If provided, that many points are randomly sampled; if omitted (the default, None), the full dataset is returned.

Returns:
A tuple (data, target): a pandas DataFrame containing the features and a NumPy array containing the target values.

The data include the following features:

  • MedInc : Median income in block

  • HouseAge : Median house age in block

  • AveRooms : Average rooms in dwelling

  • AveBedrms : Average bedrooms in dwelling

  • Population : Block population

  • AveOccup : Average house occupancy

  • Latitude : House block latitude

  • Longitude : House block longitude

The target column represents the median house value for California districts.

References

California housing dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html

Examples

To get the processed data and target values:

import shap

data, target = shap.datasets.california()
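
The effect of n_points can be illustrated without downloading anything. The following is a minimal sketch, not shap's actual implementation: it builds a small made-up DataFrame with the eight feature names listed above, then samples rows while keeping the target array aligned with them. The helper name sample_points, the seed argument, and all values are assumptions introduced for illustration only.

```python
import numpy as np
import pandas as pd

# Column names as documented above; the values below are synthetic stand-ins.
FEATURES = ["MedInc", "HouseAge", "AveRooms", "AveBedrms",
            "Population", "AveOccup", "Latitude", "Longitude"]

rng = np.random.default_rng(0)
data = pd.DataFrame(rng.random((20, len(FEATURES))), columns=FEATURES)
target = rng.random(20)


def sample_points(data, target, n_points=None, seed=0):
    """Hypothetical helper (not part of shap): sample rows, keep target aligned."""
    if n_points is None:
        # No sampling requested: return everything unchanged.
        return data, target
    sampled = data.sample(n_points, random_state=seed)
    # data uses the default RangeIndex, so index labels equal row positions
    # and can be used to pick the matching entries of the target array.
    return sampled, target[sampled.index.to_numpy()]


small_data, small_target = sample_points(data, target, n_points=5)
print(small_data.shape, small_target.shape)  # (5, 8) (5,)
```

With the real dataset, the equivalent call would presumably be shap.datasets.california(n_points=5); whether the sample is reproducible between calls is not stated here.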