shap.datasets.california
- shap.datasets.california(n_points: int | None = None) → tuple[DataFrame, ndarray]
Return the California housing data in a tabular format.
Used in predictive regression tasks.
- Parameters:
- n_points : int, optional
Number of data points to sample. If provided, the specified number of points is sampled at random; if None (the default), the full dataset is returned.
- Returns:
- X : pd.DataFrame
The feature data.
- y : np.ndarray
The target variable.
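As a rough illustration of the parameter and return values described above, the sketch below wraps the call and checks that the features and target stay aligned. The helper name `load_checked` and its `loader` argument are hypothetical and are not part of shap; `loader` exists only so that the check can be tried with a stand-in function.

```python
from typing import Callable, Optional, Tuple

import numpy as np
import pandas as pd


def load_checked(
    n_points: Optional[int] = None,
    loader: Optional[Callable] = None,
) -> Tuple[pd.DataFrame, np.ndarray]:
    """Call the dataset loader and verify that X and y have matching lengths.

    `load_checked` and `loader` are illustrative names, not shap API.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without shap installed.
        import shap

        loader = shap.datasets.california

    # n_points=None requests the full dataset; an int requests a random sample.
    X, y = loader(n_points=n_points)

    if len(X) != len(y):
        raise ValueError(f"X has {len(X)} rows but y has {len(y)} entries")
    return X, y
```

With shap installed, `load_checked(n_points=100)` would forward to `shap.datasets.california(n_points=100)`. The data comes from scikit-learn's `fetch_california_housing()`, which may need to download it on first use.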
Notes
The returned feature matrix X includes the following features:
- MedInc (float): Median income in block
- HouseAge (float): Median house age in block
- AveRooms (float): Average rooms in dwelling
- AveBedrms (float): Average bedrooms in dwelling
- Population (float): Block population
- AveOccup (float): Average house occupancy
- Latitude (float): House block latitude
- Longitude (float): House block longitude
The target column represents the median house value for California districts.
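The feature names listed above can serve as a quick sanity check on a loaded frame. The following is a minimal sketch; `EXPECTED_COLUMNS` and `missing_columns` are hypothetical names introduced here for illustration and are not part of shap.

```python
import pandas as pd

# Feature names as listed in the Notes section.
EXPECTED_COLUMNS = [
    "MedInc",
    "HouseAge",
    "AveRooms",
    "AveBedrms",
    "Population",
    "AveOccup",
    "Latitude",
    "Longitude",
]


def missing_columns(X: pd.DataFrame) -> list[str]:
    """Return the documented feature names that are absent from X."""
    return [name for name in EXPECTED_COLUMNS if name not in X.columns]
```

For a frame returned by `shap.datasets.california()`, `missing_columns(X)` is expected to return an empty list if the columns match the documented ones.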
References
California housing dataset:
sklearn.datasets.fetch_california_housing()
Examples
To get the feature data and target values:
import shap
data, target = shap.datasets.california()