shap.datasets.california
- shap.datasets.california(n_points: int | None = None) → tuple[DataFrame, ndarray]
Return the California housing data in a tabular format.
Used in predictive regression tasks.
- Parameters:
- n_points : int, optional
Number of data points to sample. If provided, that many points are randomly sampled from the dataset; if omitted, the full dataset is returned.
- Returns:
- X : pd.DataFrame
The feature data.
- y : np.ndarray
The target variable.
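A minimal sketch of the `n_points` parameter in use, assuming `shap` is installed and the dataset can be fetched in your environment. The helper names `rows_match` and `load_subset` are hypothetical and not part of `shap`; they only illustrate that the sampled features and targets are expected to stay aligned row for row.

```python
def rows_match(X, y):
    """Return True when the feature table and target array have the same length."""
    return len(X) == len(y)


def load_subset(n_points=500):
    """Load a random subset of the California housing data via shap.

    Hypothetical helper for illustration; not part of the shap API.
    """
    # Imported here so rows_match() can be used without shap installed.
    import shap

    X, y = shap.datasets.california(n_points=n_points)

    # Each sampled feature row should have exactly one target value.
    if not rows_match(X, y):
        raise ValueError("feature rows and target values are misaligned")
    return X, y
```

Calling `X, y = load_subset(500)` should return a DataFrame and an array with 500 entries each, provided the dataset holds at least that many points.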
Notes
The returned feature matrix X includes the following features:
- MedInc (float): Median income in block
- HouseAge (float): Median house age in block
- AveRooms (float): Average rooms in dwelling
- AveBedrms (float): Average bedrooms in dwelling
- Population (float): Block population
- AveOccup (float): Average house occupancy
- Latitude (float): House block latitude
- Longitude (float): House block longitude
The target column represents the median house value for California districts.
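As a quick sanity check, the feature names from these notes can be compared against the columns of the returned DataFrame. This is a sketch under the assumption that the column names match the notes exactly; `DOCUMENTED_FEATURES`, `missing_features`, and `check_california_columns` are hypothetical names introduced here for illustration.

```python
# Feature names as listed in the notes for this function.
DOCUMENTED_FEATURES = [
    "MedInc",
    "HouseAge",
    "AveRooms",
    "AveBedrms",
    "Population",
    "AveOccup",
    "Latitude",
    "Longitude",
]


def missing_features(columns):
    """Return documented feature names absent from `columns`, in documented order."""
    present = set(columns)
    return [name for name in DOCUMENTED_FEATURES if name not in present]


def check_california_columns():
    """Load a small sample and report any documented features that are missing.

    Hypothetical helper; assumes shap is installed and the data is available.
    """
    import shap

    X, _ = shap.datasets.california(n_points=100)
    return missing_features(X.columns)
```

An empty list from `check_california_columns()` would indicate that every documented feature is present in `X`.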
References
California housing dataset:
sklearn.datasets.fetch_california_housing()
Examples
To get the feature data and target values:
import shap
data, target = shap.datasets.california()