shap.datasets.california
- shap.datasets.california(n_points=None)
Return the California housing data in a structured format.
- Parameters:
- n_points : int, optional
Number of data points to sample. If provided, that many points are sampled at random; if None (the default), the full dataset is returned.
- Returns:
- A tuple (data, target): a pandas DataFrame containing the features and a NumPy array containing the target.
The data include the following features:
- MedInc: Median income in block
- HouseAge: Median house age in block
- AveRooms: Average rooms in dwelling
- AveBedrms: Average bedrooms in dwelling
- Population: Block population
- AveOccup: Average house occupancy
- Latitude: House block latitude
- Longitude: House block longitude
The target column represents the median house value for California districts.
References
California housing dataset: https://scikit-learn.org/stable/modules/generated/sklearn.datasets.fetch_california_housing.html
Examples
To get the feature data and the target values:
data, target = shap.datasets.california()
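To draw a random subsample instead, pass n_points. The sketch below is illustrative only: check_california and load_subsample are hypothetical helpers, not part of shap. They confirm that the returned DataFrame carries the feature columns listed above and that the data and target have matching lengths. The sketch assumes n_points is smaller than the size of the full dataset, and that loading the dataset may require network access the first time.

```python
import numpy as np
import pandas as pd

# Feature names as listed in the description of the returned data.
EXPECTED_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]


def check_california(data, target, n_points=None):
    """Hypothetical helper: validate the documented (data, target) structure."""
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data should be a pandas DataFrame")
    missing = set(EXPECTED_COLUMNS) - set(data.columns)
    if missing:
        raise ValueError(f"missing feature columns: {sorted(missing)}")
    target = np.asarray(target)
    if len(data) != len(target):
        raise ValueError("data and target have different lengths")
    # Assumption: n_points is smaller than the full dataset size.
    if n_points is not None and len(data) != n_points:
        raise ValueError(f"expected {n_points} rows, got {len(data)}")
    return data.shape


def load_subsample(n_points=100):
    """Hypothetical helper: load a random subsample and validate it."""
    import shap  # imported lazily so check_california works without shap

    data, target = shap.datasets.california(n_points=n_points)
    check_california(data, target, n_points=n_points)
    return data, target
```

Calling load_subsample(100) would then return a 100-row DataFrame together with the matching target values, ready to pass to a model or an explainer.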