shap.datasets.california

shap.datasets.california(n_points=None)

Return the California housing data in a structured format.

Parameters:

n_points (int, optional): Number of data points to sample. If provided, the specified number of points is sampled at random. By default (None), the full dataset is returned.

Returns:

A tuple (data, target): a pandas DataFrame containing the features and a NumPy array containing the target values.

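The data include the following features:

MedInc: Median income in block
HouseAge: Median house age in block
AveRooms: Average rooms in dwelling
AveBedrms: Average bedrooms in dwelling
Population: Block population
AveOccup: Average house occupancy
Latitude: House block latitude
Longitude: House block longitude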

The target column represents the median house value for California districts.

Examples

To get the processed data and target labels:

import shap

data, target = shap.datasets.california()
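
To work with a random subset instead of the full dataset, pass n_points. The following is a minimal sketch rather than part of the shap API: load_sample and check_pair are hypothetical helper names introduced here for illustration, and the check only assumes what is documented above (a pandas DataFrame paired with a NumPy array of the same length).

```python
import numpy as np
import pandas as pd


def load_sample(n_points=100):
    """Load a random subset of the California housing data via shap."""
    # Imported lazily so check_pair below can be used without shap installed.
    import shap

    return shap.datasets.california(n_points=n_points)


def check_pair(data, target):
    """Return (n_rows, n_features) after checking that data and target align.

    Hypothetical helper for illustration; not provided by shap.
    """
    if not isinstance(data, pd.DataFrame):
        raise TypeError("data should be a pandas DataFrame")
    target = np.asarray(target)
    if len(data) != len(target):
        raise ValueError("data and target have different lengths")
    return data.shape
```

Calling check_pair(*load_sample(n_points=100)) returns the shape of the sampled feature table, which is a quick way to confirm that the sample size and feature columns are what a downstream model expects.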