shap.datasets.adult
- shap.datasets.adult(display: bool = False, n_points: int | None = None) → tuple[DataFrame, ndarray]
Return the Adult census income data as a feature DataFrame and a label array.
The dataset is commonly used for binary classification: predicting whether an individual earns more than 50K.
- Parameters:
- display : bool, optional
If True, return the raw (unencoded) feature values, suitable for display. The target and redundant columns are removed.
- n_points : int, optional
Number of rows to sample. If provided, that many rows are drawn at random. If None (the default), all rows are returned.
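The sketch below shows what sampling n_points rows could look like. It assumes uniform sampling without replacement from a seeded NumPy generator. The helper name sample_rows and the fixed seed are hypothetical and are not part of the shap API. The toy frame uses made-up values.

```python
from typing import Optional

import numpy as np
import pandas as pd


def sample_rows(df: pd.DataFrame, n_points: Optional[int], seed: int = 0) -> pd.DataFrame:
    """Hypothetical helper: draw n_points distinct rows, or return df unchanged if None."""
    if n_points is None:
        return df
    # Assumption: a fixed seed, so repeated calls return the same subset.
    rng = np.random.default_rng(seed)
    # Pick distinct positional indices, then select those rows.
    idx = rng.choice(len(df), size=n_points, replace=False)
    return df.iloc[idx]


# Toy frame standing in for the census data (values are made up).
toy = pd.DataFrame(
    {"Age": [25.0, 38.0, 28.0, 44.0, 18.0], "Hours per week": [40.0, 50.0, 40.0, 40.0, 30.0]}
)
subset = sample_rows(toy, n_points=3)
print(subset.shape)  # (3, 2)
```

Sampling without replacement guarantees that no row appears twice in the subset.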
- Returns:
- X : pd.DataFrame
If display is True, X contains the raw data without the ‘Education’, ‘Target’, and ‘fnlwgt’ columns. Otherwise, X contains the processed data without the ‘Target’ and ‘fnlwgt’ columns.
- y : np.ndarray
The ‘Target’ column returned as an array.
Notes
The original data includes the following columns:
- Age (float): Age in years.
- Workclass (category): Type of employment.
- fnlwgt (float): Final weight; the number of units in the target population that the record represents.
- Education (category): Highest level of education achieved.
- Education-Num (float): Numeric representation of education level.
- Marital Status (category): Marital status of the individual.
- Occupation (category): Type of occupation.
- Relationship (category): Relationship status.
- Race (category): Ethnicity of the individual.
- Sex (category): Gender of the individual.
- Capital Gain (float): Capital gains recorded.
- Capital Loss (float): Capital losses recorded.
- Hours per week (float): Number of hours worked per week.
- Country (category): Country of origin.
- Target (category): Binary target variable indicating whether the individual earns more than 50K.
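The column list above can be written as a schema and used to read raw records with pandas. This is a sketch under stated assumptions. It assumes the raw file has no header row, is comma-separated with a space after each comma, and uses "?" for missing values. The constant name ADULT_DTYPES is hypothetical, and the two records are invented for illustration rather than taken from the dataset.

```python
import io

import pandas as pd

# (name, dtype) pairs mirroring the column list above; "float" is read as float64.
ADULT_DTYPES = [
    ("Age", "float64"),
    ("Workclass", "category"),
    ("fnlwgt", "float64"),
    ("Education", "category"),
    ("Education-Num", "float64"),
    ("Marital Status", "category"),
    ("Occupation", "category"),
    ("Relationship", "category"),
    ("Race", "category"),
    ("Sex", "category"),
    ("Capital Gain", "float64"),
    ("Capital Loss", "float64"),
    ("Hours per week", "float64"),
    ("Country", "category"),
    ("Target", "category"),
]

# Hypothetical records in the assumed raw layout (not real census rows).
raw_csv = io.StringIO(
    "30, Private, 100000, HS-grad, 9, Never-married, Sales, Own-child, "
    "White, Female, 0, 0, 40, United-States, <=50K\n"
    "45, ?, 200000, Bachelors, 13, Married-civ-spouse, Exec-managerial, Husband, "
    "White, Male, 5000, 0, 50, United-States, >50K\n"
)

raw = pd.read_csv(
    raw_csv,
    names=[name for name, _ in ADULT_DTYPES],
    dtype=dict(ADULT_DTYPES),
    na_values="?",  # assumed missing-value marker
    skipinitialspace=True,  # drop the space that follows each comma
)
print(raw.dtypes)
```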
The ‘Education’ column is redundant with ‘Education-Num’ and is dropped for simplicity.
The ‘Target’ column is converted to binary (True/False) where ‘>50K’ is True and ‘<=50K’ is False.
In the processed (non-display) output, categorical columns are replaced with integer codes so that every feature is numeric.
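Here is a minimal sketch of the preprocessing steps described in these notes, applied to a small invented frame. The function name preprocess is hypothetical. It also assumes that every categorical column is encoded with pandas category codes, so shap's actual encoding of individual columns may differ.

```python
import pandas as pd


def preprocess(raw: pd.DataFrame, display: bool = False):
    """Hypothetical sketch of the Notes above; returns (X, y) from a raw Adult-style frame."""
    # Binary target: True for '>50K', False otherwise (whitespace stripped defensively).
    y = (raw["Target"].astype(str).str.strip() == ">50K").to_numpy()
    # Drop the target, the redundant 'Education' column, and the 'fnlwgt' weight.
    X = raw.drop(columns=["Education", "Target", "fnlwgt"])
    if display:
        return X, y  # raw, human-readable values
    # Assumption: each categorical column is replaced by its integer category codes.
    for col in X.columns:
        if isinstance(X[col].dtype, pd.CategoricalDtype):
            X[col] = X[col].cat.codes
    return X, y


# Small invented frame with a subset of the Adult columns.
toy = pd.DataFrame(
    {
        "Age": [30.0, 45.0, 52.0],
        "Workclass": pd.Categorical(["Private", "Self-emp-inc", "Private"]),
        "fnlwgt": [1.0e5, 2.0e5, 1.5e5],
        "Education": pd.Categorical(["HS-grad", "Bachelors", "Masters"]),
        "Education-Num": [9.0, 13.0, 14.0],
        "Target": pd.Categorical(["<=50K", ">50K", ">50K"]),
    }
)
X, y = preprocess(toy)
X_display, _ = preprocess(toy, display=True)
print(X["Workclass"].tolist())  # [0, 1, 0]
```

Both calls return the same rows in the same order. This is what allows an encoded matrix to be paired with a human-readable copy of the same records.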
Examples
To get the processed data and target labels:
import shap
data, target = shap.datasets.adult()
To get the raw data for display:
raw_data, target = shap.datasets.adult(display=True)