shap.datasets.adult

shap.datasets.adult(display=False, n_points=None)

Return the Adult census data in a structured format.

Parameters:
display : bool, optional

If True, return the raw data without target and redundant columns.

n_points : int, optional

Number of data points to sample at random. If None (the default), the full dataset is returned.

Returns:
If display is True:

Tuple of pandas DataFrame containing the raw data without the ‘Education’, ‘Target’, and ‘fnlwgt’ columns, and a numpy array representing the ‘Target’ column.

If display is False:

Tuple of pandas DataFrame containing the processed data without the ‘Education’, ‘Target’, and ‘fnlwgt’ columns, and a numpy array representing the ‘Target’ column.

The underlying dataset contains the following columns (some are dropped from the returned DataFrame, as described above):
- ``Age`` (float): Age in years.
- ``Workclass`` (category): Type of employment.
- ``fnlwgt`` (float): Final weight; the number of units in the target population that the record represents.
- ``Education`` (category): Highest level of education achieved.
- ``Education-Num`` (float): Numeric representation of education level.
- ``Marital Status`` (category): Marital status of the individual.
- ``Occupation`` (category): Type of occupation.
- ``Relationship`` (category): Relationship status.
- ``Race`` (category): Ethnicity of the individual.
- ``Sex`` (category): Gender of the individual.
- ``Capital Gain`` (float): Capital gains recorded.
- ``Capital Loss`` (float): Capital losses recorded.
- ``Hours per week`` (float): Number of hours worked per week.
- ``Country`` (category): Country of origin.
- ``Target`` (category): Binary target variable indicating whether the individual earns more than 50K.

Notes

  • The ‘Education’ column is redundant with ‘Education-Num’ and is dropped for simplicity.

  • The ‘Target’ column is converted to binary (True/False) where ‘>50K’ is True and ‘<=50K’ is False.

  • When display is False, categorical columns are encoded as integers; when display is True, they keep their original labels.
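
The two conversions described in these notes can be illustrated on a toy frame. This is a minimal sketch, not the loader's actual code: the three rows are made up, and pandas category codes are an assumed stand-in for whatever integer encoding the loader applies.

```python
import pandas as pd

# Hypothetical three-row excerpt; the values are made up for illustration.
raw = pd.DataFrame(
    {
        "Age": [39.0, 50.0, 38.0],
        "Workclass": pd.Categorical(["State-gov", "Private", "Private"]),
        "Target": ["<=50K", ">50K", "<=50K"],
    }
)

processed = raw.copy()

# Binary target: True for '>50K', False for '<=50K'.
processed["Target"] = processed["Target"] == ">50K"

# Assumed encoding: pandas category codes, which number the sorted labels
# ('Private' -> 0, 'State-gov' -> 1).
processed["Workclass"] = processed["Workclass"].cat.codes

# Split the labels from the features, mirroring the (DataFrame, array) return shape.
target = processed.pop("Target").to_numpy()
```

After this runs, `processed` holds only numeric feature columns and `target` is a boolean numpy array, which matches the shape of the return value described for display=False.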

Examples

To get the processed data and target labels:

import shap

data, target = shap.datasets.adult()

To get the raw data for display:

raw_data, target = shap.datasets.adult(display=True)
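
The processed and display outputs are often used together, for example to fit a model on the encoded frame and label plots with the readable one. That only works if both describe the same rows in the same order. The helper below is a minimal sketch of such a check. `same_rows` is a hypothetical name, not part of shap, and the small frames are made-up stand-ins for the two return values.

```python
import numpy as np
import pandas as pd


def same_rows(X, y, X_display, y_display):
    """Return True if both outputs cover the same rows, columns, and labels."""
    return (
        X.index.equals(X_display.index)
        and list(X.columns) == list(X_display.columns)
        and np.array_equal(y, y_display)
    )


# Made-up stand-ins for the processed and display outputs.
X = pd.DataFrame({"Age": [39.0, 50.0], "Sex": [1, 0]}, index=[7, 3])
X_display = pd.DataFrame(
    {"Age": [39.0, 50.0], "Sex": ["Male", "Female"]}, index=[7, 3]
)
y = np.array([False, True])

aligned = same_rows(X, y, X_display, y.copy())
# Reversing one side breaks the row order, so the check fails.
shuffled = same_rows(X, y, X_display.iloc[::-1], y[::-1])
```

With shap installed, the arguments would be the results of `shap.datasets.adult(n_points=k)` and `shap.datasets.adult(display=True, n_points=k)`. Whether two sampled calls select the same rows is not stated in this reference, which is why the sketch checks alignment instead of assuming it.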