shap.datasets.adult
- shap.datasets.adult(display=False, n_points=None)
Return the Adult census data in a structured format.
- Parameters:
- display : bool, optional
If True, return the raw, human-readable data (categorical values kept as labels) without the target and redundant columns. Defaults to False.
- n_points : int, optional
Number of rows to sample at random. If None (the default), the full dataset is returned.
- Returns:
- If display is True:
Tuple of pandas DataFrame containing the raw data without the ‘Education’, ‘Target’, and ‘fnlwgt’ columns, and a numpy array representing the ‘Target’ column.
- If display is False:
Tuple of pandas DataFrame containing the processed data without the ‘Target’ and ‘fnlwgt’ columns, and a numpy array representing the ‘Target’ column.
- The data includes the following columns:
- ``Age`` (float): Age in years.
- ``Workclass`` (category): Type of employment.
- ``fnlwgt`` (float): Final weight; the number of units in the target population that the record represents.
- ``Education`` (category): Highest level of education achieved.
- ``Education-Num`` (float): Numeric representation of education level.
- ``Marital Status`` (category): Marital status of the individual.
- ``Occupation`` (category): Type of occupation.
- ``Relationship`` (category): Relationship status.
- ``Race`` (category): Ethnicity of the individual.
- ``Sex`` (category): Gender of the individual.
- ``Capital Gain`` (float): Capital gains recorded.
- ``Capital Loss`` (float): Capital losses recorded.
- ``Hours per week`` (float): Number of hours worked per week.
- ``Country`` (category): Country of origin.
- ``Target`` (category): Binary target variable indicating whether the individual earns more than 50K.
Notes
The ‘Education’ column is redundant with ‘Education-Num’ and is dropped for simplicity.
The ‘Target’ column is converted to binary (True/False) where ‘>50K’ is True and ‘<=50K’ is False.
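When ``display`` is False, the categorical columns are encoded as integer codes so that the returned DataFrame is fully numeric; when ``display`` is True, they keep their original labels.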
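The transformations described in these notes can be sketched on a small toy frame. This is an illustration only, not the library's implementation: the rows are made up, and the use of pandas category codes for the numeric encoding is an assumption (the exact encoding used by the loader may differ).

```python
import pandas as pd

# Toy rows standing in for the raw census data (values are illustrative only).
raw = pd.DataFrame(
    {
        "Age": [39.0, 50.0, 38.0],
        "Workclass": pd.Categorical(["State-gov", "Private", "Private"]),
        "Education": pd.Categorical(["Bachelors", "HS-grad", "HS-grad"]),
        "Education-Num": [13.0, 9.0, 9.0],
        "Target": pd.Categorical(["<=50K", ">50K", "<=50K"]),
    }
)

# Drop the column that is redundant with Education-Num.
data = raw.drop(columns=["Education"])

# Binary target: True where income is above 50K, False otherwise.
target = (data["Target"] == ">50K").to_numpy()

# Replace each remaining categorical feature with its integer category code
# (assumed encoding, chosen here for illustration).
features = data.drop(columns=["Target"])
for name in features.select_dtypes(include="category").columns:
    features[name] = features[name].cat.codes

print(features.dtypes)
print(target)
```

After this step every feature column is numeric, which is the form most model-fitting code expects, while the untouched ``raw`` frame keeps the readable labels.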
Examples
To get the processed data and target labels:
import shap
data, target = shap.datasets.adult()
To get the raw data for display:
raw_data, target = shap.datasets.adult(display=True)
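To work with a random subset, ``n_points`` can be passed as well. The sketch below is one possible way to wrap this; ``describe_split`` and ``load_adult_subset`` are hypothetical helper names, not part of shap, and the default of 200 points is arbitrary. Loading needs the shap package and access to the dataset file, so the import is kept inside the function.

```python
import numpy as np
import pandas as pd


def describe_split(X: pd.DataFrame, y: np.ndarray) -> dict:
    """Summarize a (features, target) pair such as the one the loader returns."""
    return {
        "n_rows": len(X),
        "n_features": X.shape[1],
        "positive_rate": float(np.mean(y)),  # share of rows with Target == True
    }


def load_adult_subset(n_points: int = 200) -> dict:
    """Load a random subset of the Adult data and summarize it (hypothetical helper)."""
    import shap  # local import: only needed when the data is actually loaded

    X, y = shap.datasets.adult(n_points=n_points)
    return describe_split(X, y)
```

Calling ``load_adult_subset(200)`` should then report ``n_rows`` equal to the requested number of points, assuming the loader samples as documented above.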