shap.datasets.imdb

shap.datasets.imdb(n_points: int | None = None) → tuple[list[str], ndarray]

Return the classic IMDB sentiment analysis training data as a list of review texts and an array of sentiment labels.

Used in binary text classification tasks.

Parameters:

n_points (int, optional): Number of data points to sample. If provided, that many points are sampled at random; if omitted, the full dataset is returned.

Returns:

X (list of strings): Text data, where each string is a movie review.
y (np.ndarray): The target variable. Contains booleans, where True indicates a positive sentiment and False indicates a negative sentiment.

Notes

Examples

To get the processed text data and labels:

text_data, labels = shap.datasets.imdb()
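
To work with a smaller sample and check the label balance, something like the following sketch may help. It assumes shap and NumPy are installed and that the dataset can be fetched; `label_summary` and `load_sample` are hypothetical helper names used only for illustration and are not part of shap.

```python
import numpy as np


def label_summary(texts, labels):
    """Count positive and negative reviews in an (X, y) pair.

    Hypothetical helper, not part of shap. Expects `labels` to hold
    booleans where True means positive sentiment, as described above.
    """
    labels = np.asarray(labels, dtype=bool)
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    n_positive = int(labels.sum())
    return {
        "n_reviews": len(texts),
        "n_positive": n_positive,
        "n_negative": len(labels) - n_positive,
    }


def load_sample(n_points=100):
    """Load a random sample of reviews via shap.datasets.imdb.

    Hypothetical wrapper; shap is imported lazily so that label_summary
    can be used on any (texts, labels) pair without shap installed.
    """
    import shap

    return shap.datasets.imdb(n_points=n_points)
```

A possible usage: `text_data, labels = load_sample(100)` followed by `print(label_summary(text_data, labels))`. Converting the boolean labels to integers with `labels.astype(int)` gives 1 for positive and 0 for negative reviews, which some classifiers expect.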