shap.datasets.imdb
- shap.datasets.imdb(n_points: int | None = None) -> tuple[list[str], ndarray]
Return the classic IMDB sentiment analysis training data as a tuple of review texts and labels.
The dataset is used for binary text classification tasks.
- Parameters:
- n_points : int, optional
Number of data points to sample. If provided, that many points are sampled at random; if omitted (the default, None), no sampling is applied.
- Returns:
- X : list of strings
Text data, where each string is a movie review.
- y : np.ndarray
The target variable. Contains booleans, where True indicates a positive sentiment and False indicates a negative sentiment.
Notes
The full dataset is available at: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
The paper to cite when using this data is: http://www.aclweb.org/anthology/P11-1015
Examples
To get the processed text data and labels:
text_data, labels = shap.datasets.imdb()
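As a rough sketch of how the returned pair might be inspected, the snippet below counts positive and negative reviews. The names load_imdb_sample and label_balance are hypothetical helpers introduced here for illustration and are not part of shap. The demo uses a tiny stand-in list rather than the real data so it runs without shap installed; it assumes only what is documented above, namely that X is a list of strings and y holds booleans with True meaning positive.

```python
from collections import Counter


def load_imdb_sample(n_points=100):
    """Fetch a random subset of the IMDB data through shap.

    Hypothetical wrapper: requires shap to be installed and may need
    network access to fetch the data.
    """
    import shap  # imported lazily so label_balance works without shap

    return shap.datasets.imdb(n_points=n_points)


def label_balance(texts, labels):
    """Count positive and negative reviews (hypothetical helper, not in shap)."""
    if len(texts) != len(labels):
        raise ValueError("texts and labels must have the same length")
    # bool() normalizes NumPy booleans and plain Python booleans alike.
    counts = Counter(bool(value) for value in labels)
    return {"positive": counts[True], "negative": counts[False]}


# Stand-in data shaped like the documented return values (not real IMDB reviews).
demo_texts = ["Great film.", "Dull and far too long.", "Loved every minute."]
demo_labels = [True, False, True]
demo_balance = label_balance(demo_texts, demo_labels)
print(demo_balance)
```

With the real data, the same check would be texts, labels = load_imdb_sample(200) followed by label_balance(texts, labels). A random sample is not guaranteed to be evenly split between the two classes, so a check like this can be worth running before training on a small subset.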