shap.datasets.imdb
- shap.datasets.imdb(n_points=None)
Return the classic IMDB sentiment analysis training data as a list of review texts and an array of labels.
- Parameters:
- n_points : int, optional
Number of data points to sample. If None, the entire dataset is used.
- Returns:
- A tuple of a list containing the text data and a numpy array containing the labels.
Notes
The full dataset is available at: http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
The paper to cite when using the data is: http://www.aclweb.org/anthology/P11-1015
Examples
To get the processed text data and labels:
text_data, labels = shap.datasets.imdb()
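To sanity-check whatever the call returns, a small helper along the following lines may be useful. This is a minimal sketch, not part of shap: `summarize_labels` is a hypothetical name, the toy data only mimics the documented return shape (a list of texts plus a numpy label array), and treating truthy labels as positive reviews is an assumption. The shap call itself is left commented out because it may need to fetch the data.

```python
import numpy as np


def summarize_labels(text_data, labels):
    """Return (number of samples, fraction of truthy labels).

    Hypothetical helper for illustration; not part of shap.
    """
    labels = np.asarray(labels)
    if len(text_data) != len(labels):
        raise ValueError("text_data and labels must have the same length")
    # Assumption: a truthy label marks a positive review.
    return len(text_data), float(labels.astype(bool).mean())


# Toy stand-in with the same structure as the documented return value.
toy_text = ["great film", "dull plot", "loved it", "not good"]
toy_labels = np.array([True, False, True, False])
n, positive_fraction = summarize_labels(toy_text, toy_labels)
print(n, positive_fraction)

# With shap installed, the same helper could be applied to a sample:
# import shap
# text_data, labels = shap.datasets.imdb(n_points=100)
# print(summarize_labels(text_data, labels))
```

Passing `n_points` keeps the check fast; leaving it as None uses the entire dataset, as described under Parameters.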