Original text:
This is a dataset for binary sentiment classification containing substantially more data than previous benchmark datasets. We provide a set of 25,000 highly polar movie reviews for training, and 25,000 for testing. There is additional unlabeled data for use as well. Raw text and already processed bag of words formats are provided. See the README file contained in the release for more details.
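In short:
The dataset targets binary sentiment classification and is substantially larger than earlier benchmark datasets. It contains 25,000 highly polarized movie reviews for training and another 25,000 for testing, plus additional unlabeled data. Both the raw text and a preprocessed bag-of-words format are provided. For more details, see the README file included in the release.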
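You can download the dataset from the official website; I have also shared a copy on Baidu Netdisk. To get the download link, follow my WeChat official account and reply "2020092401".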
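Once the archive is downloaded and extracted, the raw-text reviews can be read with a few lines of Python. The sketch below is only an illustration, not part of the dataset release: it assumes the extracted folder is laid out as `<root>/<split>/pos` and `<root>/<split>/neg` with one `.txt` file per review (check the README to confirm), and the function name `load_reviews` is my own.

```python
from pathlib import Path


def load_reviews(root, split):
    """Read one split ("train" or "test") of the raw-text reviews.

    Assumed layout (verify against the README in the release):
        <root>/<split>/pos/*.txt   positive reviews
        <root>/<split>/neg/*.txt   negative reviews

    Returns (texts, labels), where label 1 = positive and 0 = negative.
    """
    texts, labels = [], []
    for folder_name, label in (("neg", 0), ("pos", 1)):
        folder = Path(root) / split / folder_name
        # Sort the file names so the order is reproducible across runs.
        for path in sorted(folder.glob("*.txt")):
            texts.append(path.read_text(encoding="utf-8"))
            labels.append(label)
    return texts, labels
```

For example, if the archive was extracted to a folder named `aclImdb` (an assumed name), `texts, labels = load_reviews("aclImdb", "train")` should return two lists of equal length. According to the description above, each split should contain 25,000 labeled reviews.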