Fixing the spaCy 3.2 error: Can't find model 'en'


(1) Installing spaCy with `pip install spacy` kept failing; switching to `conda install spacy` worked.
(2) Run `python3 -m spacy download en` on the command line to download the English language package (for other languages, download the corresponding package). Note that `en` should now be replaced with its full name, `en_core_web_sm`. Alternatively, you can download the tarball first and then run `pip install en_core_web_md-2.2.5.tar.gz` (just make sure the file is in the right path).
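The steps above can be recapped as shell commands (assuming a conda environment; the tarball filename is the one from the original download):

```shell
# install spaCy via conda (pip install spacy kept failing in this case)
conda install spacy

# download the English model by its full name (the "en" shortcut is obsolete)
python3 -m spacy download en_core_web_sm

# alternatively, install from a previously downloaded tarball
pip install en_core_web_md-2.2.5.tar.gz
```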

Then test with the following code:

import spacy
import nltk

# load spaCy's English-language model
en_nlp = spacy.load('en')
# instantiate nltk's Porter stemmer
stemmer = nltk.stem.PorterStemmer()

# define function to compare lemmatization in spaCy with stemming in nltk
def compare_normalization(doc):
    # tokenize document in spaCy
    doc_spacy = en_nlp(doc)
    # print lemmas found by spaCy
    print("Lemmatization:")
    print([token.lemma_ for token in doc_spacy])
    # print tokens found by Porter stemmer
    print("Stemming:")
    print([stemmer.stem(token.norm_.lower()) for token in doc_spacy])

This raised another error:

OSError: [E941] Can't find model 'en'. 
It looks like you're trying to load a model from a shortcut,
which is obsolete as of spaCy v3.0.
To load the model, use its full name instead:

nlp = spacy.load("en_core_web_sm")

For more details on the available models, see the models directory:
https://spacy.io/models.
If you want to create a blank model, use spacy.blank: nlp = spacy.blank("en")

This means the way the model is loaded above only worked before spaCy 3.0; change it to `nlp = spacy.load("en_core_web_sm")` and everything works. The code then prints the comparison between spaCy's lemmatization and NLTK's stemming:
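The error message also mentions `spacy.blank`: if you only need a tokenizer and no pretrained weights, a blank pipeline works without downloading any model at all. A minimal sketch (the sample sentence is from the output below):

```python
import spacy

# spacy.blank builds a pipeline with only a tokenizer -- no model download needed
nlp = spacy.blank("en")
doc = nlp("Our meeting today was worse than yesterday.")
tokens = [t.text for t in doc]
print(tokens)
```

Note that a blank pipeline has no tagger or lemmatizer, so `token.lemma_` will not give useful results; it is only a fallback when tokenization alone is enough.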

Lemmatization:
['our', 'meeting', 'today', 'be', 'bad', 'than', 'yesterday', ',', 'I', 'be', 'scared', 'of', 'meet', 'the', 'client', 'tomorrow', '.']
Stemming:
['our', 'meet', 'today', 'wa', 'wors', 'than', 'yesterday', ',', 'i', 'am', 'scare', 'of', 'meet', 'the', 'client', 'tomorrow', '.']

