Scrapy Notes - CFANZ Programming Community

Running the spider

1. Create run.py and run this file each time

from scrapy import cmdline
cmdline.execute('scrapy crawl google'.split())

2. From the command line

scrapy crawl google
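
run.py hands cmdline.execute an argv-style list, which is what the .split() call produces from the command string. A minimal sketch of that step follows, using only the standard library. The spider argument `query` is hypothetical and is there only to show where a plain split() stops being enough:

```python
import shlex

# The command from run.py: a plain split() is enough when no argument contains spaces.
simple = 'scrapy crawl google'.split()

# Hypothetical spider argument with a space in its value (not from the original notes).
command = 'scrapy crawl google -a query="play store"'

naive = command.split()      # breaks the quoted value into two pieces
safe = shlex.split(command)  # honours shell-style quoting

print(simple)
print(naive)
print(safe)
```

If the command ever grows quoted arguments, passing shlex.split(command) to cmdline.execute instead of command.split() keeps each argument in one piece.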

yield scrapy.Request() gets no response

1. allowed_domains = ["xxxxx"] is not written correctly

allowed_domains = ['play.google.com']  # correct: a bare domain, no scheme or path
allowed_domains = ['https://play.google.com/store/apps']  # wrong: a full URL is not a domain

2. Add dont_filter=True (the URL being requested may be getting filtered out; dont_filter=True tells Scrapy not to filter this request)

yield scrapy.Request(
    url=item['GURL'],
    callback=self.parse_addr_list,
    meta={"item": item},
    dont_filter=True,
)
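
Both causes above can be illustrated with a simplified model of a domain filter. This is a sketch, not Scrapy's actual implementation: it assumes the filter compares only the request's hostname against the allowed_domains entries, and that dont_filter=True skips the check. The helper names and the example URL are hypothetical.

```python
import re
from urllib.parse import urlparse


def build_host_regex(allowed_domains):
    """Build a pattern that matches an allowed domain or any of its subdomains."""
    if not allowed_domains:
        return re.compile("")  # an empty list allows every host
    escaped = "|".join(re.escape(d) for d in allowed_domains)
    return re.compile(rf"^(.*\.)?({escaped})$")


def is_allowed(url, allowed_domains, dont_filter=False):
    """Return True if the request would pass this simplified filter."""
    if dont_filter:
        return True  # the check is skipped for this request
    host = urlparse(url).hostname or ""
    return bool(build_host_regex(allowed_domains).search(host))


# Hypothetical URL, used only for illustration.
url = "https://play.google.com/store/apps/details?id=example"

print(is_allowed(url, ["play.google.com"]))
print(is_allowed(url, ["https://play.google.com/store/apps"]))
print(is_allowed(url, ["https://play.google.com/store/apps"], dont_filter=True))
```

In this model a full URL in allowed_domains never equals the hostname, so the request is dropped unless dont_filter=True is set. Note that dont_filter=True only bypasses the check for that one request; it does not fix a wrong allowed_domains entry.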