Saving the Data

The spider below crawls http://books.toscrape.com/ and extracts each book's name and price:
```python
# -*- coding: utf-8 -*-
import scrapy

from mySpider.items import MyspiderItem


class BooksSpider(scrapy.Spider):
    name = 'books'
    allowed_domains = ['books.toscrape.com']
    start_urls = ['http://books.toscrape.com/']

    def parse(self, response):
        # Each book on the page sits in an <article class="product_pod"> element.
        for sel in response.css('article.product_pod'):
            book = MyspiderItem()
            # The full title is in the title attribute of the <a> inside <h3>.
            book['name'] = sel.xpath('./h3/a/@title').extract_first()
            book['price'] = sel.css('p.price_color::text').extract_first()
            yield book
```
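Once the spider yields items, Scrapy's built-in feed exports can save them without any extra code. For example, run from the project directory (the output filename is arbitrary):

```
scrapy crawl books -o books.csv
```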
Processing the Data

```python
# -*- coding: utf-8 -*-

# Define your item pipelines here
#
# Don't forget to add your pipeline to the ITEM_PIPELINES setting
# See: https://doc.scrapy.org/en/latest/topics/item-pipeline.html


class MyspiderPipeline(object):
    # GBP-to-CNY exchange rate
    exchange_rate = 8.5309

    def process_item(self, item, spider):
        # Take the item's price field (e.g. £53.74), strip the leading
        # pound sign, and convert the rest to float before applying the rate.
        price = float(item['price'][1:]) * self.exchange_rate
        # Keep 2 decimal places and assign the result back to the price field.
        item['price'] = '¥%.2f' % price
        return item
```
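For the pipeline to run, it has to be enabled in settings.py. A minimal sketch, assuming the mySpider project layout implied by the import above:

```python
# settings.py
# The integer (0-1000) controls the order when several pipelines are
# enabled; lower values run first.
ITEM_PIPELINES = {
    'mySpider.pipelines.MyspiderPipeline': 300,
}
```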
Besides process_item, which must be implemented, there are three other commonly used methods you can implement as needed (see the sketch after this list):
● open_spider(self, spider)
Called when the Spider is opened (before any data is processed). Typically used for initialization work that must happen before processing starts, such as opening a database connection.
● close_spider(self, spider)
Called when the Spider is closed (after all data has been processed). Typically used for cleanup work after all data has been handled, such as closing the database connection.
● from_crawler(cls, crawler)
A class method called when the Item Pipeline object is created. It typically reads configuration through crawler.settings and builds the Item Pipeline object according to that configuration.
Later examples demonstrate where each of these methods is useful.
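To make the division of labor concrete, here is a minimal sketch combining all three hooks. It is not from the original text: the class name PriceFilePipeline, the EXCHANGE_RATE setting key, and the books.jl output file are all illustrative assumptions.

```python
# -*- coding: utf-8 -*-
import json


class PriceFilePipeline(object):
    """Illustrative pipeline: converts prices and writes items to a file."""

    def __init__(self, exchange_rate):
        self.exchange_rate = exchange_rate

    @classmethod
    def from_crawler(cls, crawler):
        # Called when the pipeline object is created: read the rate from
        # the project settings (EXCHANGE_RATE is a made-up key), with a
        # fallback default if it is not set.
        return cls(crawler.settings.getfloat('EXCHANGE_RATE', 8.5309))

    def open_spider(self, spider):
        # Called before any item is processed: do initialization here,
        # e.g. open a file or connect to a database.
        self.file = open('books.jl', 'w')

    def close_spider(self, spider):
        # Called after the last item is processed: do cleanup here.
        self.file.close()

    def process_item(self, item, spider):
        # Same conversion as MyspiderPipeline above, then persist the item
        # as one JSON object per line.
        price = float(item['price'][1:]) * self.exchange_rate
        item['price'] = '¥%.2f' % price
        self.file.write(json.dumps(dict(item)) + '\n')
        return item
```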