Regular Expressions in Web Scraping
Steps
- Specify the URL
- Send the request
- Get the response data
- Parse the data
- Store the results persistently
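The five steps above can be sketched as three small functions plus a driver. This is only an illustration under stated assumptions: the names `fetch`, `parse`, `save`, and `crawl` are hypothetical, and the sketch uses the standard library's `urllib.request` instead of `requests` so that it has no third-party dependency.

```python
import re
from pathlib import Path
from urllib.request import Request, urlopen


def fetch(url, user_agent="Mozilla/5.0"):
    # Steps 1-3: take the URL, send a GET request, and return the response body as text.
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def parse(html, pattern):
    # Step 4: extract every captured group; re.S lets "." match newlines too.
    return re.findall(pattern, html, re.S)


def save(items, path):
    # Step 5: persist the results, one item per line.
    Path(path).write_text("\n".join(items), encoding="utf-8")


def crawl(url, pattern, path, fetcher=fetch):
    # The fetcher is a parameter so parsing and saving can be tried without network access.
    html = fetcher(url)
    items = parse(html, pattern)
    save(items, path)
    return items
```

Passing a stand-in `fetcher` that returns a fixed HTML string is a convenient way to check the pattern before sending any real requests.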
Example
Requirement: scrape the one-line review quotes of the top 25 movies on Douban Movies.
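import re

import requests

# A browser-like User-Agent, so the request looks like an ordinary browser visit.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/94.0.4606.81 Safari/537.36 Edg/94.0.992.50"
}

# Step 1: specify the URL. The first page of the Top 250 list holds 25 movies.
url = "https://movie.douban.com/top250"

# Steps 2-3: send the request and get the response text (a page is fetched with GET, not POST).
response = requests.get(url=url, headers=headers)
response.raise_for_status()
html = response.text

# Step 4: parse. re.S lets "." match newlines, since each quote block spans several lines.
pattern = r'<p class="quote">.*?<span class="inq">(.*?)</span>.*?</p>'
quotes = re.findall(pattern, html, re.S)

# Step 5: persist the results, one quote per line.
with open("电影评分", "w", encoding="utf-8") as fp:
    fp.write("\n".join(quotes))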
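The pattern in the example depends on two details: the `re.S` flag and the non-greedy `.*?`. The sketch below shows what each one contributes. `SAMPLE_HTML` is a hypothetical fragment shaped like the markup the pattern targets, not actual Douban output, and `GREEDY_PATTERN` is added only for comparison.

```python
import re

# Hypothetical fragment for illustration; the real page is much larger.
SAMPLE_HTML = """
<p class="quote">
    <span class="inq">First sample quote.</span>
</p>
<p class="quote">
    <span class="inq">Second sample quote.</span>
</p>
"""

PATTERN = r'<p class="quote">.*?<span class="inq">(.*?)</span>.*?</p>'
GREEDY_PATTERN = r'<p class="quote">.*<span class="inq">(.*)</span>.*</p>'

# With re.S, "." also matches newlines, so each multi-line block is matched.
with_dotall = re.findall(PATTERN, SAMPLE_HTML, re.S)

# Without re.S, "." stops at the newline after <p class="quote">, so nothing matches.
without_dotall = re.findall(PATTERN, SAMPLE_HTML)

# Greedy ".*" swallows as much as possible and only the last quote is captured.
greedy = re.findall(GREEDY_PATTERN, SAMPLE_HTML, re.S)

print(with_dotall)     # ['First sample quote.', 'Second sample quote.']
print(without_dotall)  # []
print(greedy)          # ['Second sample quote.']
```

For pages with more irregular markup, an HTML parser is usually more robust than a regular expression, but for a fixed, repeated structure like this one a non-greedy pattern is sufficient.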