Python Crawler (2): The requests Module - CFANZ编程社区

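Sending GET and POST requests and getting the response
response = requests.get(url)  # send a GET request and get the response for that URL
response = requests.post(url, data=form_data)  # send a POST request; form_data is a dict holding the request body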
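
To see what a POST with a data dict actually puts on the wire, the request can be built and prepared without sending it. This is only a sketch: the URL is a placeholder and the form values are illustrative, not taken from a real API.

```python
import requests

# Placeholder URL and illustrative form fields (assumptions for this sketch).
form = {"query": "你好", "from": "zh", "to": "en"}
req = requests.Request("POST", "http://example.com/form", data=form)

# prepare() builds the exact request that would be sent, without any network traffic.
prepared = req.prepare()

print(prepared.method)                   # HTTP method
print(prepared.headers["Content-Type"])  # how the body is encoded
print(prepared.body)                     # the dict, form-encoded into the request body
```

The dict passed as data ends up form-encoded in the body, which is why a POST carries its parameters differently from a GET.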

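Attributes of response
response.text: the body decoded to str. requests guesses the encoding, so the text often comes out garbled; when it does, set response.encoding = "utf-8" before reading response.text
response.content: the body of the page as raw bytes
response.content.decode(): decodes those bytes into a str (UTF-8 by default)
response.request.url  # URL of the request that was sent
response.url  # URL of the response
response.request.headers  # request headers
response.headers  # response headers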

import requests
url = "http://baidu.com"
response = requests.get(url)
response.encoding = "utf-8"  # set the encoding before reading response.text
print(response.text)  # the page's HTML as a str

# response.content holds only the raw bytes of the page, so it has to be decoded
print(response.content.decode())

import requests
url = "http://fanyi.baidu.com/basetrans"
query_string = {"query": "你好",
                "from": "zh",
                "to": "en"}
response = requests.post(url, data=query_string)  # keep the response so it can be printed
print(response.content.decode())

The right way to get a page's source (one of the three approaches below will almost always work)
1. response.content.decode()
2. response.content.decode("gbk")
3. response.encoding = "utf-8", then response.text
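
The first two approaches can be combined into one small helper that tries each encoding in turn. This is a minimal sketch: decode_body is a hypothetical name, not part of requests, and the list of encodings is an assumption that suits Chinese pages.

```python
def decode_body(raw: bytes, encodings=("utf-8", "gbk")) -> str:
    """Decode raw bytes with the first encoding that works."""
    for enc in encodings:
        try:
            return raw.decode(enc)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; try the next one.
            continue
    # Nothing matched: decode anyway and mark the undecodable bytes.
    return raw.decode(encodings[0], errors="replace")


# Usage with bytes standing in for response.content:
print(decode_body("你好".encode("gbk")))
```

UTF-8 is tried first on purpose: GBK accepts many byte sequences that are really UTF-8 and would silently produce garbled text, whereas UTF-8 usually rejects GBK bytes with an error.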

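Sending a request with headers
Headers make the request look like it comes from a browser, so the server returns the same content a browser would get.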

import requests
url = "http://fanyi.baidu.com/basetrans"
query_string = {"query": "你好",
                "from": "zh",
                "to": "en"}
headers = {"User-Agent": "...",
           "Referer": "..."}
response = requests.post(url, data=query_string, headers=headers)
# or: response = requests.get(url, headers=headers)
print(response.content.decode())
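
Why the User-Agent header matters can be seen by comparing what requests sends by default with what is sent once the header is overridden. The sketch below prepares both requests through a Session without touching the network; the URL and the browser string are placeholders, not real values.

```python
import requests
from requests.utils import default_user_agent

session = requests.Session()

# Placeholder values (assumptions for this sketch).
url = "http://example.com"
browser_headers = {"User-Agent": "Mozilla/5.0 (placeholder browser string)"}

# prepare_request() merges the session's default headers with the request's own.
plain = session.prepare_request(requests.Request("GET", url))
custom = session.prepare_request(requests.Request("GET", url, headers=browser_headers))

print(plain.headers["User-Agent"])   # the library's default, which identifies requests
print(custom.headers["User-Agent"])  # the value supplied in browser_headers
```

A server that checks User-Agent can recognise the default value as a script, which is one reason the response may differ from what a browser receives.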

Using the timeout parameter
requests.get(url, headers=headers, timeout=3)  # raises requests.exceptions.Timeout if the server does not respond within 3 seconds
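
Since a timeout raises an exception, a crawler usually wants to catch it rather than crash. One possible way to wrap the call is sketched below; fetch_text is a hypothetical helper name, and returning None on timeout is just one design choice among several.

```python
import requests


def fetch_text(url, timeout=3):
    """Return the decoded body of url, or None if the request times out."""
    try:
        response = requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        # The server took longer than `timeout` seconds to connect or to send data.
        return None
    return response.content.decode()
```

A call such as fetch_text("http://baidu.com") then returns either the page source or None. Note that timeout limits how long requests waits for the server to connect and to send data; it is not a cap on the total download time.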
