Why use httpx
The requests module does not support HTTP/2, so httpx is needed when accessing a site that only serves HTTP/2.
# Accessing an HTTP/2-only site with the requests module raises an error
import requests
url = 'https://spa16.scrape.center/'
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/98.0.4758.102 Safari/537.36'
}
resp = requests.get(url=url, headers=headers)
print(resp.text)
"""
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionResetError(10054, '远程主机强迫关闭了一个现有的连接。', None, 10054, None))
"""
# The localized Windows message above means:
# "An existing connection was forcibly closed by the remote host."
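The HTTP protocol version a site uses can be checked in a packet-capture tool (for example, the Network panel of the browser's developer tools).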
Installation
pip install "httpx[http2]"  # the [http2] extra is required, otherwise HTTP/2 is not supported
Features and usage
Most of the httpx API matches that of requests. Some differences and httpx-specific usages are listed below:
Enabling HTTP/2 support
# By default, httpx does not enable HTTP/2 support
# To enable it, create the client like this
import httpx
url = 'https://spa16.scrape.center/'
# Set the http2 parameter to True
# The Client object here plays a role similar to the Session object in requests
with httpx.Client(http2=True) as client:
    resp = client.get(url)
    print(resp.text)
Checking the HTTP protocol version
import httpx
url = 'https://spa16.scrape.center/'
with httpx.Client(http2=True) as client:
    resp = client.get(url)
    # The http_version attribute of the response holds the protocol version that was used
    print(resp.http_version)  # HTTP/2
Async support
import httpx
import asyncio
async def scrape_main(url):
    # Use AsyncClient for asynchronous requests
    async with httpx.AsyncClient(http2=True) as client:
        resp = await client.get(url)
        print(resp.text)
if __name__ == '__main__':
    main_url = 'https://spa16.scrape.center/'
    asyncio.run(scrape_main(main_url))