0
点赞
收藏
分享

微信扫一扫

Python爬虫Cookie的使用场景登录后保存Cookie与加载


cookie的使用

网络部分信息或APP的信息,若是想获取数据时,需要提前做一些操作,往往是需要登录,或者提前访问过某些页面才可以获取到!!

其实底层就是在网页里面增加了Cookie信息

代码

from urllib.request import Request,build_opener
from fake_useragent import UserAgent


url ='https://www.kuaidaili.com/usercenter/overview'
headers = {
  'User-Agent':UserAgent().chrome,
  'Cookie':'channelid=0; sid=1621786217815170; _ga=GA1.2.301996636.1621786363; _gid=GA1.2.699625050.1621786363; Hm_lvt_7ed65b1cc4b810e9fd37959c9bb51b31=1621786363,1621823311; _gat=1; Hm_lpvt_7ed65b1cc4b810e9fd37959c9bb51b31=1621823382; sessionid=48cc80a5da3a451c2fa3ce682d29fde7'
}
req = Request(url,headers= headers)
opener = build_opener()
resp = opener.open(req)


print(resp.read().decode())

登录后保持cookie

Python爬虫Cookie的使用场景登录后保存Cookie与加载_爬虫

问题

不想手动复制cookie,太繁琐了!

解决方案

在再代码中执行登录操作,并保持Cookie不丢失

为了保持Cookie不丢失可以urllib.request.HTTPCookieProcessor来扩展opener的功能

代码

from urllib.request import Request,build_opener
from fake_useragent import UserAgent
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor


login_url ='https://www.kuaidaili.com/login/'


args = {
  'username':'398707160@qq.com',
  'passwd':'123456abc'
}
headers = {
  'User-Agent':UserAgent().chrome
}
req = Request(login_url,headers= headers,data = urlencode(args).encode())
# 创建一个可以保存cookie的控制器对象
handler = HTTPCookieProcessor()
# 构造发送请求的对象
opener = build_opener(handler)
#  登录
resp = opener.open(req)


'''
-------------------------上面已经登录好----------------------------------
'''


index_url ='https://www.kuaidaili.com/usercenter/overview'
index_req = Request(index_url,headers =headers)
index_resp = opener.open(index_req)


print(index_resp.read().decode())

cookie的保存与加载

Python爬虫Cookie的使用场景登录后保存Cookie与加载_jar_02

1 原理

Python爬虫Cookie的使用场景登录后保存Cookie与加载_爬虫_03

2 CookieJar

我们可以利用本模块的http.cookiejar.CookieJar类的对象来捕获cookie并在后续连接请求时重新发送,比如可以实现模拟登录功能。该模块主要的对象有CookieJar、FileCookieJar、MozillaCookieJar、LWPCookieJar

from urllib.request import Request,build_opener,HTTPCookieProcessor
from fake_useragent import UserAgent
from urllib.parse import urlencode
from http.cookiejar import MozillaCookieJar

def get_cookie():
  url = 'https://www.kuaidaili.com/login/'
  args = {
    'username':'398707160@qq.com',
    'passwd':'123456abc'
   }
  headers = {'User-Agent':UserAgent().chrome}
  req = Request(url,headers = headers, data = urlencode(args).encode())

  cookie_jar = MozillaCookieJar()
  handler = HTTPCookieProcessor(cookie_jar)
  opener = build_opener(handler)
  resp = opener.open(req)
  # print(resp.read().decode())
  cookie_jar.save('cookie.txt',ignore_discard=True,ignore_expires=True)

def use_cookie():
  url = 'https://www.kuaidaili.com/usercenter/'
  headers = {'User-Agent':UserAgent().chrome}
  req = Request(url,headers = headers)
  
  cookie_jar = MozillaCookieJar()
  cookie_jar.load('cookie.txt',ignore_discard=True,ignore_expires=True)
  handler = HTTPCookieProcessor(cookie_jar)
  opener = build_opener(handler)
  resp = opener.open(req)
  print(resp.read().decode())


if __name__ == '__main__':
  # get_cookie()
  use_cookie()


举报

相关推荐

0 条评论