Video Scraping, Auto-Commenting, and Auto-Liking

I. [Key Concepts]:

   Capturing dynamically loaded data (packet capture)

   Sending requests with requests

   Parsing JSON data
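To make the JSON-parsing point concrete, here is a minimal sketch that decodes a response and pulls out the fields the script uses later. The sample text is hand-written for illustration (the captions, IDs, and example.com URLs are made up), and `parse_feeds` is a hypothetical helper name; only the key path `data -> visionProfilePhotoList -> feeds` comes from the code in this article.

```python
import json

# Hand-written sample shaped like the response the script reads.
# All values are made up for illustration.
SAMPLE_RESPONSE = """
{
  "data": {
    "visionProfilePhotoList": {
      "pcursor": "no_more",
      "feeds": [
        {"author": {"id": "author-1"},
         "photo": {"id": "photo-1", "caption": "First clip",
                   "photoUrl": "https://example.com/1.mp4"}},
        {"author": {"id": "author-1"},
         "photo": {"id": "photo-2", "caption": "Second clip",
                   "photoUrl": "https://example.com/2.mp4"}}
      ]
    }
  }
}
"""


def parse_feeds(json_data):
    """Return (videos, next_cursor) from a decoded response dict."""
    photo_list = json_data["data"]["visionProfilePhotoList"]
    videos = [
        {"caption": feed["photo"]["caption"], "url": feed["photo"]["photoUrl"]}
        for feed in photo_list["feeds"]
    ]
    return videos, photo_list["pcursor"]


json_data = json.loads(SAMPLE_RESPONSE)  # JSON text -> nested dicts and lists
videos, next_cursor = parse_feeds(json_data)
for video in videos:
    print(video["caption"], video["url"])
```

With requests, `response.json()` performs the `json.loads` step, so only the dictionary navigation remains.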

II. [Development Environment]:

   python 3.8               runs the code

   pycharm 2021.2           editor for writing the code

   requests                 pip install requests

III. Scraping Example:

   Collect videos from the Kuaishou short-video site

   Identify the data source

    https://www.kuaishou.com/graphql
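Every request to this endpoint is a POST whose JSON body has three keys: `operationName`, `query`, and `variables`. The sketch below only builds and serializes such a body; it sends nothing. `build_payload` is a hypothetical helper, and the query text is left as a placeholder because it has to be copied from the request captured in the browser.

```python
import json

GRAPHQL_URL = "https://www.kuaishou.com/graphql"


def build_payload(operation_name, query, variables):
    """Assemble the JSON body of a GraphQL POST request."""
    return {
        "operationName": operation_name,
        "query": query,
        "variables": variables,
    }


payload = build_payload(
    operation_name="visionProfilePhotoList",
    # Placeholder: paste the query text from the captured request here.
    query="<query text copied from the captured request>",
    variables={"userId": "3xhv7zhkfr3rqag", "pcursor": "", "page": "profile"},
)

body = json.dumps(payload)  # the text that travels in the request body
print(body)
```

Passing the dictionary as `json=payload` to `requests.post` serializes it the same way.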

IV. Implementation:

   1. Send the request

   2. Get the response data

   3. Parse the data

   4. Save the data
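Step 4 is the one that touches the file system, so it is worth a small sketch of its own. This assumes the video bytes have already been downloaded; `safe_filename` and `save_video` are hypothetical helper names, and the character set stripped here is a slightly extended variant of the pattern used in the complete code (it also removes `?` and `\r`).

```python
import re
from pathlib import Path


def safe_filename(caption, fallback="untitled"):
    """Remove characters that are not allowed in file names on common systems."""
    cleaned = re.sub(r'[\\/:"<>|*?\n\r]', "", caption).strip()
    return cleaned or fallback  # a caption made only of bad characters would be empty


def save_video(video_bytes, caption, folder="video"):
    """Write the bytes to <folder>/<caption>.mp4 and return the path."""
    target_dir = Path(folder)
    target_dir.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    path = target_dir / f"{safe_filename(caption)}.mp4"
    path.write_bytes(video_bytes)
    return path
```

Creating the folder first avoids the `FileNotFoundError` that `open(..., mode='wb')` raises when the `video` directory does not exist yet.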

Crawler:

   A program that imitates a browser sending requests to a server
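In practice, imitating a browser mostly means sending the headers a browser would send; by default, requests identifies itself with a `python-requests/<version>` User-Agent. The sketch below collects those headers in one place. `browser_headers` is a hypothetical helper, the values are the ones used in this article's code, and which of them the server actually checks is an assumption that is not verified here.

```python
def browser_headers(referer, cookie=None):
    """Build request headers that look like they come from a desktop browser."""
    headers = {
        "content-type": "application/json",
        "Host": "www.kuaishou.com",
        "Origin": "https://www.kuaishou.com",
        # Page the request claims to come from (anti-hotlinking check)
        "Referer": referer,
        # Browser identification string
        "User-Agent": (
            "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
            "AppleWebKit/537.36 (KHTML, like Gecko) "
            "Chrome/104.0.0.0 Safari/537.36"
        ),
    }
    if cookie:
        # Login session copied from your own browser; needed for liking.
        headers["Cookie"] = cookie
    return headers


headers = browser_headers("https://www.kuaishou.com/profile/3xsxdmstbwbx4ba")
print(headers["User-Agent"])
```

The resulting dictionary is what gets passed as `headers=` to `requests.post`.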

V. Complete Code

import requests     # third-party module for sending HTTP requests
import re           # only used by the optional save step (file-name cleanup)

headers = {
    'content-type': 'application/json',
    # User info: login session cookie (replace with the one from your own browser)
    'Cookie': 'kpf=PC_WEB; kpn=KUAISHOU_VISION; clientid=3; did=web_d3f9d8c2cbebafd126b80eb0b1c13360; client_key=65890b29; didv=1658130458000; userId=270932146; kuaishou.server.web_st=ChZrdWFpc2hvdS5zZXJ2ZXIud2ViLnN0EqABCj1Pe61TcGTRmOxDP2F7J-5buR1I6zTbr2o8VylTwBIilBXkjnTbXau3z8OK1r-i0YIefozg8oheW-VO5_33SX0PmlNy5A8bmqSsJXZocyw3CusEfPPuVrgD6zZlzHSqW-M7GKTSptfCJ6of43qs700fYxwy-yrx13---JA62jliXOadl2OOT9f_A7W7DdIhT8rMQtFFdodh_frGf3CyBhoSoJCKbxHIWXjzVWap_gGna5KjIiB6FJHOKt3vnbSSWhl2W0DWrtjoA1X_lW9zlGlRaYHPkSgFMAE; kuaishou.server.web_ph=7ee6499c7437971b1182aa3bb1ba1c645b9f',
    # Domain name
    'Host': 'www.kuaishou.com',
    'Origin': 'https://www.kuaishou.com',
    # Anti-hotlinking: the page the request claims to come from
    'Referer': 'https://www.kuaishou.com/profile/3xsxdmstbwbx4ba',
    # Basic browser information
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/104.0.0.0 Safari/537.36',
}
url = 'https://www.kuaishou.com/graphql'

def get_page(pcursor):
    # Request body for one page of the user's video list
    json = {
        'operationName': "visionProfilePhotoList",
        'query': "fragment photoContent on PhotoEntity {\n id\n duration\n caption\n likeCount\n viewCount\n realLikeCount\n coverUrl\n photoUrl\n photoH265Url\n manifest\n manifestH265\n videoResource\n coverUrls {\n url\n __typename\n }\n timestamp\n expTag\n animatedCoverUrl\n distance\n videoRatio\n liked\n stereoType\n profileUserTopPhoto\n __typename\n}\n\nfragment feedContent on Feed {\n type\n author {\n id\n name\n headerUrl\n following\n headerUrls {\n url\n __typename\n }\n __typename\n }\n photo {\n ...photoContent\n __typename\n }\n canAddComment\n llsid\n status\n currentPcursor\n __typename\n}\n\nquery visionProfilePhotoList($pcursor: String, $userId: String, $page: String, $webPageArea: String) {\n visionProfilePhotoList(pcursor: $pcursor, userId: $userId, page: $page, webPageArea: $webPageArea) {\n result\n llsid\n webPageArea\n feeds {\n ...feedContent\n __typename\n }\n hostName\n pcursor\n __typename\n }\n}\n",
        'variables': {'userId': "3xhv7zhkfr3rqag", 'pcursor': pcursor, 'page': "profile"}
    }
    # 1. Send the request
    response = requests.post(url=url, headers=headers, json=json)
    # 2. Get the response data
    json_data = response.json()
    # 3. Parse the data
    feeds = json_data['data']['visionProfilePhotoList']['feeds']
    # Cursor of the next page; 'no_more' marks the last page
    pcursor = json_data['data']['visionProfilePhotoList']['pcursor']
    for feed in feeds:
        caption = feed['photo']['caption']
        photoUrl = feed['photo']['photoUrl']
        print(caption, photoUrl)
        photoAuthorId = feed['author']['id']
        photoId = feed['photo']['id']
        # Request body for liking this video (expTag is hard-coded here)
        json_1 = {
            'operationName': "visionVideoLike",
            'query': "mutation visionVideoLike($photoId: String, $photoAuthorId: String, $cancel: Int, $expTag: String) {\n visionVideoLike(photoId: $photoId, photoAuthorId: $photoAuthorId, cancel: $cancel, expTag: $expTag) {\n result\n __typename\n }\n}\n",
            'variables': {
                'cancel': 0,
                'expTag': "1_i/2005282647926093489_xpcwebprofilexxnull0",
                'photoAuthorId': photoAuthorId,
                'photoId': photoId
            }
        }
        requests.post(url=url, headers=headers, json=json_1)
        # Optional: strip characters that are invalid in file names
        # caption = re.sub('[\\/:"<>|*\\n]', '', caption)
        # # 4. Save the data (the video/ folder must already exist)
        # video_data = requests.get(photoUrl).content
        # with open(f'video/{caption}.mp4', mode='wb') as f:
        #     f.write(video_data)
    # Stop after the last page; otherwise fetch the next one
    if pcursor == 'no_more':
        return 0
    get_page(pcursor)

get_page("")
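`get_page` pages through the list by calling itself once per page. As an alternative, the same cursor logic can be written as a loop, which avoids growing the call stack on long profiles. This is a minimal sketch, not the author's method: `iter_feeds` is a hypothetical name, and `fetch_page` is an assumed callable that takes a cursor and returns the `visionProfilePhotoList` dictionary (in the real script it would wrap the `requests.post` call). A dictionary of fake pages stands in for the network here.

```python
def iter_feeds(fetch_page, start_cursor=""):
    """Yield feeds page by page until the cursor says there is nothing left."""
    cursor = start_cursor
    while True:
        photo_list = fetch_page(cursor)
        yield from photo_list["feeds"]
        cursor = photo_list["pcursor"]
        # 'no_more' marks the last page; an empty cursor is treated the same way
        if not cursor or cursor == "no_more":
            break


# Stand-in for the network call: maps a cursor to a made-up page.
FAKE_PAGES = {
    "": {"feeds": [{"photo": {"caption": "a"}}], "pcursor": "p2"},
    "p2": {"feeds": [{"photo": {"caption": "b"}}], "pcursor": "no_more"},
}

captions = [
    feed["photo"]["caption"]
    for feed in iter_feeds(FAKE_PAGES.__getitem__)
]
print(captions)
```

Because the generator only yields feeds, printing, liking, and saving can stay in the caller's `for` loop.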
