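At a friend's request, I scraped some information from a public page and am writing it down here. The code is fairly simple, so this is just a memo. The main point of interest is the payload request type (a JSON request body, as opposed to form data).
The User-Agent and URL have been removed because they are not suitable to show. There is really only one tricky part: fetching data from an endpoint that expects a payload. I had not done this before and know little about front-end work. Most projects now separate the front end from the back end, so I rarely get to debug the two sides together. I also don't expect to need this often, so I am recording it in case it comes up again.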
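To make the payload point concrete, here is a minimal offline sketch (standard library only, no request is sent) of how the same dictionary looks as a JSON payload versus as form data. The field values mirror the script in this memo; the helper names `as_json_payload` and `as_form_data` are made up for illustration. In `requests`, passing `json=payload` (or `data=json.dumps(payload)` with a JSON `Content-Type` header) sends the first form, while passing a plain dict via `data=` sends the second.

```python
import json
from urllib.parse import urlencode

# Same fields as the script in this memo.
payload = {"pageNo": 1, "pageSize": 100, "keyword": "工程", "orderByType": 5}


def as_json_payload(fields):
    """Serialize fields as a JSON body (shown as "Request Payload" in browser dev tools)."""
    return json.dumps(fields).encode("utf-8")


def as_form_data(fields):
    """Serialize fields as application/x-www-form-urlencoded (shown as "Form Data")."""
    return urlencode(fields).encode("utf-8")


json_body = as_json_payload(payload)
form_body = as_form_data(payload)

print(json_body)
print(form_body)

# The body length changes with the page number, which is why a
# hard-coded Content-Length header is best left out.
len_page_1 = len(as_json_payload({**payload, "pageNo": 1}))
len_page_1000 = len(as_json_payload({**payload, "pageNo": 1000}))
print(len_page_1, len_page_1000)
```

If the server answers a form-encoded request with an error or an empty result, the body format is a likely culprit and worth checking first.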
import json
import time

import requests


def get_phonenum(num):
    """Fetch result page `num` and append its rows to ./value.txt as tab-separated text."""
    url = ''  # endpoint intentionally omitted
    headers = {
        # User-Agent intentionally omitted.
        # Content-Length is not set by hand: requests computes it from the body.
        'Host': 'holmes.taobao.com',
        'Connection': 'keep-alive',
        'sec-ch-ua': '" Not A;Brand";v="99", "Chromium";v="99", "Google Chrome";v="99"',
        'Accept': 'application/json, text/plain',
        'Content-Type': 'application/json',
        'sec-ch-ua-mobile': '?0',
        'sec-ch-ua-platform': '"Windows"',
        'Origin': 'https://www.dingtalk.com',
        'Sec-Fetch-Site': 'cross-site',
        'Sec-Fetch-Mode': 'cors',
        'Sec-Fetch-Dest': 'empty',
        'Referer': 'https://www.dingtalk.com/',
        'Accept-Encoding': 'gzip, deflate, br',
        'Accept-Language': 'zh-CN,zh;q=0.9'
    }
    payload = {
        "pageNo": int(num),
        "pageSize": 100,
        "keyword": "工程",
        "orderByType": 5
    }
    # The endpoint expects a JSON body (a "Request Payload"), so the dict is
    # serialized with json.dumps instead of being sent as form data.
    response = requests.post(url=url, data=json.dumps(payload), headers=headers, timeout=10)
    data = response.json()['data']['data']
    # Open the file once per page; the context manager closes it afterwards.
    with open('./value.txt', 'a+', encoding='utf-8') as file:
        for row in data:
            # To write a header row, join row.keys() the same way.
            values = '\t'.join(str(value) for value in row.values())
            file.write(values + '\n')


if __name__ == '__main__':
    for i in range(1, 1100):
        get_phonenum(i)
        print('\rFetched page %d' % i, end='')
        time.sleep(1)  # pause between requests to stay polite
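The loop above hard-codes 1,099 pages. Below is a sketch of a variant that stops at the first empty page and writes rows with the `csv` module, so a value that itself contains a tab is quoted instead of silently shifting columns. `crawl` and `fetch_page` are hypothetical names: `fetch_page` stands in for the request made in `get_phonenum`, and here it is replaced by a fake so the example runs offline. I am assuming the real endpoint returns an empty list past the last page, which I have not verified.

```python
import csv
import io


def crawl(fetch_page, out, first_page=1, last_page=1099):
    """Write rows from consecutive pages to `out` as TSV; return the number of pages written."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    pages = 0
    for page_no in range(first_page, last_page + 1):
        rows = fetch_page(page_no)
        if not rows:
            # Assumption: an empty page means there is nothing further to fetch.
            break
        for row in rows:
            writer.writerow(list(row.values()))
        pages += 1
    return pages


def fake_fetch_page(page_no):
    """Offline stand-in for the real request: two pages of made-up rows, then nothing."""
    sample = {
        1: [{"name": "a", "note": "x\ty"}],
        2: [{"name": "b", "note": "plain"}],
    }
    return sample.get(page_no, [])


buffer = io.StringIO()
pages_written = crawl(fake_fetch_page, buffer)
print(pages_written)
print(buffer.getvalue())
```

With a real file, open it with `newline=''` as the `csv` documentation recommends, and keep the one-second pause between requests.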
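When pulling data from an interface, get it through legitimate channels wherever possible. Don't try to use the interface to dig into someone else's database; that only leads to a lot of trouble.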