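Problem description
While scraping Weibo hot-search data, the Chinese text in the response came back as hard-to-read \uXXXX escape sequences. An excerpt: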
......[{"title_sub":"\u7814\u7a76\u8bc1\u5b9e\u559d\u5496\u5561\u80fd\u964d\u4f4e\u75db\u98ce\u98ce\u9669","item_log":{"key":"#\u7814\u7a76\u8bc1\u5b9e\u559d\u5496\u5561\u80fd\u964d\u4f4e\u75db\u98ce\u98ce\u9669#"}
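Solution
Import the json module and pass the response body through json.loads when reading the data. json.loads decodes the \uXXXX escapes back into the original Chinese characters.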
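To see the effect without sending a request, here is a minimal offline sketch that parses the fragment quoted above. The trailing `}]` is added here only so the truncated excerpt is valid JSON, and the variable names `raw`, `items`, `title` and `readable` are illustrative.

```python
import json

# Excerpt from the response; the closing "}]" is an assumption added so it parses.
raw = r'[{"title_sub":"\u7814\u7a76\u8bc1\u5b9e\u559d\u5496\u5561\u80fd\u964d\u4f4e\u75db\u98ce\u98ce\u9669","item_log":{"key":"#\u7814\u7a76\u8bc1\u5b9e\u559d\u5496\u5561\u80fd\u964d\u4f4e\u75db\u98ce\u98ce\u9669#"}}]'

# json.loads turns each \uXXXX escape into the character it stands for.
items = json.loads(raw)
title = items[0]["title_sub"]
print(title)  # 研究证实喝咖啡能降低痛风风险

# json.dumps escapes non-ASCII again by default;
# ensure_ascii=False keeps the Chinese text readable when re-serializing.
readable = json.dumps(items, ensure_ascii=False)
print(readable)
```

If the parsed data is written back out with json.dumps, the escapes reappear unless ensure_ascii=False is passed, which is easy to mistake for the parsing step having failed.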
demo_code:
import json

import requests

url = "https://m.weibo.cn/api/container/getIndex"
querystring = {"containerid": "231583", "page_type": "searchall"}

# Request headers as sent by a desktop Chrome browser.
headers = {
    'sec-ch-ua': "\" Not A;Brand\";v=\"99\", \"Chromium\";v=\"100\", \"Google Chrome\";v=\"100\"",
    'x-xsrf-token': "99c11b",
    'sec-ch-ua-mobile': "?0",
    'user-agent': "Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/100.0.4896.60 Safari/537.36",
    'accept': "application/json, text/plain, */*",
    'mweibo-pwa': "1",
    'x-requested-with': "XMLHttpRequest",
    'sec-ch-ua-platform': "\"Windows\"",
    'sec-fetch-site': "same-origin",
    'sec-fetch-mode': "cors",
    'sec-fetch-dest': "empty",
}

response = requests.request("GET", url, headers=headers, params=querystring)

# json.loads decodes the \uXXXX escapes into readable Chinese characters.
# response.json() is an equivalent shortcut provided by requests.
data = json.loads(response.content)
print(data)
Result
That's all~