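Requirements
1. Count all SaaS stores, break them down by region, and plot the distribution.
2. From the full set of SaaS stores, identify which ones are currently active and how those are distributed by region.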
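Analysis
1. Page through the store list in the operations back office and collect every store's details, mainly the store name, store ID, and store address, then save them.
2. Then chart the data. The simplified code below produces a bar chart of store counts per city.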
import pandas as pd
import matplotlib.pyplot as plt

# Assume `data` holds the records collected by the crawler
data = [  # extract records shaped like these from the crawled data
    {'city': '成都'},
    {'city': '北京'},
    {'city': '成都'},
    {'city': '上海'},
    # ... more records
]
df = pd.DataFrame(data)
# Count stores per city
city_counts = df['city'].value_counts()
# Use a font that includes Chinese glyphs so the city names render (if needed)
plt.rcParams['font.sans-serif'] = ['SimHei']
# Draw the bar chart of stores per city
plt.figure(figsize=(10, 6))
city_counts.plot(kind='bar', color='skyblue')
plt.title('Stores by city')
plt.xlabel('City')
plt.ylabel('Number of stores')
plt.show()
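Step 1 says the crawled store details should be saved. A minimal sketch of one way to do that, assuming a CSV file is good enough as a cache: the field names storeId, storeName, and storeAddress come from the API response used later in these notes, while the sample records and the helper names save_stores and load_stores are hypothetical.

```python
import pandas as pd

# Hypothetical sample records; the real ones come from the store-list crawl.
records = [
    {"storeId": 1001, "storeName": "Store A", "storeAddress": "四川省成都市高新区"},
    {"storeId": 1002, "storeName": "Store B", "storeAddress": "北京市朝阳区"},
]


def save_stores(rows, path):
    """Write store records to CSV; utf-8-sig keeps Chinese text readable in Excel."""
    df = pd.DataFrame(rows, columns=["storeId", "storeName", "storeAddress"])
    df.to_csv(path, index=False, encoding="utf-8-sig")
    return df


def load_stores(path):
    """Read the cached records back so the analysis can run without re-crawling."""
    return pd.read_csv(path, encoding="utf-8-sig")
```

Caching the raw records this way means the slow crawl only has to run once, and the charting code can be rerun against the file.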
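3. With a few thousand records, two problems showed up: (1) the query is very slow and needs optimizing; (2) store addresses are not in a consistent format, which produces a very large set of address values and a messy chart.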
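For the address problem, one possible approach is to reduce each free-text address to a province-level name before counting. The sketch below is an assumption, not what the script in these notes does: the helper name extract_province, the regular expression, and the municipality list are all hypothetical, and real addresses will likely need more rules.

```python
import re

# Assumed rule: a province-level name is the shortest prefix ending in
# "省" (province) or "自治区" (autonomous region).
PROVINCE_RE = re.compile(r"^(.{2,8}?(?:省|自治区))")
# Municipalities have no "省" suffix, so they are listed explicitly.
MUNICIPALITIES = ("北京", "上海", "天津", "重庆")


def extract_province(address):
    """Return a province-level name, or None when the address is unusable."""
    if not address or address == "undefined":
        return None
    address = address.strip()
    for city in MUNICIPALITIES:
        if address.startswith(city):
            return city + "市"
    match = PROVINCE_RE.match(address)
    return match.group(1) if match else None
```

Addresses that return None can be dropped or reviewed by hand, which keeps malformed values out of the chart.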
4. Implementation:
"""
Plot the regional distribution of stores.
"""
import pandas as pd
import matplotlib.pyplot as plt
import requests
from YeePay import Login

# Create a Login instance and sign in to get a token
lg = Login.Login()
auth = lg.sys_login()
sys_auth = "Bearer " + auth
# print(sys_auth)
# Path only: requests needs an absolute URL, so prepend the API host before running
url = '/business/user/list'
headers = {
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'zh-CN,zh;q=0.9',
    'Authorization': sys_auth,
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
    'Content-Type': 'application/json',
    'Origin': 'https://sys-saas.wemew.com',
    'Pragma': 'no-cache',
    'Referer': 'https://sys-saas.wemew.com/',
    'Sec-Fetch-Dest': 'empty',
    'Sec-Fetch-Mode': 'cors',
    'Sec-Fetch-Site': 'same-site',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36',
    'sec-ch-ua': '"Not_A Brand";v="8", "Chromium";v="120", "Google Chrome";v="120"',
    'sec-ch-ua-mobile': '?0',
    'sec-ch-ua-platform': '"Windows"',
}
store_list = []
count = 0
# Only count stores created from 2023 onward
deadline = 1672502400
deadline_time = pd.to_datetime(deadline, unit='s')  # 2022-12-31 16:00:00 UTC, i.e. 2023-01-01 00:00:00 in UTC+8
for i in range(1, 150):
    data = {
        "businessType": "",
        "payStatus": 0,
        "keyword": "",
        "pageNum": i,
        "pageSize": 50,
        "expireStartTime": "",
        "expireEndTime": "",
        "startTime": "",
        "endTime": "",
        "customerStatus": "",
        "expireStatus": "",
        "paymentCycle": "",
        "accountType": "",
        "payOffStartTime": "",
        "payOffEndTime": "",
        "pageNo": i
    }
    response = requests.post(url=url, headers=headers, json=data)
    res = response.json()["data"]["rows"][0]["vos"]
    # print(response.json())
    if not res:
        continue  # empty page, nothing to collect
    for j in res:
        ctime = j["createTime"]
        # createTime may be in milliseconds (13 digits); convert it to seconds
        if len(str(ctime)) == 13:
            ctime = ctime / 1000
        ctime_seconds = pd.to_datetime(int(ctime), unit='s')
        if ctime_seconds <= deadline_time:
            continue
        # Skip test stores ("测试" means "test") and stores with no market user assigned
        if "测试" in j["storeName"] or j["marketUser"] is None:
            continue
        store_dic = {}
        store_dic["storeId"] = j["storeId"]
        store_dic["storeName"] = j["storeName"]
        # Prefer the province field; otherwise take the text before "省" (province)
        # in storeAddress. Irregular addresses are filtered out later.
        if j["province"] is not None and j["province"] != "undefined":
            province = j["province"]
        else:
            # Guard against a missing address so split() does not fail on None
            province = (j["storeAddress"] or "").split("省")[0]
        store_dic["province"] = province
        store_list.append(store_dic)
        count += 1  # count each kept store exactly once
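# print(count)
# print(store_list)
# The full list of collected records
data = store_list
df = pd.DataFrame(data)
# Count stores per province
city_counts = df['province'].value_counts()
# Use a font that includes Chinese glyphs so the province names render (if needed)
plt.rcParams['font.sans-serif'] = ['SimHei']
# Draw the bar chart of stores per province
plt.figure(figsize=(20, 16))
city_counts.plot(kind='bar', color='skyblue')
plt.title('Stores by province')
plt.xlabel('Province')
plt.ylabel('Number of stores')
plt.show()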
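With many distinct province values, the bar chart above gets crowded. One option, sketched here as an assumption rather than a tested fix, is to keep only the largest categories and fold the rest into a single bucket before plotting. The helper name top_n_with_other and the sample counts are hypothetical; in the script the input would be the result of df['province'].value_counts().

```python
import pandas as pd


def top_n_with_other(counts, n=10, other_label="Other"):
    """Keep the n largest categories and sum the remainder into one bucket."""
    counts = counts.sort_values(ascending=False)
    if len(counts) <= n:
        return counts
    top = counts.iloc[:n]
    rest = counts.iloc[n:].sum()
    return pd.concat([top, pd.Series({other_label: rest})])


# Hypothetical counts standing in for df['province'].value_counts().
sample = pd.Series({"四川省": 40, "北京市": 25, "上海市": 20, "广东省": 5, "云南省": 3})
grouped = top_n_with_other(sample, n=3)
```

The grouped series can be plotted the same way as before; a horizontal bar chart (kind='barh') tends to leave more room for long labels.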
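Drawbacks:
1. A bar chart like this becomes hard to read when there is a lot of data; a more suitable chart type is still to be found.
2. Finding the stores that match the filter takes too long. Issuing two requests at the same time, independent of each other, might help, but that needs further study first.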
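On the second drawback, a thread pool from the standard library is one way to issue several page requests at once. This is only a sketch under assumptions: fetch_all_pages and fake_fetch_page are hypothetical names, and the fake function stands in for a real wrapper around requests.post that would return the "vos" list for one page. Whether the back end tolerates parallel requests has not been checked.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_all_pages(fetch_page, pages, max_workers=4):
    """Call fetch_page(page) for every page using a pool of worker threads.

    executor.map yields results in the same order as `pages`,
    so the combined list keeps page order.
    """
    stores = []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        for rows in executor.map(fetch_page, pages):
            stores.extend(rows)
    return stores


# Hypothetical stand-in for the real request, returning two fake stores per page.
def fake_fetch_page(page):
    return [{"storeId": page * 10 + k} for k in range(2)]


result = fetch_all_pages(fake_fetch_page, range(1, 4))
```

Threads suit this case because the work is mostly waiting on the network; a small max_workers value keeps the load on the server modest.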