Analyzing the factors behind hotel ratings with Python's sklearn


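A hotel's reviews come down to a handful of factors: facilities, convenience of location, cleanliness, and service quality. I downloaded a dataset from Data Shop (数据超市) and cleaned out the relevant rating columns.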

A screenshot of the cleaned data:

[Image: screenshot of the dataset]

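# Multiple regression: how the individual customer ratings affect comment_recommend
import pandas as pd
import statsmodels.api as sm
from sklearn.linear_model import LinearRegression

filename = "../../data/各项评分.xls"
data = pd.read_excel(filename)
print(data.describe())

# Predictors: location, service, facilities (fac) and cleanliness (health)
x = data[['location', 'service', 'fac', 'health']]
y = data[['comment_recommend']]

# Fit with scikit-learn to get the coefficients and the intercept
regr = LinearRegression()
regr.fit(x, y)
print('Coefficients: ' + str(regr.coef_))
print('Intercept: ' + str(regr.intercept_))

# Fit the same model with statsmodels to get the full summary table
x2 = sm.add_constant(x)  # OLS does not add an intercept column on its own
est = sm.OLS(y, x2).fit()
print(est.summary())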

The multiple regression coefficients and the fit summary are as follows:

[Image: multiple regression and fit results]


Next, use a decision tree to estimate how much each factor contributes to the hotel rating.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_curve

filename = "../data/各项评分.xls"
data = pd.read_excel(filename)

# Hotels with a recommendation rate below 92% are treated as failing (0), the rest as passing (1)
data.loc[data.comment_recommend < 0.92, 'comment_recommend'] = 0
data.loc[data.comment_recommend >= 0.92, 'comment_recommend'] = 1

# All remaining columns are used as features
x = data.drop(columns='comment_recommend')
y = data['comment_recommend']
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)

model = DecisionTreeClassifier(max_depth=4, random_state=1)
model.fit(x_train, y_train)
y_pred = model.predict(x_test)

# Predicted vs. actual labels
a = pd.DataFrame()
a['predicted'] = list(y_pred)
a['actual'] = list(y_test)
print(a)

# Accuracy (accuracy_score expects y_true first, then y_pred)
score = accuracy_score(y_test, y_pred)
print(score)

# Predicted probabilities of "not recommended" and "recommended"
y_pred_proba = model.predict_proba(x_test)
b = pd.DataFrame(y_pred_proba, columns=['not recommended', 'recommended'])
print(b)

# Model evaluation: points on the ROC curve
fpr, tpr, thres = roc_curve(y_test, y_pred_proba[:, 1])
a = pd.DataFrame()
a['threshold'] = list(thres)
a['false alarm rate'] = list(fpr)
a['hit rate'] = list(tpr)
print(a)

# Feature importances
fea = x.columns
importances = model.feature_importances_
# Show them as a table, most important first
importances_df = pd.DataFrame()
importances_df['feature'] = fea
importances_df['importance'] = importances
# sort_values returns a new DataFrame, so the result has to be assigned
importances_df = importances_df.sort_values('importance', ascending=False)
print(importances_df)

[Image: decision tree output]
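
To make the feature-importance table easier to interpret, here is a small sketch on synthetic data where the answer is known in advance. The labeling rule (driven mostly by `service`, slightly by `health`) is a made-up assumption, not something taken from the real dataset. The tree settings mirror the script above.

```python
import numpy as np
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 300

# Synthetic ratings; only the column names come from the article
X = pd.DataFrame(rng.uniform(3.0, 5.0, size=(n, 4)),
                 columns=['location', 'service', 'fac', 'health'])

# Hypothetical rule: the label depends mainly on service, a little on health
score = 0.8 * X['service'] + 0.2 * X['health'] + rng.normal(0, 0.05, n)
y = (score >= score.median()).astype(int)

model = DecisionTreeClassifier(max_depth=4, random_state=1).fit(X, y)

# Build the table and sort it; sort_values returns a new DataFrame,
# so the sorted result is what gets stored in importances_df
importances_df = (
    pd.DataFrame({'feature': X.columns,
                  'importance': model.feature_importances_})
    .sort_values('importance', ascending=False)
    .reset_index(drop=True)
)
print(importances_df)
```

The importances are normalized to sum to 1, so each value can be read as that feature's share of the impurity reduction achieved by the tree. In this synthetic setup `service` should come out on top, which is a quick sanity check that the table is being read correctly.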


The dataset is probably too small, so the results are not very reliable.
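
One way to gauge how much a small sample affects the result is k-fold cross-validation instead of a single 80/20 split. The sketch below is only an illustration on synthetic data: the sample size of 80, the noise level and the labeling rule are assumptions chosen for the demo. It uses scikit-learn's `StratifiedKFold` and `cross_val_score`.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(2)
n = 80  # deliberately small, to mimic a limited dataset

# Synthetic ratings and a made-up, noisy labeling rule
X = pd.DataFrame(rng.uniform(3.0, 5.0, size=(n, 4)),
                 columns=['location', 'service', 'fac', 'health'])
score = 0.8 * X['service'] + 0.2 * X['health'] + rng.normal(0, 0.2, n)
y = (score >= score.median()).astype(int)

# Five stratified folds: every sample is used for testing exactly once
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
model = DecisionTreeClassifier(max_depth=4, random_state=1)
scores = cross_val_score(model, X, y, cv=cv, scoring='accuracy')

print('fold accuracies:', np.round(scores, 3))
print('mean = %.3f, std = %.3f' % (scores.mean(), scores.std()))
```

A large spread between folds suggests that the accuracy from any single train/test split, and the feature importances derived from it, depends heavily on which rows happened to land in the test set.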

For reference, the Data Shop (数据超市) address:

http://www.data-shop.net/

