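Task 1: Model training and prediction
Step 1: Import the LightGBM library.
Step 2: Train on the iris dataset with LGBMClassifier (the code in 1.3 uses the native lgb.train API instead).
Step 3: Use the trained model to predict on iris.
1.1 Imports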
import numpy as np
import pandas as pd
import lightgbm as lgb
import json
from sklearn import datasets
# Load the data
iris = datasets.load_iris()  # load the iris dataset
# iris
1.2 Build the dataset
from sklearn.model_selection import train_test_split
# Split the original data into training, test, and validation sets:
# first hold out 20% as the test set, then hold out 20% of the rest for validation
train_data_all,test_data,train_y_all,test_y = \
    train_test_split(iris.data, iris.target,test_size=0.2,random_state=1,shuffle=True,stratify=iris.target)
train_data,val_data,train_y,val_y = \
    train_test_split(train_data_all, train_y_all,test_size=0.2,random_state=1,shuffle=True,stratify=train_y_all)
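As a quick sanity check on the two-stage split, the sketch below repeats it and prints the size and class balance of each part. The variable names (X_rest, X_train, and so on) are illustrative and not taken from the original code. With 150 samples and stratified sampling, the split should come out as 96 training, 24 validation, and 30 test samples.

```python
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split

iris = datasets.load_iris()

# Stage 1: hold out 20% of the data as the test set.
X_rest, X_test, y_rest, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=1, stratify=iris.target
)
# Stage 2: hold out 20% of the remainder as the validation set.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.2, random_state=1, stratify=y_rest
)

sizes = {"train": len(y_train), "val": len(y_val), "test": len(y_test)}
print(sizes)

# Stratification keeps the three classes balanced in every part.
for name, labels in [("train", y_train), ("val", y_val), ("test", y_test)]:
    print(name, np.bincount(labels))
```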
1.3 Train the model
# Build the LightGBM datasets
lgb_train = lgb.Dataset(train_data,label=train_y)
lgb_val = lgb.Dataset(val_data,label=val_y)
params={
'learning_rate':0.1,
'lambda_l1':0.1,
'lambda_l2':0.2,
'max_depth':4,
'objective':'multiclass',
'num_class':3, # required for multiclass; omitting it raises LightGBMError: "Number of classes should be specified and greater than 1 for multiclass training"
}
# Train the model
clf=lgb.train(params,lgb_train,valid_sets=[lgb_val])
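1.4 Model prediction
# Run prediction; for a multiclass objective, predict returns
# per-class probabilities with shape (n_samples, num_class)
from sklearn.metrics import roc_auc_score,accuracy_score
y_pred=clf.predict(test_data)
# To get class labels and accuracy, take the most probable class in each row:
# y_pred=[list(x).index(max(x)) for x in y_pred]
# print(y_pred)
# print(accuracy_score(test_y,y_pred))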
print(y_pred)
[[1.92028726e-04 2.15779127e-03 9.97650180e-01]
[9.96978554e-01 2.88775364e-03 1.33691922e-04]
[3.92278727e-02 9.52641287e-01 8.13084015e-03]
[9.98583095e-01 1.28299744e-03 1.33907086e-04]
[9.74383148e-01 2.48541945e-02 7.62657557e-04]
[9.98583095e-01 1.28299744e-03 1.33907086e-04]
[9.01203243e-04 9.10251290e-04 9.98188545e-01]
[1.92073483e-04 1.92521742e-03 9.97882709e-01]
[1.36080663e-03 3.39086128e-02 9.64730581e-01]
[2.13068261e-03 9.49329566e-01 4.85397510e-02]
[9.73722014e-01 2.55158454e-02 7.62140083e-04]
[1.55923053e-03 9.95022567e-01 3.41820246e-03]
[1.92028726e-04 2.15779127e-03 9.97650180e-01]
[1.80442736e-03 9.93343715e-01 4.85185800e-03]
[6.24614547e-04 6.56567666e-03 9.92809709e-01]
[9.40622432e-01 5.80193844e-02 1.35818314e-03]
[1.92028726e-04 2.15779127e-03 9.97650180e-01]
[1.82435627e-03 9.93326817e-01 4.84882643e-03]
[3.92278727e-02 9.52641287e-01 8.13084015e-03]
[7.48666460e-03 1.45081791e-01 8.47431544e-01]
[1.37496776e-03 9.98119028e-01 5.06004117e-04]
[2.26916234e-03 9.96832784e-01 8.98053599e-04]
[9.98583095e-01 1.28299744e-03 1.33907086e-04]
[9.94373850e-01 5.49280693e-03 1.33342638e-04]
[1.92052404e-04 2.03475217e-03 9.97773195e-01]
[3.38867037e-03 9.91077011e-01 5.53431846e-03]
[9.73722014e-01 2.55158454e-02 7.62140083e-04]
[9.96265410e-01 3.60099334e-03 1.33596291e-04]
[1.59393780e-02 9.81909021e-01 2.15160091e-03]
[1.39970367e-03 9.93759937e-01 4.84035958e-03]]
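Each row of the output above is a probability distribution over the three classes. To turn it into class labels, take the index of the largest value in each row. The sketch below does this with NumPy for the first three rows of the printed output; in the full run, the resulting labels could then be passed to accuracy_score together with test_y.

```python
import numpy as np

# First three rows of the probability matrix printed above.
proba = np.array([
    [1.92028726e-04, 2.15779127e-03, 9.97650180e-01],
    [9.96978554e-01, 2.88775364e-03, 1.33691922e-04],
    [3.92278727e-02, 9.52641287e-01, 8.13084015e-03],
])

# Index of the most probable class in each row.
labels = np.argmax(proba, axis=1)
print(labels)  # [2 0 1]

# Each row sums to (approximately) 1, as expected for class probabilities.
row_sums = proba.sum(axis=1)
print(np.allclose(row_sums, 1.0))  # True
```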
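Task 2: Saving and loading the model
Step 1: Save the model trained in Task 1 with pickle.
Step 2: Save the model trained in Task 1 as JSON.
Step 3: Load the models from Step 1 and Step 2 and run prediction.
2.1 Save the model with pickle
import pickle
# Use a context manager so the file is flushed and closed
with open('model', 'wb') as f:
    pickle.dump(clf, f)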
2.2 Save the model as JSON
import json
# dump_model() returns the model structure as a plain dict
model_json = clf.dump_model()
with open('model.json','w') as f:
    json.dump(model_json,f)
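2.3 Load the model and predict
# Only the pickled model is reloaded here; the JSON dump from 2.2 is meant for inspection
with open('model','rb') as f:
    clf = pickle.load(f)
y_pred=clf.predict(test_data)
y_pred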
2.4 Another approach
# Save
clf.save_model('model.txt')
# Load
clf = lgb.Booster(model_file='model.txt') # note: the path is passed via model_file