Table of Contents
Introduction
Advantages of Decision Trees
Gini Decision Tree
Entropy Decision Tree
Differences Between the Entropy and Gini Decision Trees
1. Data Processing
import pandas as pd

data = pd.read_csv('iris.csv')
data.info()  # column types, non-null counts, and memory usage
'''Output:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Unnamed: 0 150 non-null int64
1 Sepal.Length 150 non-null float64
2 Sepal.Width 150 non-null float64
3 Petal.Length 150 non-null float64
4 Petal.Width 150 non-null float64
5 Species 150 non-null object
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
'''
data['Species'].unique()
# Output: array(['setosa', 'versicolor', 'virginica'], dtype=object)
X = data.drop(['Species'], axis=1)  # feature matrix (every column except the label)
y = data['Species']                 # target labels
Preview of the data DataFrame:
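One optional cleanup step, shown here only as a suggestion (it is not part of the original code, and the results below were produced without it): the Unnamed: 0 column is just the row index written out by the CSV export. If the file's rows are ordered by species, as in the standard iris export, a tree could split on that index instead of the measurements, which would not generalize. A minimal sketch using a hypothetical variable name:

# Hypothetical cleanup: drop the exported row index so it is never used as a feature.
X_clean = data.drop(['Unnamed: 0', 'Species'], axis=1)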
2. Modeling
from sklearn.model_selection import train_test_split  # splits the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)  # 30% of the samples form the test set; random_state=0 makes the split reproducible
from sklearn.tree import DecisionTreeClassifier  # decision tree classifier class
cls_gini = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)  # gini criterion, maximum depth 3
cls_entropy = DecisionTreeClassifier(criterion='entropy', random_state=0)  # entropy criterion, no depth limit
cls_gini.fit(X_train, y_train)
cls_entropy.fit(X_train, y_train)
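The two criteria measure node impurity differently: Gini impurity is 1 - sum(p_i^2) and entropy is -sum(p_i * log2(p_i)), where p_i is the proportion of class i in a node. The following small sketch is an addition to the original post that computes both for an example class distribution:

import numpy as np

def gini_impurity(counts):
    # counts: number of samples of each class in a node
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    return 1.0 - np.sum(p ** 2)

def entropy_impurity(counts):
    p = np.asarray(counts, dtype=float) / np.sum(counts)
    p = p[p > 0]                      # skip empty classes to avoid log2(0)
    return -np.sum(p * np.log2(p))

# Example node holding 40 setosa, 5 versicolor, 5 virginica samples
print(gini_impurity([40, 5, 5]))      # about 0.34
print(entropy_impurity([40, 5, 5]))   # about 0.92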
y_pred_gini = cls_gini.predict(X_test)
'''Output:
array(['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
'versicolor', 'setosa', 'versicolor', 'versicolor', 'versicolor',
'virginica', 'versicolor', 'versicolor', 'versicolor',
'versicolor', 'setosa', 'versicolor', 'versicolor', 'setosa',
'setosa', 'virginica', 'versicolor', 'setosa', 'setosa',
'virginica', 'setosa', 'setosa', 'versicolor', 'versicolor',
'setosa', 'virginica', 'versicolor', 'setosa', 'virginica',
'virginica', 'versicolor', 'setosa', 'versicolor', 'versicolor',
'versicolor', 'virginica', 'setosa', 'virginica', 'setosa',
'setosa'], dtype=object)
'''
y_pred_entropy = cls_entropy.predict(X_test)
'''Output:
array(['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
'versicolor', 'setosa', 'versicolor', 'versicolor', 'versicolor',
'virginica', 'versicolor', 'versicolor', 'versicolor',
'versicolor', 'setosa', 'versicolor', 'versicolor', 'setosa',
'setosa', 'virginica', 'versicolor', 'setosa', 'setosa',
'virginica', 'setosa', 'setosa', 'versicolor', 'versicolor',
'setosa', 'virginica', 'versicolor', 'setosa', 'virginica',
'virginica', 'versicolor', 'setosa', 'versicolor', 'versicolor',
'versicolor', 'virginica', 'setosa', 'virginica', 'setosa',
'setosa'], dtype=object)
'''
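As the two output arrays show, the gini and entropy trees make exactly the same predictions on this particular test set. This can be checked programmatically (a small addition, not from the original post):

import numpy as np

# True when the two trees agree on every test sample
print(np.array_equal(y_pred_gini, y_pred_entropy))  # expected: True for this split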
3. Model Accuracy
from sklearn.metrics import accuracy_score
accuracy_score(y_test, y_pred_gini)
# Output: 0.9777777777777777
accuracy_score(y_test, y_pred_entropy)
# Output: 0.9777777777777777
print("gini 的 train score:",cls_gini.score(X_train,y_train))
print("gini 的 test score:",cls_gini.score(X_test,y_test))
'''结果:
gini 的 train score: 1.0
gini 的 test score: 0.9777777777777777
'''
print("entropy 的 train score:",cls_entropy.score(X_train,y_train))
print("entropy 的 test score:",cls_entropy.score(X_test,y_test))
'''结果:
entropy 的 train score: 1.0
entropy 的 test score: 0.9777777777777777
'''
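Both trees fit the training set perfectly (train score 1.0) and reach the same test accuracy, so a single train/test split does not separate the two criteria. A cross-validated comparison gives a more stable picture; the sketch below is an addition, not part of the original post:

from sklearn.model_selection import cross_val_score

# Mean accuracy over 5 folds for each criterion
print(cross_val_score(cls_gini, X, y, cv=5).mean())
print(cross_val_score(cls_entropy, X, y, cv=5).mean())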
4. Decision Tree Plots
import matplotlib.pyplot as plt
from sklearn import tree

plt.figure(figsize=(12, 8))
tree.plot_tree(cls_gini)      # the classifiers were already fitted above, no need to refit
plt.figure(figsize=(12, 8))
tree.plot_tree(cls_entropy)
plt.show()
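plot_tree can also label nodes with feature and class names, which makes the plots much easier to read, and export_text prints the same tree as plain text. This is an optional addition to the original code, assuming the column names shown by data.info() above:

import matplotlib.pyplot as plt
from sklearn import tree

feature_names = list(X_train.columns)      # column names of the feature matrix
class_names = sorted(y_train.unique())     # setosa, versicolor, virginica

plt.figure(figsize=(12, 8))
tree.plot_tree(cls_gini, feature_names=feature_names, class_names=class_names, filled=True)
plt.show()

# Plain-text view of the same tree
print(tree.export_text(cls_gini, feature_names=feature_names))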