Decision Tree Modeling with Gini and Entropy

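Contents

Introduction

Advantages of decision trees

Gini Decision Tree

Entropy Decision Tree

Differences between entropy and Gini decision trees

1. Data Processing

2. Modeling

3. Model Accuracy

4. Decision Tree Plots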

Introduction:

Advantages of decision trees:

Gini Decision Tree

Entropy Decision Tree

Differences between entropy and Gini decision trees
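The two criteria named in these headings are commonly defined from the class proportions p_k inside a node: Gini impurity is 1 - sum(p_k^2), and entropy is the sum of -p_k * log2(p_k). Both are 0 for a pure node and largest when the classes are evenly mixed. A minimal sketch in plain Python, assuming a node is represented simply as a list of class labels; the helper names `class_proportions`, `gini`, and `entropy` are illustrative and not part of scikit-learn:

```python
import math
from collections import Counter

def class_proportions(labels):
    """Return the fraction of samples that belongs to each class."""
    total = len(labels)
    return [count / total for count in Counter(labels).values()]

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2)."""
    return 1.0 - sum(p * p for p in class_proportions(labels))

def entropy(labels):
    """Shannon entropy in bits: sum(-p_k * log2(p_k))."""
    return sum(-p * math.log2(p) for p in class_proportions(labels))

# Illustrative nodes: one pure, one split evenly between two classes.
pure = ["setosa"] * 4
mixed = ["setosa", "setosa", "versicolor", "versicolor"]

print(gini(mixed), entropy(mixed))  # 0.5 1.0
```

For the pure node both functions return 0, which is why a split that produces pure children is preferred under either criterion.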

1. Data Processing

import pandas as pd

data=pd.read_csv('iris.csv')

data.info()  # column names, non-null counts, and dtypes
'''Result:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 150 entries, 0 to 149
Data columns (total 6 columns):
 #   Column        Non-Null Count  Dtype  
---  ------        --------------  -----  
 0   Unnamed: 0    150 non-null    int64  
 1   Sepal.Length  150 non-null    float64
 2   Sepal.Width   150 non-null    float64
 3   Petal.Length  150 non-null    float64
 4   Petal.Width   150 non-null    float64
 5   Species       150 non-null    object 
dtypes: float64(4), int64(1), object(1)
memory usage: 7.2+ KB
'''

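data['Species'].unique()
# Result: array(['setosa', 'versicolor', 'virginica'], dtype=object)

# Features are every column except the label.
# Note: this keeps 'Unnamed: 0' (the row index stored in the CSV) as a feature.
# If the rows are ordered by species, that index can reveal the label;
# dropping it as well may change the results recorded below.
X=data.drop(['Species'],axis=1)
y=data['Species']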
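`data.info()` lists an `Unnamed: 0` column, which is typically the row index that was written out when the CSV was saved. Because `X` is built by dropping only `Species`, that index stays in the feature matrix. A minimal sketch of one way to keep it out, assuming a small made-up inline CSV in place of `iris.csv` (the four rows are illustrative, not taken from the real file):

```python
import io
import pandas as pd

# Tiny stand-in for iris.csv with the same column layout (values are illustrative).
csv_text = """,Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
1,5.1,3.5,1.4,0.2,setosa
2,4.9,3.0,1.4,0.2,setosa
3,7.0,3.2,4.7,1.4,versicolor
4,6.3,3.3,6.0,2.5,virginica
"""

# index_col=0 makes pandas use the first, unnamed column as the row index
# instead of loading it as a feature called 'Unnamed: 0'.
data = pd.read_csv(io.StringIO(csv_text), index_col=0)

X = data.drop(['Species'], axis=1)  # the four measurement columns only
y = data['Species']

print(list(X.columns))
# ['Sepal.Length', 'Sepal.Width', 'Petal.Length', 'Petal.Width']
```

Training on the four measurements alone would likely give different scores from the ones recorded in this post, since those were obtained with the index column included.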


2. Modeling

from sklearn.model_selection import train_test_split  # splits the data into training and test sets
# Hold out 30% of the rows as the test set; random_state=0 fixes the shuffle so the split is reproducible
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=0)

from sklearn.tree import DecisionTreeClassifier  # decision tree classifier class
cls_gini=DecisionTreeClassifier(criterion='gini',max_depth=3,random_state=0)  # Gini criterion, maximum depth 3
cls_entropy=DecisionTreeClassifier(criterion='entropy',random_state=0)  # entropy criterion, no depth limit

cls_gini.fit(X_train,y_train)
cls_entropy.fit(X_train,y_train)
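During `fit`, a CART-style tree searches each feature for the threshold whose two child nodes have the lowest weighted impurity. The sketch below illustrates that idea for a single feature under the Gini criterion. It is a simplified illustration on made-up petal lengths, not scikit-learn's actual implementation, and `gini` and `best_threshold` are hypothetical helpers:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def best_threshold(values, labels):
    """Try midpoints between consecutive distinct values.

    Returns (threshold, weighted_gini) for the split with the lowest
    size-weighted Gini impurity of the two children.
    """
    pairs = sorted(zip(values, labels))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal values
        threshold = (lo + hi) / 2
        left = [lab for v, lab in pairs if v <= threshold]
        right = [lab for v, lab in pairs if v > threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if weighted < best[1]:
            best = (threshold, weighted)
    return best

# Made-up petal lengths: three short (setosa) and three long (versicolor).
petal_length = [1.4, 1.5, 1.3, 4.7, 4.5, 5.1]
species = ["setosa"] * 3 + ["versicolor"] * 3

threshold, impurity = best_threshold(petal_length, species)
print(threshold, impurity)  # 3.0 0.0
```

Swapping `gini` for an entropy function changes only the impurity measure; the search over thresholds stays the same, which is why the two criteria often produce similar trees.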

y_pred_gini = cls_gini.predict(X_test)
'''Result:
array(['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
       'versicolor', 'setosa', 'versicolor', 'versicolor', 'versicolor',
       'virginica', 'versicolor', 'versicolor', 'versicolor',
       'versicolor', 'setosa', 'versicolor', 'versicolor', 'setosa',
       'setosa', 'virginica', 'versicolor', 'setosa', 'setosa',
       'virginica', 'setosa', 'setosa', 'versicolor', 'versicolor',
       'setosa', 'virginica', 'versicolor', 'setosa', 'virginica',
       'virginica', 'versicolor', 'setosa', 'versicolor', 'versicolor',
       'versicolor', 'virginica', 'setosa', 'virginica', 'setosa',
       'setosa'], dtype=object)
'''

y_pred_entropy = cls_entropy.predict(X_test)
'''Result:
array(['virginica', 'versicolor', 'setosa', 'virginica', 'setosa',
       'versicolor', 'setosa', 'versicolor', 'versicolor', 'versicolor',
       'virginica', 'versicolor', 'versicolor', 'versicolor',
       'versicolor', 'setosa', 'versicolor', 'versicolor', 'setosa',
       'setosa', 'virginica', 'versicolor', 'setosa', 'setosa',
       'virginica', 'setosa', 'setosa', 'versicolor', 'versicolor',
       'setosa', 'virginica', 'versicolor', 'setosa', 'virginica',
       'virginica', 'versicolor', 'setosa', 'versicolor', 'versicolor',
       'versicolor', 'virginica', 'setosa', 'virginica', 'setosa',
       'setosa'], dtype=object)
'''

3. Model Accuracy

from sklearn.metrics import accuracy_score
accuracy_score(y_test,y_pred_gini)
# Result: 0.9777777777777777

accuracy_score(y_test,y_pred_entropy)
# Result: 0.9777777777777777
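`accuracy_score` returns the fraction of test samples whose predicted label equals the true label. The test set here has 45 samples (30% of 150), so a score of 0.9777777777777777 corresponds to 44 correct predictions and one mistake. A minimal sketch of that calculation in plain Python; the label lists are made up for illustration, and `accuracy` is a hypothetical helper rather than the scikit-learn function:

```python
def accuracy(y_true, y_pred):
    """Fraction of positions where the prediction matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# 45 illustrative labels with exactly one wrong prediction.
y_true = ["setosa"] * 15 + ["versicolor"] * 15 + ["virginica"] * 15
y_pred = list(y_true)
y_pred[20] = "virginica"  # one versicolor sample misclassified

print(accuracy(y_true, y_pred))  # 44 / 45
```

With a test set this small, a single sample moves the score by about 2.2 percentage points, so identical scores for the two models say little about which criterion is better.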

print("gini train score:",cls_gini.score(X_train,y_train))
print("gini test score:",cls_gini.score(X_test,y_test))
'''Result:
gini train score: 1.0
gini test score: 0.9777777777777777
'''

print("entropy train score:",cls_entropy.score(X_train,y_train))
print("entropy test score:",cls_entropy.score(X_test,y_test))
'''Result:
entropy train score: 1.0
entropy test score: 0.9777777777777777
'''

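4. Decision Tree Plots

import matplotlib.pyplot as plt
from sklearn import tree

# Both models were already fitted above, so they are plotted directly instead of being refit.
plt.figure(figsize=(12,8))
tree.plot_tree(cls_gini)

plt.figure(figsize=(12,8))
tree.plot_tree(cls_entropy)
plt.show()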
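A plotted tree can be hard to read without feature and class names. As one alternative, `sklearn.tree.export_text` prints the learned rules as indented text. The sketch below uses scikit-learn's bundled copy of the iris data (`load_iris`) rather than `iris.csv`, so its feature names and the resulting tree may differ from the ones in this post:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: the bundled dataset stands in for iris.csv.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0
)

# Same settings as the Gini model in this post.
clf = DecisionTreeClassifier(criterion='gini', max_depth=3, random_state=0)
clf.fit(X_train, y_train)

# One line per split or leaf, indented by depth.
rules = export_text(clf, feature_names=list(iris.feature_names))
print(rules)
```

The text form is convenient for comparing the Gini and entropy trees side by side, since the split features and thresholds can be diffed directly.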

 
