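Correlation analysis between variables mainly covers:
- Analyzing the patterns within a single variable
  - Autocorrelation analysis
  - Partial correlation analysis
- Analyzing the correlation between any two sequences of equal length
  - Simple correlation analysis
- Simple correlation analysis performed at a given lag (shift) between the sequences
  - Cross-correlation analysis
- Analyzing the correlation between two groups of variables
  - Canonical correlation analysis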
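Every item in the list except canonical correlation can be sketched with NumPy alone. The helper names below (`pearson`, `autocorr`, `crosscorr`, `partial_corr`) are hypothetical illustrations, not functions from any library. `autocorr` applies the Pearson formula to the overlapping parts of a series and its lagged copy (the same convention pandas uses in `Series.autocorr`); estimators such as statsmodels' `acf` normalize slightly differently, so their values will not match exactly. `partial_corr` accepts one or more control variables; applied to a series and its lagged copies, controlling for the intermediate lags, it gives a partial autocorrelation.

```python
import numpy as np


def pearson(x, y):
    """Simple (Pearson) correlation of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / np.sqrt((xc @ xc) * (yc @ yc)))


def autocorr(x, lag):
    """Lag-k autocorrelation: Pearson correlation of x[t] with x[t + lag]."""
    x = np.asarray(x, dtype=float)
    if lag == 0:
        return 1.0
    return pearson(x[:-lag], x[lag:])


def crosscorr(x, y, lag):
    """Cross-correlation at a non-negative lag: Pearson correlation of x[t] with y[t + lag]."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if lag == 0:
        return pearson(x, y)
    return pearson(x[:-lag], y[lag:])


def partial_corr(x, y, z):
    """Partial correlation of x and y controlling for z (one column or several):
    regress x and y on z by least squares, then correlate the residuals."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), np.asarray(z, dtype=float)])
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return pearson(rx, ry)


# Usage example with synthetic data
rng = np.random.default_rng(0)
t = np.arange(200)
wave = np.sin(t / 5.0)
delayed = np.roll(wave, 3)            # the same wave, 3 steps later (with wrap-around)
print(pearson(wave, delayed))         # simple correlation, no lag applied
print(crosscorr(wave, delayed, 3))    # aligning the lag gives 1 (up to rounding)
print(autocorr(wave, 1))              # neighbouring samples of a smooth wave are highly correlated

z = rng.normal(size=1000)
x = z + 0.5 * rng.normal(size=1000)   # x and y are related only through z
y = z + 0.5 * rng.normal(size=1000)
print(pearson(x, y), partial_corr(x, y, z))  # strong raw correlation, partial correlation near 0
```

The residual-based `partial_corr` is one standard way to compute partial correlation; it was chosen here because it reuses `pearson` and works for any number of control variables.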
Plotting correlation charts
1. Correlation matrix plot
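import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn import datasets
# Load the iris data into a DataFrame and attach the species names
iris = datasets.load_iris()
iris_data = pd.DataFrame(iris.data, columns=iris.feature_names)
iris_data['species'] = iris.target_names[iris.target]
# Keep only the four numeric measurements and compute the Pearson correlation matrix
df = iris_data.drop(columns='species')
corr = df.corr()
# Draw the matrix; corrplot is listed below and must be defined before this line runs
corrplot(corr, cmap='Spectral', s=2000)
# Show every row and column when printing
pd.set_option('display.max_columns', None)
pd.set_option('display.max_rows', None)
print('corr: \n', corr)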
The corrplot function
def corrplot(corr, cmap, s):
    """Draw a correlation matrix as a grid of circles: color = coefficient, area = |coefficient|."""
    import matplotlib.pyplot as plt
    import numpy as np
    x, y, z = [], [], []
    N = corr.shape[0]
    for row in range(N):
        for column in range(N):
            x.append(row)
            # Flip the vertical axis so the first variable is drawn at the top
            y.append(N - 1 - column)
            z.append(round(corr.iloc[row, column], 2))
    # Fixed color range [-1, 1]; pass the colormap name directly, since
    # matplotlib.cm.get_cmap is deprecated (and removed in newer Matplotlib versions)
    sc = plt.scatter(x, y, c=z, vmin=-1, vmax=1, s=s * np.absolute(z), cmap=cmap)
    plt.colorbar(sc)
    plt.xlim((-0.5, N - 0.5))
    plt.ylim((-0.5, N - 0.5))
    plt.xticks(range(N), corr.columns, rotation=90)
    plt.yticks(range(N)[::-1], corr.columns)
    plt.grid(False)
    ax = plt.gca()
    ax.xaxis.set_ticks_position('top')
    # Light gray separator lines between the N x N cells
    internal_space = [0.5 + k for k in range(N - 1)]
    for m in internal_space:
        plt.plot([m, m], [-0.5, N - 0.5], c='lightgray')
        plt.plot([-0.5, N - 0.5], [m, m], c='lightgray')
    plt.show()
The iris dataset:
sepal length (cm) sepal width (cm) ... petal width (cm) species
0 5.1 3.5 ... 0.2 setosa
1 4.9 3.0 ... 0.2 setosa
2 4.7 3.2 ... 0.2 setosa
3 4.6 3.1 ... 0.2 setosa
4 5.0 3.6 ... 0.2 setosa
.. ... ... ... ... ...
145 6.7 3.0 ... 2.3 virginica
146 6.3 2.5 ... 1.9 virginica
147 6.5 3.0 ... 2.0 virginica
148 6.2 3.4 ... 2.3 virginica
149 5.9 3.0 ... 1.8 virginica
The computed correlation matrix:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
sepal length (cm) 1.000000 -0.117570 0.871754 0.817941
sepal width (cm) -0.117570 1.000000 -0.428440 -0.366126
petal length (cm) 0.871754 -0.428440 1.000000 0.962865
petal width (cm) 0.817941 -0.366126 0.962865 1.000000
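`DataFrame.corr()` uses the Pearson coefficient by default (`method='spearman'` and `method='kendall'` are also accepted). As a sketch of what each entry means, the same matrix can be reproduced by standardizing every column with the sample standard deviation and averaging the products of the standardized values, r_ij = sum_k z_ki z_kj / (n - 1). The function name `pearson_matrix` and the small `demo` table are hypothetical, made up for illustration.

```python
import numpy as np
import pandas as pd


def pearson_matrix(df):
    """Pearson correlation matrix computed by hand:
    standardize each column (sample standard deviation, ddof=1),
    then r_ij = sum_k z_ki * z_kj / (n - 1)."""
    values = df.to_numpy(dtype=float)
    n = values.shape[0]
    z = (values - values.mean(axis=0)) / values.std(axis=0, ddof=1)
    r = z.T @ z / (n - 1)
    return pd.DataFrame(r, index=df.columns, columns=df.columns)


# Usage example on a small made-up table
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.1, 3.9, 6.2, 8.1, 9.8],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})
print(pearson_matrix(demo).round(3))
print(demo.corr().round(3))                    # pandas returns the same matrix
print(demo.corr(method='spearman').round(3))   # rank-based alternative
```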
2. Correlation dendrogram (hierarchical clustering of variables)
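import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram
# Read the mtcars data; the first column holds the car names
mtcars = pd.read_csv('data/mtcars.csv', index_col=0)
print(mtcars)
# Turn correlations into distances: d = sqrt(1 - r^2) is 0 when |r| = 1 and 1 when r = 0.
# clip(lower=0) avoids NaN when rounding pushes r^2 slightly above 1 (e.g. on the diagonal)
r = mtcars.corr()
d = np.sqrt((1 - r * r).clip(lower=0))
print(d)
# Ward clustering on the rows of d (each variable's distance profile), then draw the tree
row_cluster = linkage(pdist(d, metric='euclidean'), method='ward')
row_dendr = dendrogram(row_cluster, labels=d.index.tolist())
plt.ylabel('Euclidean distance')
# Dashed line marking a cut height of 1.5
plt.axhline(1.5, c='gray', linestyle='--')
plt.tight_layout()
plt.show()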
mtcars.csv
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
"Valiant",18.1,6,225,105,2.76,3.46,20.22,1,0,3,1
"Duster 360",14.3,8,360,245,3.21,3.57,15.84,0,0,3,4
"Merc 240D",24.4,4,146.7,62,3.69,3.19,20,1,0,4,2
"Merc 230",22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2
"Merc 280",19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4
"Merc 280C",17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4
"Merc 450SE",16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3
"Merc 450SL",17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3
"Merc 450SLC",15.2,8,275.8,180,3.07,3.78,18,0,0,3,3
"Cadillac Fleetwood",10.4,8,472,205,2.93,5.25,17.98,0,0,3,4
"Lincoln Continental",10.4,8,460,215,3,5.424,17.82,0,0,3,4
"Chrysler Imperial",14.7,8,440,230,3.23,5.345,17.42,0,0,3,4
"Fiat 128",32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1
"Honda Civic",30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2
"Toyota Corolla",33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1
"Toyota Corona",21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1
"Dodge Challenger",15.5,8,318,150,2.76,3.52,16.87,0,0,3,2
"AMC Javelin",15.2,8,304,150,3.15,3.435,17.3,0,0,3,2
"Camaro Z28",13.3,8,350,245,3.73,3.84,15.41,0,0,3,4
"Pontiac Firebird",19.2,8,400,175,3.08,3.845,17.05,0,0,3,2
"Fiat X1-9",27.3,4,79,66,4.08,1.935,18.9,1,1,4,1
"Porsche 914-2",26,4,120.3,91,4.43,2.14,16.7,0,1,5,2
"Lotus Europa",30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2
"Ford Pantera L",15.8,8,351,264,4.22,3.17,14.5,0,1,5,4
"Ferrari Dino",19.7,6,145,175,3.62,2.77,15.5,0,1,5,6
"Maserati Bora",15,8,301,335,3.54,3.57,14.6,0,1,5,8
"Volvo 142E",21.4,4,121,109,4.11,2.78,18.6,1,1,4,2
Result of reading the mtcars dataset:
mpg cyl disp hp drat ... qsec vs am gear carb
Mazda RX4 21.0 6 160.0 110 3.90 ... 16.46 0 1 4 4
Mazda RX4 Wag 21.0 6 160.0 110 3.90 ... 17.02 0 1 4 4
Datsun 710 22.8 4 108.0 93 3.85 ... 18.61 1 1 4 1
Hornet 4 Drive 21.4 6 258.0 110 3.08 ... 19.44 1 0 3 1
Hornet Sportabout 18.7 8 360.0 175 3.15 ... 17.02 0 0 3 2
Valiant 18.1 6 225.0 105 2.76 ... 20.22 1 0 3 1
Duster 360 14.3 8 360.0 245 3.21 ... 15.84 0 0 3 4
Merc 240D 24.4 4 146.7 62 3.69 ... 20.00 1 0 4 2
Merc 230 22.8 4 140.8 95 3.92 ... 22.90 1 0 4 2
Merc 280 19.2 6 167.6 123 3.92 ... 18.30 1 0 4 4
Merc 280C 17.8 6 167.6 123 3.92 ... 18.90 1 0 4 4
Merc 450SE 16.4 8 275.8 180 3.07 ... 17.40 0 0 3 3
Merc 450SL 17.3 8 275.8 180 3.07 ... 17.60 0 0 3 3
Merc 450SLC 15.2 8 275.8 180 3.07 ... 18.00 0 0 3 3
Cadillac Fleetwood 10.4 8 472.0 205 2.93 ... 17.98 0 0 3 4
Lincoln Continental 10.4 8 460.0 215 3.00 ... 17.82 0 0 3 4
Chrysler Imperial 14.7 8 440.0 230 3.23 ... 17.42 0 0 3 4
Fiat 128 32.4 4 78.7 66 4.08 ... 19.47 1 1 4 1
Honda Civic 30.4 4 75.7 52 4.93 ... 18.52 1 1 4 2
Toyota Corolla 33.9 4 71.1 65 4.22 ... 19.90 1 1 4 1
Toyota Corona 21.5 4 120.1 97 3.70 ... 20.01 1 0 3 1
Dodge Challenger 15.5 8 318.0 150 2.76 ... 16.87 0 0 3 2
AMC Javelin 15.2 8 304.0 150 3.15 ... 17.30 0 0 3 2
Camaro Z28 13.3 8 350.0 245 3.73 ... 15.41 0 0 3 4
Pontiac Firebird 19.2 8 400.0 175 3.08 ... 17.05 0 0 3 2
Fiat X1-9 27.3 4 79.0 66 4.08 ... 18.90 1 1 4 1
Porsche 914-2 26.0 4 120.3 91 4.43 ... 16.70 0 1 5 2
Lotus Europa 30.4 4 95.1 113 3.77 ... 16.90 1 1 5 2
Ford Pantera L 15.8 8 351.0 264 4.22 ... 14.50 0 1 5 4
Ferrari Dino 19.7 6 145.0 175 3.62 ... 15.50 0 1 5 6
Maserati Bora 15.0 8 301.0 335 3.54 ... 14.60 0 1 5 8
Volvo 142E 21.4 4 121.0 109 4.11 ... 18.60 1 1 4 2
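The computed distance matrix d = sqrt(1 - r^2) (0 means |r| = 1; the tiny diagonal value 2.107342e-08 is floating-point rounding of 0):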
mpg cyl disp ... am gear carb
mpg 0.000000 0.523278 5.307133e-01 ... 0.800126 0.877113 0.834555
cyl 0.523278 0.000000 4.316673e-01 ... 0.852574 0.870207 0.849873
disp 0.530713 0.431667 2.107342e-08 ... 0.806505 0.831470 0.918691
hp 0.630526 0.554104 6.118826e-01 ... 0.969975 0.992068 0.661650
drat 0.732124 0.714203 7.039859e-01 ... 0.701458 0.714525 0.995870
wt 0.497159 0.622656 4.598822e-01 ... 0.721422 0.812266 0.903965
qsec 0.908132 0.806494 9.010583e-01 ... 0.973224 0.977121 0.754544
vs 0.747698 0.585307 7.037821e-01 ... 0.985728 0.978547 0.821917
am 0.800126 0.852574 8.065052e-01 ... 0.000000 0.607841 0.998344
gear 0.877113 0.870207 8.314703e-01 ... 0.607841 0.000000 0.961709
carb 0.834555 0.849873 9.186911e-01 ... 0.998344 0.961709 0.000000
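The entries of this matrix can be checked by inverting the transform. Taking the mpg/cyl entry from the table above recovers the strength of the underlying correlation, but not its sign: in mtcars, mpg and cyl are in fact negatively correlated (r ≈ -0.852), yet their distance is small, so strongly positive and strongly negative correlations both place variables close together in the dendrogram.

```latex
d_{ij} = \sqrt{1 - r_{ij}^{2}} \quad\Longleftrightarrow\quad |r_{ij}| = \sqrt{1 - d_{ij}^{2}}

|r_{\mathrm{mpg},\mathrm{cyl}}| = \sqrt{1 - 0.523278^{2}} = \sqrt{1 - 0.273820} \approx 0.8522
```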
[Figure: correlation dendrogram of the mtcars variables]
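The dashed line at height 1.5 suggests cutting the tree there. As a sketch, SciPy's `fcluster` with `criterion='distance'` returns the flat cluster each variable falls into under such a cut, using the same distance transform and Ward linkage as the dendrogram code. The function name `variable_clusters` and the synthetic `demo` data are illustrative assumptions, not part of the original analysis.

```python
import numpy as np
import pandas as pd
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster


def variable_clusters(df, cut=1.5):
    """Cluster the columns of df with the same recipe as the dendrogram code
    (d = sqrt(1 - r^2), Euclidean distances between rows of d, Ward linkage),
    then cut the tree at height `cut` and return one cluster label per column."""
    r = df.corr()
    # clip(lower=0) guards against 1 - r^2 dipping just below 0 through rounding
    d = np.sqrt((1 - r ** 2).clip(lower=0))
    tree = linkage(pdist(d.to_numpy(), metric='euclidean'), method='ward')
    labels = fcluster(tree, t=cut, criterion='distance')
    return pd.Series(labels, index=df.columns, name='cluster')


# Usage example with synthetic data: two pairs of strongly related variables
rng = np.random.default_rng(1)
base1 = rng.normal(size=500)
base2 = rng.normal(size=500)
demo = pd.DataFrame({
    'a': base1,
    'b': base1 + 0.1 * rng.normal(size=500),
    'c': base2,
    'e': -base2 + 0.1 * rng.normal(size=500),  # strongly negatively correlated with c
})
print(variable_clusters(demo, cut=1.5))  # a and b share one label, c and e another
```

With the mtcars frame read in the dendrogram code, `variable_clusters(mtcars)` would list which variables end up in the same group under a cut at 1.5.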