[Python Data Analysis (9)] The pandas DataFrame data structure: viewing data, transposing, adding/modifying values, deleting values, alignment, and sorting


1. Viewing data

.head() shows the first rows of the data.

.tail() shows the last rows of the data.

Both return 5 rows by default.

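import numpy as np
import pandas as pd

# 8 rows x 2 columns of random values in [0, 100)
df = pd.DataFrame(np.random.rand(16).reshape(8, 2) * 100,
                  columns=['a', 'b'])
print(df.head(2))  # first 2 rows
print(df.tail())   # last 5 rows (the default)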

--> Output:

           a          b
0  80.800250  97.333282
1  91.433429  81.323805

           a          b
3   3.655392  81.143852
4  70.394713  52.598872
5  62.170747  73.813017
6  40.934632   7.242002
7  75.889400  84.418156
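
Since the random values above change on every run, a small deterministic frame can make the effect of the row-count argument easier to check. The following is only an illustrative sketch; the name demo and its contents are made up here and are not part of the original example.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: 8 rows, 2 columns, values 0..15
demo = pd.DataFrame(np.arange(16).reshape(8, 2), columns=['a', 'b'])

first_two = demo.head(2)    # rows labelled 0 and 1
last_five = demo.tail()     # no argument: the last 5 rows (labels 3 to 7)
last_three = demo.tail(3)   # rows labelled 5, 6 and 7

print(first_two)
print(last_five)
print(last_three)
```

Both methods return a new object and leave the frame they were called on as it was.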

2. Transposing data

print(df.T)

--> Output:

           0          1          2  ...          5          6          7
a  80.800250  91.433429   5.563492  ...  62.170747  40.934632  75.889400
b  97.333282  81.323805  10.411445  ...  73.813017   7.242002  84.418156

[2 rows x 8 columns]
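
To make the swap of row labels and column labels explicit, here is a minimal sketch on a small frame. The name demo is hypothetical and only serves this illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical demo frame: 3 rows, 2 columns
demo = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['a', 'b'])
transposed = demo.T

print(demo.shape)                # (3, 2)
print(transposed.shape)          # (2, 3)
print(list(transposed.index))    # the old column labels: ['a', 'b']
print(list(transposed.columns))  # the old row labels: [0, 1, 2]
```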

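3. Adding and modifying

3.1 Adding a column or row and assigning values

df = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                  columns=['a', 'b', 'c', 'd'])
print(df)

df['e'] = 10     # new column 'e', every row set to 10
df.loc[4] = 20   # new row labelled 4, every column set to 20
print(df)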

--> Output:

           a          b          c          d
0  14.552288  22.852489  50.584815  31.153962
1  91.475232  27.827945  98.790335  74.487188
2  94.963093   5.227859  33.461076  71.792757
3  52.321047  77.474292   0.497665   7.623358

           a          b          c          d   e
0  14.552288  22.852489  50.584815  31.153962  10
1  91.475232  27.827945  98.790335  74.487188  10
2  94.963093   5.227859  33.461076  71.792757  10
3  52.321047  77.474292   0.497665   7.623358  10
4  20.000000  20.000000  20.000000  20.000000  20
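
The example above fills the new column and the new row with a single scalar. As a hedged sketch of the same idea with non-constant values, the snippet below computes a new column from existing ones and adds a row from a list. The frame demo and the column name total are hypothetical.

```python
import pandas as pd

# Hypothetical demo frame
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

demo['total'] = demo['a'] + demo['b']   # new column computed from existing columns
demo.loc[2] = [5, 6, 11]                # new row labelled 2, one value per column

print(demo)
```

The list assigned to a new row needs exactly one value per column, in column order.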

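3.2 Modifying values directly through indexing

df['e'] = 20            # overwrite an existing column
df[['a', 'c']] = 100    # overwrite several columns at once
print(df)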

--> Output:

     a          b    c          d   e
0  100  22.852489  100  31.153962  20
1  100  27.827945  100  74.487188  20
2  100   5.227859  100  71.792757  20
3  100  77.474292  100   7.623358  20
4  100  20.000000  100  20.000000  20
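
Whole columns are overwritten above. If only some cells should change, .loc with a row selector and a column label is one way to do it. This is a sketch on a made-up frame named demo, not code from the original example.

```python
import pandas as pd

# Hypothetical demo frame
demo = pd.DataFrame({'a': [1, 2, 3], 'b': [10, 20, 30]})

demo.loc[0, 'a'] = 100              # a single cell: row label 0, column 'a'
demo.loc[demo['b'] > 15, 'b'] = 0   # column 'b' in every row where b > 15

print(demo)
```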

4. Deleting data

4.1 The del statement: deleting a column

df = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                  columns=['a', 'b', 'c', 'd'])
print(df)

del df['a']   # removes column 'a' from df itself
print(df)

--> Output:

           a          b          c          d
0  30.469916  41.632874   5.182408  21.456072
1  22.080842  96.395829  17.761205  54.596288
2  89.695677  32.556029  36.625757  22.049501
3  43.686114  96.212541   7.441507  80.726133

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133
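
Note that del changes the DataFrame in place and gives nothing back. If the removed column is still needed, DataFrame.pop is a closely related option: it also removes the column in place but returns it as a Series. The sketch below uses a made-up frame named demo.

```python
import pandas as pd

# Hypothetical demo frame
demo = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

del demo['a']             # column 'a' is gone from demo itself
removed = demo.pop('b')   # column 'b' is removed too, and returned as a Series

print(list(demo.columns))
print(removed.tolist())
```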

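4.2 Deleting rows with drop()

The default is inplace=False: drop() returns new data with the rows removed and leaves the original data unchanged.

print(df.drop(0))        # drop the row labelled 0
print(df.drop([1, 2]))   # drop the rows labelled 1 and 2
print(df)                # df itself is unchanged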

--> Output:

           b          c          d
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133

           b          c          d
0  41.632874   5.182408  21.456072
3  96.212541   7.441507  80.726133

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133

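4.3 Deleting columns with drop()

Pass axis=1 to drop columns instead of rows. With the default inplace=False, drop() returns new data and leaves the original data unchanged.

print(df.drop(['d'], axis=1))   # drop column 'd'
print(df)                       # df itself is unchanged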

--> Output:

           b          c
0  41.632874   5.182408
1  96.395829  17.761205
2  32.556029  36.625757
3  96.212541   7.441507

           b          c          d
0  41.632874   5.182408  21.456072
1  96.395829  17.761205  54.596288
2  32.556029  36.625757  22.049501
3  96.212541   7.441507  80.726133
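
The examples in 4.2 and 4.3 only show the default inplace=False. As a sketch of the other case, the snippet below contrasts it with inplace=True and also uses the index= and columns= keywords, which drop() accepts as an alternative to axis. The frame demo is made up for this illustration.

```python
import pandas as pd

# Hypothetical demo frame
demo = pd.DataFrame({'b': [1, 2, 3], 'c': [4, 5, 6], 'd': [7, 8, 9]})

no_d = demo.drop(columns=['d'])   # same effect as demo.drop(['d'], axis=1)
no_row0 = demo.drop(index=[0])    # same effect as demo.drop([0])
print(demo.shape)                 # (3, 3): demo is untouched so far

demo.drop(columns=['d'], inplace=True)   # modifies demo itself and returns None
print(list(demo.columns))
```

Because inplace=True returns None, assigning its result back to a variable would lose the data.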

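5. Data alignment

In operations between DataFrame objects, the data is aligned automatically on both the columns and the index (row labels). A label that exists in only one of the two objects produces NaN in the result.

df1 = pd.DataFrame(np.random.randn(10, 4), columns=['A', 'B', 'C', 'D'])
df2 = pd.DataFrame(np.random.randn(7, 3), columns=['A', 'B', 'C'])
print(df1 + df2)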

--> Output:

          A         B         C   D
0  1.951824  0.119430 -1.982792 NaN
1 -1.063710  0.682838 -0.484747 NaN
2  0.543521  3.587741  0.121565 NaN
3  0.501066  1.992348  0.569522 NaN
4  2.074808 -0.544962  0.403096 NaN
5 -0.565621 -0.232803 -0.830447 NaN
6 -1.384398 -0.675027 -0.314824 NaN
7       NaN       NaN       NaN NaN
8       NaN       NaN       NaN NaN
9       NaN       NaN       NaN NaN
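
If the NaN values produced by unmatched labels are not wanted, one option is the add() method with fill_value, which treats a value missing on one side as the given number before adding. The following is a minimal sketch with made-up frames named left and right.

```python
import pandas as pd

# Hypothetical frames: right has fewer rows and no column 'B'
left = pd.DataFrame({'A': [1, 2, 3], 'B': [10, 20, 30]})
right = pd.DataFrame({'A': [100, 200]})

plain = left + right                     # unmatched labels become NaN
filled = left.add(right, fill_value=0)   # a value missing on one side counts as 0

print(plain)
print(filled)
```

A position that is missing in both frames still comes out as NaN, even with fill_value.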

6. Sorting data

6.1 Sorting by value: .sort_values

This works on Series as well as on DataFrames.

ascending=True sorts in ascending order (the default).

ascending=False sorts in descending order.

6.1.1 Sorting by a single column

df1 = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                   columns=['a', 'b', 'c', 'd'])
print(df1)
print(df1.sort_values(['a'], ascending=True))    # ascending
print(df1.sort_values(['a'], ascending=False))   # descending

--> Output:

           a          b          c          d
0   9.819919  71.007572  22.839585  63.658534
1  10.029993  54.830601  46.236912  19.465751
2   1.837689  98.963422  64.585373  29.611975
3  16.754768  50.427218  14.561929   6.969858

           a          b          c          d
2   1.837689  98.963422  64.585373  29.611975
0   9.819919  71.007572  22.839585  63.658534
1  10.029993  54.830601  46.236912  19.465751
3  16.754768  50.427218  14.561929   6.969858

           a          b          c          d
3  16.754768  50.427218  14.561929   6.969858
1  10.029993  54.830601  46.236912  19.465751
0   9.819919  71.007572  22.839585  63.658534
2   1.837689  98.963422  64.585373  29.611975
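
Section 6.1 mentions that sort_values also works on a Series, but only a DataFrame is shown. Here is a small sketch of the Series case; the Series s and its labels are made up for illustration.

```python
import pandas as pd

# Hypothetical Series with string labels
s = pd.Series([3, 1, 2], index=['x', 'y', 'z'])

ascending = s.sort_values()                   # ascending=True is the default
descending = s.sort_values(ascending=False)

print(ascending)
print(descending)
print(s)   # the original Series keeps its order
```

No column name is needed here because a Series has only one set of values to sort by.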

6.1.2 Sorting by multiple columns

df2 = pd.DataFrame({'a': [1, 1, 1, 1, 2, 2, 2, 2],
                    'b': list(range(8)),
                    'c': list(range(8, 0, -1))})
print(df2)
print(df2.sort_values(['a', 'c']))

--> Output: (rows are sorted by a first, then by c within each group of equal a values)

   a  b  c
0  1  0  8
1  1  1  7
2  1  2  6
3  1  3  5
4  2  4  4
5  2  5  3
6  2  6  2
7  2  7  1

   a  b  c
3  1  3  5
2  1  2  6
1  1  1  7
0  1  0  8
7  2  7  1
6  2  6  2
5  2  5  3
4  2  4  4
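
In the example above both columns are sorted in ascending order. The ascending argument can also be a list with one flag per sort column, so each column gets its own direction. A minimal sketch, using a made-up frame named demo:

```python
import pandas as pd

# Hypothetical demo frame
demo = pd.DataFrame({'a': [1, 1, 2, 2], 'c': [5, 8, 1, 4]})

# a ascending; within equal a values, c descending
result = demo.sort_values(['a', 'c'], ascending=[True, False])
print(result)
```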

6.2 Sorting by index: .sort_index

This works on Series as well as on DataFrames.

ascending=True sorts in ascending order (the default).

ascending=False sorts in descending order.

df1 = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                   index=[5, 4, 3, 2],
                   columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.rand(16).reshape(4, 4) * 100,
                   index=['h', 's', 'x', 'g'],
                   columns=['a', 'b', 'c', 'd'])
print(df1)
print(df1.sort_index())
print(df2)
print(df2.sort_index())

--> Output: (this works for string labels as well as numeric ones)

           a          b          c          d
5  98.079322  81.223109  39.534693  39.763032
4  42.068402  83.658613  14.678341  97.784928
3  10.901214  15.797918  26.516650  37.804133
2   8.326599   2.813564  41.619509  74.280190

           a          b          c          d
2   8.326599   2.813564  41.619509  74.280190
3  10.901214  15.797918  26.516650  37.804133
4  42.068402  83.658613  14.678341  97.784928
5  98.079322  81.223109  39.534693  39.763032

           a          b          c          d
h   8.519638  39.267385  89.480081  25.455433
s  81.948385   2.519190   6.892622  43.315483
x  16.037407  56.810954  20.749150  19.843433
g  96.832274  77.508434  96.155294  33.028485

           a          b          c          d
g  96.832274  77.508434  96.155294  33.028485
h   8.519638  39.267385  89.480081  25.455433
s  81.948385   2.519190   6.892622  43.315483
x  16.037407  56.810954  20.749150  19.843433
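
The calls above all use the default ascending order on the row labels. As a sketch of two variations, the snippet below sorts the row labels in descending order and, with axis=1, sorts the column labels instead. The frame demo is made up for this illustration.

```python
import pandas as pd

# Hypothetical demo frame with unsorted row and column labels
demo = pd.DataFrame({'b': [1, 2, 3], 'a': [4, 5, 6]}, index=[2, 3, 1])

by_row_desc = demo.sort_index(ascending=False)   # row labels in the order 3, 2, 1
by_column = demo.sort_index(axis=1)              # column labels in the order a, b

print(by_row_desc)
print(by_column)
```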

