0
点赞
收藏
分享

微信扫一扫

Python数据分析入门(十一):数据合并


Python爬虫、数据分析、网站开发等案例教程视频免费在线观看

https://space.bilibili.com/523606542

​​Python学习交流群:1039649593​​

数据合并(pd.merge)

  • 根据单个或多个键将不同DataFrame的行连接起来
  • 类似数据库的连接操作
  • pd.merge:(left, right, how='inner',on=None,left_on=None, right_on=None )

    left:合并时左边的DataFrame

    right:合并时右边的DataFrame

    how:合并的方式,默认'inner', 'outer', 'left', 'right'

    on:需要合并的列名,必须两边都有的列名,并以 left 和 right 中的列名的交集作为连接键

    left_on: left Dataframe中用作连接键的列

    right_on: right Dataframe中用作连接键的列

  • 内连接 inner:对两张表都有的键的交集进行联合

Python数据分析入门(十一):数据合并_示例代码

  • 全连接 outer:对两者表的都有的键的并集进行联合
    Python数据分析入门(十一):数据合并_d3_02
  • 左连接 left:对所有左表的键进行联合
    Python数据分析入门(十一):数据合并_d3_03
  • 右连接 right:对所有右表的键进行联合
    Python数据分析入门(十一):数据合并_外连接_04
    示例代码:
import pandas as pd
import numpy as np

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left,right,on='key') #指定连接键key

运行结果:

key    A    B    C    D
0 K0 A0 B0 C0 D0
1 K1 A1 B1 C1 D1
2 K2 A2 B2 C2 D2
3 K3 A3 B3 C3 D3

Python数据分析入门(十一):数据合并_示例代码_05


示例代码:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left,right,on=['key1','key2']) #指定多个键,进行合并

运行结果:

key1    key2    A    B    C    D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2

Python数据分析入门(十一):数据合并_示例代码_06

#指定左连接

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, how='left', on=['key1', 'key2'])
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN

Python数据分析入门(十一):数据合并_python_07

#指定右连接

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, how='right', on=['key1', 'key2'])
key1 key2 A B C D
0 K0 K0 A0 B0 C0 D0
1 K1 K0 A2 B2 C1 D1
2 K1 K0 A2 B2 C2 D2
3 K2 K0 NaN NaN C3 D3

Python数据分析入门(十一):数据合并_python_08


默认是“内连接”(inner),即结果中的键是交集

how指定连接方式

“外连接”(outer),结果中的键是并集

示例代码:

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
'key2': ['K0', 'K1', 'K0', 'K1'],
'A': ['A0', 'A1', 'A2', 'A3'],
'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
'key2': ['K0', 'K0', 'K0', 'K0'],
'C': ['C0', 'C1', 'C2', 'C3'],
'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left,right,how='outer',on=['key1','key2'])

运行结果:

key1    key2    A    B    C    D
0 K0 K0 A0 B0 C0 D0
1 K0 K1 A1 B1 NaN NaN
2 K1 K0 A2 B2 C1 D1
3 K1 K0 A2 B2 C2 D2
4 K2 K1 A3 B3 NaN NaN
5 K2 K0 NaN NaN C3 D3

Python数据分析入门(十一):数据合并_d3_09

处理重复列名

参数suffixes:默认为_x, _y

示例代码:

# 处理重复列名
df_obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'data' : np.random.randint(0,10,7)})
df_obj2 = pd.DataFrame({'key': ['a', 'b', 'd'],
'data' : np.random.randint(0,10,3)})

print(pd.merge(df_obj1, df_obj2, on='key', suffixes=('_left', '_right')))

运行结果:

data_left key  data_right
0 9 b 1
1 5 b 1
2 1 b 1
3 2 a 8
4 2 a 8
5 5 a 8

按索引连接

参数left_index=True或right_index=True

示例代码:

# 按索引连接
df_obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
'data1' : np.random.randint(0,10,7)})
df_obj2 = pd.DataFrame({'data2' : np.random.randint(0,10,3)}, index=['a', 'b', 'd'])

print(pd.merge(df_obj1, df_obj2, left_on='key', right_index=True))

运行结果:

data1 key  data2
0 3 b 6
1 4 b 6
6 8 b 6
2 6 a 0
4 3 a 0
5 0 a 0

数据合并(pd.concat)

沿轴方向将多个对象合并到一起

1. NumPy的concat

np.concatenate

示例代码:

import numpy as np
import pandas as pd

arr1 = np.random.randint(0, 10, (3, 4))
arr2 = np.random.randint(0, 10, (3, 4))

print(arr1)
print(arr2)

print(np.concatenate([arr1, arr2]))
print(np.concatenate([arr1, arr2], axis=1))

运行结果:

# print(arr1)
[[3 3 0 8]
[2 0 3 1]
[4 8 8 2]]

# print(arr2)
[[6 8 7 3]
[1 6 8 7]
[1 4 7 1]]

# print(np.concatenate([arr1, arr2]))
[[3 3 0 8]
[2 0 3 1]
[4 8 8 2]
[6 8 7 3]
[1 6 8 7]
[1 4 7 1]]

# print(np.concatenate([arr1, arr2], axis=1))
[[3 3 0 8 6 8 7 3]
[2 0 3 1 1 6 8 7]
[4 8 8 2 1 4 7 1]]

2. pd.concat

  • 注意指定轴方向,默认axis=0
  • join指定合并方式,默认为outer
  • Series合并时查看行索引有无重复
df1 = pd.DataFrame(np.arange(6).reshape(3,2),index=list('abc'),columns=['one','two'])

df2 = pd.DataFrame(np.arange(4).reshape(2,2)+5,index=list('ac'),columns=['three','four'])

pd.concat([df1,df2]) #默认外连接,axis=0
four one three two
a NaN 0.0 NaN 1.0
b NaN 2.0 NaN 3.0
c NaN 4.0 NaN 5.0
a 6.0 NaN 5.0 NaN
c 8.0 NaN 7.0 NaN

pd.concat([df1,df2],axis='columns') #指定axis=1连接
one two three four
a 0 1 5.0 6.0
b 2 3 NaN NaN
c 4 5 7.0 8.0

#同样我们也可以指定连接的方式为inner
pd.concat([df1,df2],axis=1,join='inner')

one two three four
a 0 1 5 6
c 4 5 7 8


举报

相关推荐

0 条评论