PyPackage01---Pandas08_Merging the DataFrames in a list

艾米吖, 2022-08-04


Intro

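  The requirement is clear: I have a list in which every element is a DataFrame, and the DataFrames all have the same number of columns. I want to combine these sub-frames into one large DataFrame. The list is the result returned by a multithreaded computation. In R this can be done directly with the do.call function, so how is it done in Python? First, the version information:

  • OS: Win10
  • Python: 3.7.0 (python --version)
  • Pandas: 0.23.4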

Constructing the data

import pandas as pd  
# sample dataframes
d1 = pd.DataFrame({'one' : [1., 2., 3., 4.], 'two' : [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one' : [5., 6., 7., 8.], 'two' : [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one' : [15., 16., 17., 18.], 'three' : [19., 10., 11., 12.]})

# list of dataframes
mydfs = [d1, d2, d3]

mydfs[0]



   one  two
0  1.0  4.0
1  2.0  3.0
2  3.0  2.0
3  4.0  1.0

The concat function

  This function is actually used all the time; I just did not know it could be used this way.

pd.concat(mydfs)

D:\code\anaconda\lib\site-packages\ipykernel_launcher.py:1: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version
of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

"""Entry point for launching an IPython kernel.



    one  three   two
0   1.0    NaN   4.0
1   2.0    NaN   3.0
2   3.0    NaN   2.0
3   4.0    NaN   1.0
0   5.0    NaN   9.0
1   6.0    NaN  10.0
2   7.0    NaN  11.0
3   8.0    NaN  12.0
0  15.0   19.0   NaN
1  16.0   10.0   NaN
2  17.0   11.0   NaN
3  18.0   12.0   NaN

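As you can see, the column names need to be consistent across the frames. Otherwise concat aligns the data by column name and fills the cells a frame does not have with NaN. Note also that each frame keeps its original row index, so the labels 0 to 3 repeat.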
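To make the alignment behaviour concrete, here is a minimal sketch using the same three frames. `sort`, `ignore_index`, and `join` are standard `pd.concat` parameters; the variable names `stacked` and `common` are only illustrative and do not come from the original post.

```python
import pandas as pd

d1 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one': [5., 6., 7., 8.], 'two': [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one': [15., 16., 17., 18.], 'three': [19., 10., 11., 12.]})
mydfs = [d1, d2, d3]

# sort=False keeps the columns in order of first appearance (and avoids the
# FutureWarning on pandas 0.23); ignore_index=True renumbers the rows 0..n-1.
stacked = pd.concat(mydfs, sort=False, ignore_index=True)
print(stacked.columns.tolist())
print(stacked.shape)

# join="inner" keeps only the columns that every frame has in common,
# so no NaN padding is introduced.
common = pd.concat(mydfs, join="inner", ignore_index=True)
print(common.columns.tolist())
```

With these inputs, `stacked` has the columns `one`, `two`, `three` and a fresh 0 to 11 index, while `common` keeps only `one`. Which variant fits depends on whether the missing columns should be padded or dropped.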

The reduce function

from functools import reduce

# fold the list pairwise with an outer merge on the shared columns
reduce(lambda df1, df2: df1.merge(df2, how="outer"), mydfs)



     one   two  three
0    1.0   4.0    NaN
1    2.0   3.0    NaN
2    3.0   2.0    NaN
3    4.0   1.0    NaN
4    5.0   9.0    NaN
5    6.0  10.0    NaN
6    7.0  11.0    NaN
7    8.0  12.0    NaN
8   15.0   NaN   19.0
9   16.0   NaN   10.0
10  17.0   NaN   11.0
11  18.0   NaN   12.0

This reduce function works much like reduce in Scala. It seems that different languages share common ground in how certain features are implemented.
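As a rough illustration of what `reduce` does here, the sketch below spells the fold out as an explicit loop and then shows one way the merge-based approach can differ from `pd.concat`. The frames `a` and `b` are hypothetical examples added for this illustration; they are not part of the original data.

```python
from functools import reduce

import pandas as pd

d1 = pd.DataFrame({'one': [1., 2., 3., 4.], 'two': [4., 3., 2., 1.]})
d2 = pd.DataFrame({'one': [5., 6., 7., 8.], 'two': [9., 10., 11., 12.]})
d3 = pd.DataFrame({'one': [15., 16., 17., 18.], 'three': [19., 10., 11., 12.]})
mydfs = [d1, d2, d3]

# reduce is a left fold: ((d1 merge d2) merge d3)
folded = reduce(lambda left, right: left.merge(right, how="outer"), mydfs)

# The same fold written as a plain loop
acc = mydfs[0]
for df in mydfs[1:]:
    acc = acc.merge(df, how="outer")

print(folded.equals(acc))

# Caveat: an outer merge joins on the shared columns, so a row that appears
# in both frames is matched and kept once, whereas concat keeps both copies.
a = pd.DataFrame({'one': [1., 2.], 'two': [4., 3.]})  # hypothetical frame
b = pd.DataFrame({'one': [1., 5.], 'two': [4., 9.]})  # shares the row (1.0, 4.0)
print(len(a.merge(b, how="outer")))
print(len(pd.concat([a, b])))
```

For simply stacking frames, `pd.concat` is usually the more direct choice; the merge-based fold is a join, so its result depends on which key values overlap between the frames.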

Ref

[1] https://stackoverflow.com/questions/32444138/concatenate-a-list-of-pandas-dataframes-together

                                2020-05-07, Jiulonghu, Jiangning District, Nanjing

