Operations Between pandas Series and DataFrame


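pandas data types

Data type    Meaning
Series       A one-dimensional labeled array in pandas; each element has its own index label.
DataFrame    A two-dimensional, tabular data structure in pandas that acts as a container of Series: an ordered set of columns with both a row index and a column index.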
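To make the relationship between the two types concrete, here is a minimal sketch (the sample data and variable names are made up for illustration): selecting a single column from a DataFrame yields a Series that shares the DataFrame's row index.

```python
import pandas as pd

# Hypothetical sample data, for illustration only
s = pd.Series([10, 20, 30], index=['x', 'y', 'z'])
df = pd.DataFrame({'p': [1, 2, 3], 'q': [4, 5, 6]}, index=['x', 'y', 'z'])

col = df['p']                       # a single column of a DataFrame is a Series
print(type(col).__name__)           # Series
print(col.index.equals(df.index))   # True
print(s['y'])                       # 20
```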

Basic Series operations

import pandas as pd

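s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 6, 7, 8])
print('Element-wise addition\n', s1 + s2)
print('Element-wise multiplication\n', s1 * s2)

# Same labels in a different order: values are aligned by index label before adding
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([5, 6, 7, 8], index=['b', 'c', 'd', 'a'])
print('Automatic index alignment\n', s1 + s2)

# 'a' is missing from s2, so the result for 'a' is NaN
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([6, 7, 8], index=['b', 'c', 'd'])
print('Missing index labels give NaN\n', s1 + s2)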
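If the NaN produced by a missing label is not wanted, one option is the `Series.add` method with `fill_value`. This is only a sketch of one possible approach; whether 0 is a sensible default for the missing entry depends on the data.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([6, 7, 8], index=['b', 'c', 'd'])

plain = s1 + s2                      # 'a' has no partner in s2, so the result there is NaN
filled = s1.add(s2, fill_value=0)    # treat the missing 'a' in s2 as 0 before adding

print(plain)
print(filled)
```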

Basic DataFrame operations

import pandas as pd

data = {
    'name': ['zhangsan', 'lisi', 'wangwu'],
    'year': [1990, 1991, 1992]
}
frame = pd.DataFrame(data)
print('Create a DataFrame\n', frame)

frame = pd.DataFrame(data, columns=['year', 'name'])
print('DataFrame with an explicit column order\n', frame)
print('Row index\n', frame.index)
print('Column index\n', frame.columns)
print('Values\n', frame.values)

DataFrame computation, extension, reindexing, dropping, and sorting

import pandas as pd

df = pd.DataFrame([[10, 8, 7], [14, 7, 6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])

# Computation
print('Column sums\n', df.sum())
print('Row sums\n', df.sum(axis=1))
print('Subtraction\n', df - 1)
print('Multiplication\n', df * 2)
print('Division\n', df / 2)

# Add a column
# Alternative: df['col4'] = [7, 8]
col = pd.DataFrame([7, 8], columns=['col4'], index=['a', 'b'])
df = pd.concat([df, col], axis=1)

# Add a row
row1 = pd.DataFrame({'col1': 0, 'col2': 11, 'col3': 12, 'col4': 13}, index=['c'])
df = pd.concat([df, row1])

# Reindex; labels not present in the original index are filled with NaN
df = df.reindex(['a', 'b', 'c', 'd'])

# Dropping data
# Drop a row
df = df.drop('d')
# Drop a column
df = df.drop('col4', axis=1)

# Sorting
# Sort by index, descending
df = df.sort_index(ascending=False)
# Sort by the values in col1; sort_values returns a new DataFrame, so assign the result
df = df.sort_values(by=['col1'])
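Both `reindex` and `sort_values` return new objects rather than modifying the DataFrame in place. A minimal sketch of that behavior, reusing the same sample values as above (the variable names `expanded` and `by_col1_desc` are just illustrative):

```python
import pandas as pd

df = pd.DataFrame([[10, 8, 7], [14, 7, 6]],
                  columns=['col1', 'col2', 'col3'], index=['a', 'b'])

# reindex returns a new object; the label 'c' does not exist, so its row is all NaN
expanded = df.reindex(['a', 'b', 'c'])
print(expanded.loc['c'].isna().all())   # True

# sort_values also returns a new object and leaves df unchanged
by_col1_desc = df.sort_values(by='col1', ascending=False)
print(list(df.index))             # ['a', 'b']
print(list(by_col1_desc.index))   # ['b', 'a']
```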

import pandas as pd

df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])
print(df)
print('Index label of the maximum in each column\n', df.idxmax())
print('Index label of the minimum in each column\n', df.idxmin())
print('Cumulative sum\n', df.cumsum())
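The calls above work down each column by default. As a sketch, the same methods can be applied across each row by passing `axis=1`:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=['col1', 'col2', 'col3'], index=['a', 'b'])

row_max_col = df.idxmax(axis=1)   # for each row, the column label holding the maximum
row_cumsum = df.cumsum(axis=1)    # running total from left to right within each row

print(row_max_col)
print(row_cumsum)
```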

Operations between a DataFrame and a Series

import pandas as pd
import numpy as np

frame = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("bde"), index=['one', 'two', 'three', 'four'])
#        b   d   e
# one    0   1   2
# two    3   4   5
# three  6   7   8
# four   9  10  11

# Select the first row by position
frame.iloc[0]
# b    0
# d    1
# e    2
# Name: one, dtype: int64

# Select the same row by label
frame.loc['one']
# b    0
# d    1
# e    2
# Name: one, dtype: int64

# The Series index (b, d, e) is matched against the DataFrame's columns,
# and the subtraction is broadcast down every row
series = frame.iloc[0]
frame - series
#        b  d  e
# one    0  0  0
# two    3  3  3
# three  6  6  6
# four   9  9  9
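The subtraction above matches the Series against the DataFrame's columns. To match against the row index instead (for example, subtracting one column from every column), one option is the `sub` method with an explicit `axis`. A minimal sketch, with `by_row` and `by_col` as illustrative names:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("bde"),
                     index=['one', 'two', 'three', 'four'])

row = frame.iloc[0]
by_row = frame.sub(row, axis='columns')   # equivalent to frame - row

col = frame['d']
by_col = frame.sub(col, axis='index')     # align on row labels, broadcast across columns

print(by_row)
print(by_col)
```

In `by_col`, every value in column `d` becomes 0, while columns `b` and `e` hold their offsets from `d` (here -1 and 1 in every row).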

