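Pandas data types
Data type | Meaning |
Series | A one-dimensional array in Pandas; each element has its own index label |
DataFrame | A two-dimensional, table-like structure in Pandas that acts as a container of Series; it holds an ordered set of columns and has both a row index and a column index |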
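To make the "container of Series" description concrete, here is a minimal sketch. The sample values and the variable names `s`, `df`, and `col` are illustrative assumptions, not part of the original notes.

```python
import pandas as pd

# A Series: one-dimensional values, each with its own index label.
s = pd.Series([1990, 1991, 1992], index=['a', 'b', 'c'], name='year')

# A DataFrame: a table whose columns share a single row index.
df = pd.DataFrame({'year': s, 'name': ['zhangsan', 'lisi', 'wangwu']})

# Selecting a single column gives back a Series with the same row index.
col = df['year']
print(type(col).__name__)
print(col.index.equals(df.index))
```

Selecting one column with `df['year']` returns a Series, which is why a DataFrame can be read as a set of Series aligned on one row index.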
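Basic Series operations
import pandas as pd
s1 = pd.Series([1, 2, 3, 4])
s2 = pd.Series([5, 6, 7, 8])
# Arithmetic between Series is element-wise
print('Element-wise addition\n', s1 + s2)
print('Element-wise multiplication\n', s1 * s2)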
# Values are matched by index label, not by position
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([5, 6, 7, 8], index=['b', 'c', 'd', 'a'])
print('Automatic index alignment\n', s1 + s2)
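# A label present in only one Series ('a' here) yields NaN in the result
s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([6, 7, 8], index=['b', 'c', 'd'])
print('Missing index labels produce NaN\n', s1 + s2)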
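If the NaN produced by a missing label is not wanted, one option is `Series.add` with its `fill_value` parameter. The sketch below reuses the same two Series; treating a missing value as 0 is an assumption made for illustration and may not suit every dataset.

```python
import pandas as pd

s1 = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([6, 7, 8], index=['b', 'c', 'd'])

# The + operator leaves NaN where a label exists in only one Series.
plain = s1 + s2

# add(..., fill_value=0) substitutes 0 for the missing operand before adding
# (assumption: a missing entry should count as 0).
filled = s1.add(s2, fill_value=0)

print(plain)
print(filled)
```

Here label `'a'` keeps the value from `s1` instead of becoming NaN; the result is still floating point.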
Basic DataFrame operations
import pandas as pd
data = {
    'name': ['zhangsan', 'lisi', 'wangwu'],
    'year': [1990, 1991, 1992]
}
frame = pd.DataFrame(data)
print('Create a DataFrame\n', frame)
# Pass columns= to control the column order
frame = pd.DataFrame(data, columns=['year', 'name'])
print('DataFrame with an explicit column order\n', frame)
print('Row index\n', frame.index)
print('Column index\n', frame.columns)
print('Values\n', frame.values)
DataFrame arithmetic, extension, reindexing, dropping, and sorting
import pandas as pd
df = pd.DataFrame([[10, 8, 7], [14, 7, 6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])
# Arithmetic
print('Column sums\n', df.sum())
print('Row sums\n', df.sum(axis=1))
print('Subtraction\n', df - 1)
print('Multiplication\n', df * 2)
print('Division\n', df / 2)
# Add a column
# Simpler alternative: df['col4'] = [7, 8]
col = pd.DataFrame([7, 8], columns=['col4'], index=['a', 'b'])
df = pd.concat([df, col], axis=1)
# Add a row
row1 = pd.DataFrame({'col1': 0, 'col2': 11, 'col3': 12, 'col4': 13}, index=['c'])
df = pd.concat([df, row1])
# Reindex; labels not present in the original index are filled with NaN
df = df.reindex(['a', 'b', 'c', 'd'])
# Dropping data
# Drop a row
df = df.drop('d')
# Drop a column
df = df.drop('col4', axis=1)
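# Sorting
# Sort by index in descending order
df = df.sort_index(ascending=False)
# Sort by the values in col1; sort_values returns a new DataFrame, so assign the result
df = df.sort_values(by=['col1'])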
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [4, 5, 6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])
print(df)
# Row label of the largest / smallest value in each column
print('Index of maximum\n', df.idxmax())
print('Index of minimum\n', df.idxmin())
print('Cumulative sum\n', df.cumsum())
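The calls above work column by column, which is the default `axis=0`. As a small sketch of the row-wise variants, the same methods accept `axis=1`; the variable names `by_column`, `by_row`, and `row_cumsum` are illustrative.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6]],
                  columns=['col1', 'col2', 'col3'], index=['a', 'b'])

# Default (axis=0): one result per column, reporting a row label.
by_column = df.idxmax()

# axis=1: one result per row, reporting a column label.
by_row = df.idxmax(axis=1)

# Cumulative sum running left to right within each row.
row_cumsum = df.cumsum(axis=1)

print(by_column)
print(by_row)
print(row_cumsum)
```

With this data every column has its maximum in row `b`, and every row has its maximum in `col3`.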
Operations between DataFrame and Series
import pandas as pd
import numpy as np
frame = pd.DataFrame(np.arange(12).reshape(4,3), columns=list("bde"), index=['one', 'two', 'three', 'four'])
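#        b   d   e
# one    0   1   2
# two    3   4   5
# three  6   7   8
# four   9  10  11
# Select the first row by position
frame.iloc[0]
# b    0
# d    1
# e    2
# Name: one, dtype: int64
# Select the same row by label
frame.loc['one']
# b    0
# d    1
# e    2
# Name: one, dtype: int64
series = frame.iloc[0]
# The Series labels are matched against the column labels,
# and the subtraction is applied to every row
frame - series
#        b  d  e
# one    0  0  0
# two    3  3  3
# three  6  6  6
# four   9  9  9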
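The subtraction above matches the Series against the column labels. To match against the row labels instead, one option is the `DataFrame.sub` method with `axis=0`. The sketch below rebuilds the same frame; using column `d` as the Series is an arbitrary choice for illustration.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("bde"),
                     index=['one', 'two', 'three', 'four'])

# A column is a Series indexed by the row labels.
col_d = frame['d']

# axis=0 aligns the Series with the row index, so each row has its own
# 'd' value subtracted from every entry in that row.
result = frame.sub(col_d, axis=0)
print(result)
```

Each row ends up as -1, 0, 1, because column `d` sits one above `b` and one below `e` in every row.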