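This post introduces how to use Pandas for hierarchical indexing, merging data, and reshaping data.
Pt.1 covers hierarchical indexing with Pandas.
Pt.2 covers merging data with Pandas in detail.
Pt.3 covers merging and reshaping data with Pandas.
Pt.4 covers reshaping data with Pandas.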
```python
import pandas as pd
import numpy as np
```
Hierarchical indexing
Hierarchical indexing on a Series
```python
data = pd.Series(np.random.randn(6),
                 index=[['a', 'a', 'b', 'b', 'c', 'c'],
                        [1, 2, 1, 3, 2, 3]])
data
```

```
a  1    0.971509
   2    1.042921
b  1    0.316566
   3    1.730938
c  2   -0.457496
   3    1.183176
dtype: float64
```
The index of `data` is a MultiIndex:

```
MultiIndex([('a', 1),
            ('a', 2),
            ('b', 1),
            ('b', 3),
            ('c', 2),
            ('c', 3)],
           )
```
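To make the structure of a MultiIndex more concrete, here is a minimal sketch that inspects one. It uses fixed values in place of the random numbers above so the result is reproducible; the variable names `n_levels`, `outer_labels`, and `inner_unique` are illustrative only.

```python
import numpy as np
import pandas as pd

# Fixed values instead of random numbers (an assumption made for reproducibility).
data = pd.Series(
    np.arange(6, dtype=float),
    index=[["a", "a", "b", "b", "c", "c"], [1, 2, 1, 3, 2, 3]],
)

idx = data.index
n_levels = idx.nlevels                   # how many index levels there are
outer_labels = idx.get_level_values(0)   # outer label of every row
inner_unique = idx.levels[1]             # unique labels of the inner level

print(n_levels)
print(list(outer_labels))
print(list(inner_unique))
```

`get_level_values` returns one label per row, while `levels` holds each level's unique labels only.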
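Slicing

Selecting the outer label `b` returns the values indexed by the inner level:

```
1    0.316566
3    1.730938
dtype: float64
```

A label slice over the outer level, from `b` through `c`:

```
b  1    0.316566
   3    1.730938
c  2   -0.457496
   3    1.183176
dtype: float64
```

A list of outer labels, `b` then `a`, returns the rows in the requested order:

```
b  1    0.316566
   3    1.730938
a  1    0.971509
   2    1.042921
dtype: float64
```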
Slicing on the inner level

Selecting the inner label `2` across all outer labels:

```
a    1.042921
c   -0.457496
dtype: float64
```
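The selections shown above can be sketched as follows. This is one plausible set of expressions that yields outputs of the same shape, not necessarily the exact calls the original cells used; the values are fixed rather than random so that each result is predictable.

```python
import numpy as np
import pandas as pd

# Fixed values 0.0 .. 5.0 (an assumption; the post uses random numbers).
data = pd.Series(
    np.arange(6, dtype=float),
    index=[["a", "a", "b", "b", "c", "c"], [1, 2, 1, 3, 2, 3]],
)

outer_one = data.loc["b"]           # one outer label -> Series indexed by the inner level
outer_range = data.loc["b":"c"]     # label slice on the outer level, both ends included
outer_list = data.loc[["b", "a"]]   # list of outer labels, returned in the requested order
inner_two = data.loc[:, 2]          # every outer label, inner label equal to 2

print(outer_one)
print(inner_two)
```

Using `.loc` for every case keeps the selection explicitly label-based.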
series.unstack( ), frame.stack( ) : reshaping data, Series → DataFrame and DataFrame → Series
```python
frame = data.unstack()
frame
```

| | 1 | 2 | 3 |
|---|---|---|---|
| a | 0.971509 | 1.042921 | NaN |
| b | 0.316566 | NaN | 1.730938 |
| c | NaN | -0.457496 | 1.183176 |
Stacking the frame restores the original Series:

```
a  1    0.971509
   2    1.042921
b  1    0.316566
   3    1.730938
c  2   -0.457496
   3    1.183176
dtype: float64
```
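As a rough check of the round trip, the sketch below unstacks a Series and stacks it back. It assumes fixed values for reproducibility. Because `stack()` may keep or drop the NaN cells depending on the pandas version, the example drops them explicitly before comparing.

```python
import numpy as np
import pandas as pd

data = pd.Series(
    np.arange(6, dtype=float),
    index=[["a", "a", "b", "b", "c", "c"], [1, 2, 1, 3, 2, 3]],
)

wide = data.unstack()                       # inner level becomes the columns
n_missing = int(wide.isna().sum().sum())    # (outer, inner) pairs absent from data

# Drop NaN explicitly and sort, so the comparison does not depend on
# version-specific stack() behaviour.
restored = wide.stack().dropna().sort_index()

print(wide)
print(restored)
```

The three NaN cells correspond to the index pairs `('a', 3)`, `('b', 2)`, and `('c', 1)`, which never appear in `data`.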
Hierarchical indexing on a DataFrame
```python
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame
```

| | | Ohio | | Colorado |
|---|---|---|---|---|
| | | Green | Red | Green |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
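With both axes hierarchical, rows and cells can be selected with partial or full tuple keys. The following is a small sketch on the same frame; the variable names `rows_a`, `row_a2`, and `cell` are illustrative, and these particular selections are not part of the original post.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=[["a", "a", "b", "b"], [1, 2, 1, 2]],
    columns=[["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
)

rows_a = frame.loc["a"]                        # all rows under outer label 'a'
row_a2 = frame.loc[("a", 2), :]                # one full row, returned as a Series
cell = frame.loc[("a", 2), ("Ohio", "Red")]    # a single scalar cell

print(rows_a)
print(cell)
```

Writing the row key as a tuple and passing `:` for the columns avoids any ambiguity between "two index levels" and "row label, column label".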
The names attribute
```python
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame
```

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
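Level names can also be supplied when the index is built, rather than assigned afterwards. A minimal sketch using `pd.MultiIndex.from_arrays`, offered as an alternative to the assignment above:

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_arrays(
    [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
)
columns = pd.MultiIndex.from_arrays(
    [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
    names=["state", "color"],
)
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=index, columns=columns)

# With names in place, a level can be addressed by name instead of by position.
colors = list(frame.columns.get_level_values("color"))
print(colors)
```

Named levels make later calls such as `swaplevel`, `sort_index`, and `groupby` easier to read, since they can refer to `'key1'` or `'color'` directly.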
Selecting columns
```python
frame['Ohio'], frame['Ohio']['Green']
```

```
(color     Green  Red
 key1 key2
 a    1        0    1
      2        3    4
 b    1        6    7
      2        9   10,
 key1  key2
 a     1       0
       2       3
 b     1       6
       2       9
 Name: Green, dtype: int32)
```
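The chained form `frame['Ohio']['Green']` works for reading, but a single tuple key reaches the same column in one step. The sketch below also shows `xs` for taking a cross-section on the inner column level; neither alternative appears in the original post.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
        names=["state", "color"],
    ),
)

ohio = frame["Ohio"]                                    # top-level label -> sub-frame
ohio_green = frame[("Ohio", "Green")]                   # full column key in one step
all_green = frame.xs("Green", axis=1, level="color")    # every 'Green' column, any state

print(ohio_green)
print(all_green)
```

The tuple key is generally the safer habit when the result is later assigned to, because it avoids chained indexing.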
frame.set_index( keys, drop ), frame.reset_index( level ) : moving DataFrame columns into the index, and index levels back into columns
```python
frame2 = frame.reset_index()
frame2
```

| state | key1 | key2 | Ohio | | Colorado |
|---|---|---|---|---|---|
| color | | | Green | Red | Green |
| 0 | a | 1 | 0 | 1 | 2 |
| 1 | a | 2 | 3 | 4 | 5 |
| 2 | b | 1 | 6 | 7 | 8 |
| 3 | b | 2 | 9 | 10 | 11 |
```python
frame2.set_index(['key1', 'key2'])
```

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
```python
frame2.set_index(['key1', 'key2'], drop=False)
```

| | state | key1 | key2 | Ohio | | Colorado |
|---|---|---|---|---|---|---|
| | color | | | Green | Red | Green |
| key1 | key2 | | | | | |
| a | 1 | a | 1 | 0 | 1 | 2 |
| | 2 | a | 2 | 3 | 4 | 5 |
| b | 1 | b | 1 | 6 | 7 | 8 |
| | 2 | b | 2 | 9 | 10 | 11 |
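The heading mentions the `level` argument of `reset_index`, which the cells above do not exercise. Here is a small sketch on a hypothetical flat frame `df` (its columns `a`, `b`, `c`, `d` are made up for illustration) showing `drop=False` and a partial reset side by side.

```python
import pandas as pd

# Hypothetical data, not from the post.
df = pd.DataFrame(
    {
        "a": range(4),
        "b": range(4, 0, -1),
        "c": ["one", "one", "two", "two"],
        "d": [0, 1, 0, 1],
    }
)

indexed = df.set_index(["c", "d"])             # c and d move into the index
kept = df.set_index(["c", "d"], drop=False)    # ... and also stay as columns
partial = indexed.reset_index(level="d")       # only level d moves back to a column
roundtrip = indexed.reset_index()              # every level moves back to a column

print(partial)
```

After a full `reset_index()`, the former index levels come back as the leftmost columns, so the column order differs from the original `df` even though the data is the same.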
frame.swaplevel( level1, level2 ), frame.sort_index( level ) : reordering levels and sorting by level

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| | 2 | 3 | 4 | 5 |
| b | 1 | 6 | 7 | 8 |
| | 2 | 9 | 10 | 11 |
```python
frame.sort_index(level=1)
```

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key1 | key2 | | | |
| a | 1 | 0 | 1 | 2 |
| b | 1 | 6 | 7 | 8 |
| a | 2 | 3 | 4 | 5 |
| b | 2 | 9 | 10 | 11 |
```python
frame.swaplevel('key1', 'key2')
```

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| 2 | a | 3 | 4 | 5 |
| 1 | b | 6 | 7 | 8 |
| 2 | b | 9 | 10 | 11 |
```python
frame.swaplevel('key1', 'key2').sort_index(level='key2')
```

| | state | Ohio | | Colorado |
|---|---|---|---|---|
| | color | Green | Red | Green |
| key2 | key1 | | | |
| 1 | a | 0 | 1 | 2 |
| | b | 6 | 7 | 8 |
| 2 | a | 3 | 4 | 5 |
| | b | 9 | 10 | 11 |
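The difference between swapping and sorting can be checked directly: `swaplevel` changes only the order of the levels, while `sort_index` reorders the rows. The sketch below uses flat, made-up column names (`x`, `y`, `z`) to keep the output small.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=["x", "y", "z"],  # hypothetical flat columns for brevity
)

swapped = frame.swaplevel("key1", "key2")   # levels swap, row order is unchanged
sorted_frame = swapped.sort_index()         # rows reordered by key2, then key1

print(swapped.index.is_monotonic_increasing)
print(sorted_frame.index.is_monotonic_increasing)
```

A sorted index matters in practice because label slicing on a MultiIndex generally expects the index to be lexicographically sorted.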
frame.groupby( axis, level ).sum( ) : summary statistics by level
```python
frame.groupby(level='key2').sum()
```

| state | Ohio | | Colorado |
|---|---|---|---|
| color | Green | Red | Green |
| key2 | | | |
| 1 | 6 | 8 | 10 |
| 2 | 12 | 14 | 16 |
```python
frame.groupby(axis=1, level='color').sum()
```

| | color | Green | Red |
|---|---|---|---|
| key1 | key2 | | |
| a | 1 | 2 | 1 |
| | 2 | 8 | 4 |
| b | 1 | 14 | 7 |
| | 2 | 20 | 10 |
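Recent pandas releases (2.1 and later, as far as I am aware) deprecate the `axis=1` argument of `groupby`. This is an observation about newer versions, not something the original post states. A possible workaround is to transpose, group the rows, and transpose back, as in the sketch below, which reproduces both aggregations on the same frame.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(
    np.arange(12).reshape((4, 3)),
    index=pd.MultiIndex.from_arrays(
        [["a", "a", "b", "b"], [1, 2, 1, 2]], names=["key1", "key2"]
    ),
    columns=pd.MultiIndex.from_arrays(
        [["Ohio", "Ohio", "Colorado"], ["Green", "Red", "Green"]],
        names=["state", "color"],
    ),
)

# Sum rows that share the same key2 label.
by_key2 = frame.groupby(level="key2").sum()

# Transposing turns the column levels into row levels, so a row-wise groupby
# can stand in for groupby(axis=1, level='color'); transpose back afterwards.
by_color = frame.T.groupby(level="color").sum().T

print(by_key2)
print(by_color)
```

The results match the two tables above: rows are summed per `key2`, and columns are summed per `color`.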