Pandas + Datetime - Time Series

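This post covers working with time series using Pandas and Datetime: the date and time data types, converting between time data and strings, indexing time series, shifting time series, creating and computing with periods, and resampling and frequency conversion.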

import pandas as pd
from datetime import datetime
from datetime import timedelta
from dateutil.parser import parse
import numpy as np

Date and time data types

The datetime type

now = datetime.now()
now
datetime.datetime(2021, 10, 9, 21, 38, 2, 642419)
now.year, now.month, now.day, now.hour, now.minute, now.second
(2021, 10, 9, 21, 38, 2)

The timedelta type

delta = datetime(2021, 5, 1, 0, 8, 40) - datetime(2019, 5, 1)
delta
datetime.timedelta(days=731, seconds=520)
start = datetime(2019, 5, 1)
start + timedelta(20, 20)
datetime.datetime(2019, 5, 21, 0, 0, 20)
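The positional arguments of timedelta are days, then seconds (then microseconds, milliseconds, minutes, hours, weeks), so timedelta(20, 20) above means 20 days and 20 seconds. A small standard-library sketch that spells the same values out with keyword arguments:

```python
from datetime import datetime, timedelta

# Same difference as above: two years (including 29 Feb 2020) plus 8 min 40 s
delta = datetime(2021, 5, 1, 0, 8, 40) - datetime(2019, 5, 1)
print(delta.days, delta.seconds)  # whole days, then the leftover seconds
print(delta.total_seconds())      # the whole span expressed in seconds

# timedelta(20, 20) is shorthand for timedelta(days=20, seconds=20)
step = timedelta(days=20, seconds=20)
print(step == timedelta(20, 20))
print(datetime(2019, 5, 1) + step)
```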

Converting between datetime and string

pd.to_datetime( list ) : string → datetime

The to_datetime method can parse many different date representations

datestrs = ['2019/5/1/00:00:00', '2021-5-1 00:00:00']
pd.to_datetime(datestrs)
DatetimeIndex(['2019-05-01', '2021-05-01'], dtype='datetime64[ns]', freq=None)
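When the layout of the strings is known, passing an explicit format makes parsing stricter and avoids guessing, and errors='coerce' turns unparseable entries into NaT instead of raising. A minimal sketch; the sample strings are made up for illustration:

```python
import pandas as pd

raw = ['2019-05-01', '2021-05-01', 'not a date']

# An explicit format avoids per-element guessing; failures become NaT
parsed = pd.to_datetime(raw, format='%Y-%m-%d', errors='coerce')
print(parsed)
print(parsed.isna())  # boolean mask of the entries that failed to parse
```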

str(datetime), datetime.strftime('%Y-%m-%d') : datetime → string

now = datetime.now()
now, str(now), now.strftime('%Y-%m-%d')
(datetime.datetime(2021, 10, 9, 21, 38, 2, 880786),
'2021-10-09 21:38:02.880786',
'2021-10-09')

datetime format codes

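| Code | Description |
|---|---|
| %Y, %y | 4-digit year, 2-digit year |
| %m, %b | 2-digit month [01, 12], abbreviated month name [Jan, Feb, ..., Dec] |
| %d | 2-digit day [01, 31] |
| %H, %I | hour on a 24-hour clock [00, 23], hour on a 12-hour clock [01, 12] |
| %M, %S | 2-digit minute [00, 59], 2-digit second [00, 59] |
| %w | weekday as an integer [0, 6], where 0 is Sunday |
| %W | week number of the year [00, 53], with Monday as the first day of the week |
| %U | week number of the year [00, 53], with Sunday as the first day of the week |
| %F | shortcut for %Y-%m-%d (platform-dependent) |
| %D | shortcut for %m/%d/%y (platform-dependent) |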
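A quick check of a few of these codes with the standard library. The timestamp is arbitrary; codes that print names, such as %b or %p, depend on the current locale, so this sketch sticks to numeric ones:

```python
from datetime import datetime

dt = datetime(2021, 10, 9, 21, 38, 2)  # a Saturday evening

full = dt.strftime('%Y-%m-%d %H:%M:%S')  # 24-hour clock
short = dt.strftime('%y/%m/%d %I:%M')    # 2-digit year, 12-hour clock
weekday = dt.strftime('%w')              # weekday number, Sunday = 0
print(full, short, weekday)
```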

datetime.strptime(string, '%Y-%m-%d') : string → datetime

value = '2019-05-01'
datetime.strptime(value, '%Y-%m-%d')
datetime.datetime(2019, 5, 1, 0, 0)
datestrs = ['7/25/1998', '11/24/1997']
[datetime.strptime(x, '%m/%d/%Y') for x in datestrs]
[datetime.datetime(1998, 7, 25, 0, 0), datetime.datetime(1997, 11, 24, 0, 0)]
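strptime requires the string to match the format exactly and raises ValueError otherwise. A minimal sketch of a tolerant wrapper for mixed input; the helper name parse_or_none is hypothetical:

```python
from datetime import datetime

def parse_or_none(text, fmt='%m/%d/%Y'):
    """Parse text with strptime; return None if it does not match fmt."""
    try:
        return datetime.strptime(text, fmt)
    except ValueError:
        return None

datestrs = ['7/25/1998', '11/24/1997', '1998-07-25']  # the last one uses another layout
results = [parse_or_none(s) for s in datestrs]
print(results)
```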

dateutil.parser.parse( string, dayfirst ) : string → datetime

parse('2019-05-01')
datetime.datetime(2019, 5, 1, 0, 0)
parse('May 1, 2019 10:45 PM')
datetime.datetime(2019, 5, 1, 22, 45)

In many international formats the day comes before the month; pass dayfirst=True to parse such strings that way

parse('1/5/2019'), parse('1/5/2019', dayfirst=True)
(datetime.datetime(2019, 1, 5, 0, 0), datetime.datetime(2019, 5, 1, 0, 0))
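pd.to_datetime accepts the same dayfirst flag. Since dayfirst is only a parsing preference rather than a strict rule, an explicit format is the stricter choice when the data is known to be day-first. A minimal sketch:

```python
import pandas as pd

# Ambiguous string: month-first by default, day-first on request
us_style = pd.to_datetime('1/5/2019')                 # 5 January 2019
eu_style = pd.to_datetime('1/5/2019', dayfirst=True)  # 1 May 2019

# A fixed day/month/year format leaves no ambiguity
fixed = pd.to_datetime(['1/5/2019', '25/12/2019'], format='%d/%m/%Y')
print(us_style, eu_style)
print(fixed)
```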

Indexing time series

dates = [datetime(2011, 1, 2), datetime(2011, 1, 5),
datetime(2011, 1, 7), datetime(2011, 1, 8),
datetime(2011, 1, 10), datetime(2011, 1, 12)]
ts = pd.Series(np.random.randn(6), index=dates)
ts
2011-01-02   -1.248703
2011-01-05 -0.072871
2011-01-07 1.672176
2011-01-08 -0.761810
2011-01-10 -0.044322
2011-01-12 -0.701274
dtype: float64

You can slice by passing strings that can be interpreted as dates

ts = pd.Series(np.arange(1000),
index=pd.date_range('1/1/2000', periods=1000))
ts.head()
2000-01-01    0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
Freq: D, dtype: int32
ts['2000-1-1':'2000-1-3']
2000-01-01    0
2000-01-02 1
2000-01-03 2
Freq: D, dtype: int32

For longer time series, passing just a year or a year-month selects the corresponding slice of data

ts['2001'].head()
2001-01-01    366
2001-01-02 367
2001-01-03 368
2001-01-04 369
2001-01-05 370
Freq: D, dtype: int32
ts['2001-2'].head()
2001-02-01    397
2001-02-02 398
2001-02-03 399
2001-02-04 400
2001-02-05 401
Freq: D, dtype: int32

Slicing with datetime objects also works

ts[datetime(2001,1,7):datetime(2001,1,8)]
2001-01-07    372
2001-01-08 373
Freq: D, dtype: int32
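Two details about date-based slicing are worth knowing: label slices include both endpoints, and .loc makes the label-based intent explicit. A small sketch on a 60-day daily series whose values are simply positions:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(60), index=pd.date_range('2000-01-01', periods=60, freq='D'))

feb = ts.loc['2000-02']  # every day of February 2000 (a leap year)
print(len(feb), feb.iloc[0])

window = ts.loc['2000-01-10':'2000-01-12']  # both endpoints are included
print(window)
```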

ts.truncate( before, after )

before : drop all rows before the given date (keep this date and later)

ts.truncate(before='2001-9').head()
2001-09-01    609
2001-09-02 610
2001-09-03 611
2001-09-04 612
2001-09-05 613
Freq: D, dtype: int32

after : drop all rows after the given date (keep this date and earlier)

ts.truncate(after='2000-2-4').head()
2000-01-01    0
2000-01-02 1
2000-01-03 2
2000-01-04 3
2000-01-05 4
Freq: D, dtype: int32
ts.truncate(before = '2000-1-1', after='2000-1-4')
2000-01-01    0
2000-01-02 1
2000-01-03 2
2000-01-04 3
Freq: D, dtype: int32
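On a sorted DatetimeIndex, truncate keeps the same rows as a .loc label slice, with both bounds inclusive. A small sketch that compares the two:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(10), index=pd.date_range('2000-01-01', periods=10, freq='D'))

a = ts.truncate(before='2000-01-03', after='2000-01-05')
b = ts.loc['2000-01-03':'2000-01-05']
print(a.equals(b))
```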

Date ranges, frequencies, and shifting

pd.date_range( start, end, periods, freq )

pd.date_range('2019-05-01', '2021-05-01')
DatetimeIndex(['2019-05-01', '2019-05-02', '2019-05-03', '2019-05-04',
'2019-05-05', '2019-05-06', '2019-05-07', '2019-05-08',
'2019-05-09', '2019-05-10',
...
'2021-04-22', '2021-04-23', '2021-04-24', '2021-04-25',
'2021-04-26', '2021-04-27', '2021-04-28', '2021-04-29',
'2021-04-30', '2021-05-01'],
dtype='datetime64[ns]', length=732, freq='D')
pd.date_range('2019-05-01', periods=3)
DatetimeIndex(['2019-05-01', '2019-05-02', '2019-05-03'], dtype='datetime64[ns]', freq='D')
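date_range needs enough information to pin down the range: typically two of start, end and periods, plus freq (daily by default). Giving start, end and periods without freq yields evenly spaced points. A small sketch of both:

```python
import pandas as pd

# Three daily timestamps counted back from the end date
back = pd.date_range(end='2019-05-03', periods=3)
print(back)

# start + end + periods, no freq: evenly spaced, here every 12 hours
even = pd.date_range(start='2019-05-01', end='2019-05-02', periods=3)
print(even)
```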

freq = 'BM' : 'business end of month', the last business day of each month

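| Alias | Description |
|---|---|
| H, T/min, S, L/ms, U | hourly, every minute, every second, every millisecond, every microsecond |
| D, M, MS | every calendar day, last calendar day of each month, first calendar day of each month |
| B, BM, BMS | every business day, last business day of each month, first business day of each month |
| W-MON, W-TUE, ..., W-SUN | weekly, on the given day of the week |
| WOM-1MON, WOM-2MON, ..., WOM-4SUN | monthly, on the given weekday of the given week of the month |
| Q-JAN, Q-FEB, ..., Q-DEC | quarterly, anchored on the last calendar day of the given month |
| QS-JAN, QS-FEB, ..., QS-DEC | quarterly, anchored on the first calendar day of the given month |
| BQ-JAN, BQ-FEB, ..., BQ-DEC | quarterly, anchored on the last business day of the given month |
| BQS-JAN, BQS-FEB, ..., BQS-DEC | quarterly, anchored on the first business day of the given month |
| A-JAN, A-FEB, ..., A-DEC | annually, on the last calendar day of the given month |
| AS-JAN, AS-FEB, ..., AS-DEC | annually, on the first calendar day of the given month |
| BA-JAN, BA-FEB, ..., BA-DEC | annually, on the last business day of the given month |
| BAS-JAN, BAS-FEB, ..., BAS-DEC | annually, on the first business day of the given month |

Since pandas 2.2 several of these spellings are deprecated in favor of new ones, for example M → ME, Q → QE, A → YE, H → h, T → min, S → s.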
pd.date_range('2020-01-01', '2021-01-01', freq='M')
DatetimeIndex(['2020-01-31', '2020-02-29', '2020-03-31', '2020-04-30',
'2020-05-31', '2020-06-30', '2020-07-31', '2020-08-31',
'2020-09-30', '2020-10-31', '2020-11-30', '2020-12-31'],
dtype='datetime64[ns]', freq='M')
pd.date_range('2020-01-01', '2021-01-01', freq='4M')

DatetimeIndex(['2020-01-31', '2020-05-31', '2020-09-30'], dtype='datetime64[ns]', freq='4M')

pd.date_range('2020-01-01', '2020-01-02', freq='2h30min')
DatetimeIndex(['2020-01-01 00:00:00', '2020-01-01 02:30:00',
'2020-01-01 05:00:00', '2020-01-01 07:30:00',
'2020-01-01 10:00:00', '2020-01-01 12:30:00',
'2020-01-01 15:00:00', '2020-01-01 17:30:00',
'2020-01-01 20:00:00', '2020-01-01 22:30:00'],
dtype='datetime64[ns]', freq='150T')
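Two more aliases from the table in action: 'B' skips weekends, and 'WOM-3FRI' picks the third Friday of each month. A small sketch with arbitrarily chosen dates:

```python
import pandas as pd

# Friday 2021-10-08 is followed directly by Monday 2021-10-11
bdays = pd.date_range('2021-10-08', periods=3, freq='B')
print(bdays)

# Third Friday of each month in the first quarter of 2021
third_fridays = pd.date_range('2021-01-01', '2021-03-31', freq='WOM-3FRI')
print(third_fridays)
```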

ts.shift( periods, freq ) : shifting a time series

ts = pd.Series(np.arange(1, 5),
index=pd.date_range('1/1/2000', periods=4, freq='D'))
ts
2000-01-01    1
2000-01-02 2
2000-01-03 3
2000-01-04 4
Freq: D, dtype: int32

Series and DataFrame both have a shift method that performs a naive forward or backward shift of the data, leaving the index unchanged

ts.shift(-1), ts / ts.shift(1) - 1  # the second expression is the relative change from the previous day
(2000-01-01    2.0
2000-01-02 3.0
2000-01-03 4.0
2000-01-04 NaN
Freq: D, dtype: float64,
2000-01-01 NaN
2000-01-02 1.000000
2000-01-03 0.500000
2000-01-04 0.333333
Freq: D, dtype: float64)
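The expression ts / ts.shift(1) - 1 is the period-over-period percent change, which pandas also provides as pct_change. A sketch checking that the two agree on the same data:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 5), index=pd.date_range('2000-01-01', periods=4, freq='D'))

manual = ts / ts.shift(1) - 1
builtin = ts.pct_change()
print(pd.concat({'manual': manual, 'pct_change': builtin}, axis=1))
```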

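ts.shift(-1) vs ts.shift(-1, freq='D'): when freq is given, the timestamps in the index are shifted instead of the data, so no values are lost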

ts.shift(-1, freq='D')  # shift the index back by 1 'D' (one day)
1999-12-31    1
2000-01-01 2
2000-01-02 3
2000-01-03 4
Freq: D, dtype: int32
ts.shift(1, freq='3D')  # shift the index forward by 3 days
2000-01-04    1
2000-01-05 2
2000-01-06 3
2000-01-07 4
Freq: D, dtype: int32
ts.shift(1, freq='90T')  # shift the index forward by 90 minutes ('T' = minute)
2000-01-01 01:30:00    1
2000-01-02 01:30:00 2
2000-01-03 01:30:00 3
2000-01-04 01:30:00 4
Freq: D, dtype: int32
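To make the contrast concrete: without freq the values move and the index stays put (a NaN appears); with freq the values stay put and the timestamps move. A small sketch:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.arange(1, 5), index=pd.date_range('2000-01-01', periods=4, freq='D'))

data_shift = ts.shift(1)             # same index, values move down, first becomes NaN
index_shift = ts.shift(1, freq='D')  # same values, every timestamp moves one day later

print(data_shift)
print(index_shift)
```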

Hour(), Minute(), Day(), MonthEnd() : shifting dates with offsets

Hour(), Minute(), Day(), MonthEnd(), ... : date offset objects

from pandas.tseries.offsets import Day, MonthEnd
now = datetime.now()
now
datetime.datetime(2021, 10, 9, 21, 38, 3, 837907)
now + 3 * Day(), now + Day(3)
(Timestamp('2021-10-12 21:38:03.837907'),
Timestamp('2021-10-12 21:38:03.837907'))
now + 1 * MonthEnd()
Timestamp('2021-10-31 21:38:03.837907')
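Anchored offsets such as MonthEnd always advance to the next anchor, so a date that is already a month end jumps to the following month end. MonthEnd(0) instead rolls forward only when the date is not already on an anchor. A small sketch:

```python
import pandas as pd
from pandas.tseries.offsets import MonthEnd

mid = pd.Timestamp('2021-10-09')
end = pd.Timestamp('2021-10-31')

print(mid + MonthEnd())   # 2021-10-31
print(end + MonthEnd())   # already a month end, so it moves to 2021-11-30
print(end + MonthEnd(0))  # n=0 leaves an on-anchor date unchanged: 2021-10-31
print(mid + MonthEnd(0))  # and rolls other dates forward: 2021-10-31
```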

offset.rollforward( datetime ), offset.rollback( datetime ) : explicitly "roll" a date forward or backward to the nearest anchor of the offset

offset = MonthEnd()
offset.rollforward(now), offset.rollback(now)
(Timestamp('2021-10-31 21:38:03.837907'),
Timestamp('2021-09-30 21:38:03.837907'))
ts = pd.Series(np.ones(20), index=pd.date_range('1/15/2000', periods=20, freq='4D'))
ts
2000-01-15    1.0
2000-01-19 1.0
2000-01-23 1.0
2000-01-27 1.0
2000-01-31 1.0
2000-02-04 1.0
2000-02-08 1.0
2000-02-12 1.0
2000-02-16 1.0
2000-02-20 1.0
2000-02-24 1.0
2000-02-28 1.0
2000-03-03 1.0
2000-03-07 1.0
2000-03-11 1.0
2000-03-15 1.0
2000-03-19 1.0
2000-03-23 1.0
2000-03-27 1.0
2000-03-31 1.0
Freq: 4D, dtype: float64

ts.groupby( offset.rollforward )

offset = MonthEnd()
ts.groupby(offset.rollforward).sum()  # group each date by the month end it rolls forward to
2000-01-31    5.0
2000-02-29 7.0
2000-03-31 8.0
dtype: float64

ts.resample( freq )

ts.resample('M').mean()
2000-01-31    1.0
2000-02-29 1.0
2000-03-31 1.0
Freq: M, dtype: float64
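Another way to get the same monthly totals, sketched here without offset objects or the 'M' resample alias, is to group by the monthly period of each timestamp:

```python
import numpy as np
import pandas as pd

ts = pd.Series(np.ones(20), index=pd.date_range('1/15/2000', periods=20, freq='4D'))

# Map each timestamp to the month it falls in, then aggregate per month
monthly = ts.groupby(ts.index.to_period('M')).sum()
print(monthly)
```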

Periods and period arithmetic

pd.Period( value, freq ) : creating a period

p = pd.Period(2021, freq='A-MAY')
p
Period('2021', 'A-MAY')
p + 3
Period('2024', 'A-MAY')
p - pd.Period(2019, freq='A-MAY')
<2 * YearEnds: month=5>
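The same arithmetic works on monthly periods, and start_time and end_time give the boundaries of the span a period covers. A small sketch:

```python
import pandas as pd

p = pd.Period('2021-05', freq='M')

later = p + 3                                  # three months later
gap = p - pd.Period('2021-01', freq='M')       # an offset; .n counts the months
print(later, gap.n)
print(p.start_time, p.end_time)                # first and last instant of May 2021
```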

pd.period_range( start, end, freq ) : creating a range of periods

rng = pd.period_range('2000-01-01', '2000-06-30', freq='M')
rng
PeriodIndex(['2000-01', '2000-02', '2000-03', '2000-04', '2000-05', '2000-06'], dtype='period[M]')
pd.Series(np.arange(6), index=rng)
2000-01    0
2000-02 1
2000-03 2
2000-04 3
2000-05 4
2000-06 5
Freq: M, dtype: int32

pd.PeriodIndex( values, freq, year, quarter ) : creating a range of quarterly periods

values = ['2001Q3', '2002Q2', '2003Q1']
pd.PeriodIndex(values, freq='Q-DEC')
PeriodIndex(['2001Q3', '2002Q2', '2003Q1'], dtype='period[Q-DEC]')

Creating a PeriodIndex from arrays (here the year and quarter columns of a CSV file)

data = pd.read_csv('pydata-book-2nd-edition/examples/macrodata.csv')
data.year, data.quarter
(0      1959.0
1 1959.0
2 1959.0
3 1959.0
4 1960.0
...
198 2008.0
199 2008.0
200 2009.0
201 2009.0
202 2009.0
Name: year, Length: 203, dtype: float64,
0 1.0
1 2.0
2 3.0
3 4.0
4 1.0
...
198 3.0
199 4.0
200 1.0
201 2.0
202 3.0
Name: quarter, Length: 203, dtype: float64)
index = pd.PeriodIndex(year=data.year, quarter=data.quarter,
freq='Q-DEC')
index
PeriodIndex(['1959Q1', '1959Q2', '1959Q3', '1959Q4', '1960Q1', '1960Q2',
'1960Q3', '1960Q4', '1961Q1', '1961Q2',
...
'2007Q2', '2007Q3', '2007Q4', '2008Q1', '2008Q2', '2008Q3',
'2008Q4', '2009Q1', '2009Q2', '2009Q3'],
dtype='period[Q-DEC]', length=203)
data.index = index
data
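|  | year | quarter | realgdp | realcons | realinv | realgovt | realdpi | cpi | m1 | tbilrate | unemp | pop | infl | realint |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1959Q1 | 1959.0 | 1.0 | 2710.349 | 1707.4 | 286.898 | 470.045 | 1886.9 | 28.980 | 139.7 | 2.82 | 5.8 | 177.146 | 0.00 | 0.00 |
| 1959Q2 | 1959.0 | 2.0 | 2778.801 | 1733.7 | 310.859 | 481.301 | 1919.7 | 29.150 | 141.7 | 3.08 | 5.1 | 177.830 | 2.34 | 0.74 |
| 1959Q3 | 1959.0 | 3.0 | 2775.488 | 1751.8 | 289.226 | 491.260 | 1916.4 | 29.350 | 140.5 | 3.82 | 5.3 | 178.657 | 2.74 | 1.09 |
| 1959Q4 | 1959.0 | 4.0 | 2785.204 | 1753.7 | 299.356 | 484.052 | 1931.3 | 29.370 | 140.0 | 4.33 | 5.6 | 179.386 | 0.27 | 4.06 |
| 1960Q1 | 1960.0 | 1.0 | 2847.699 | 1770.5 | 331.722 | 462.199 | 1955.5 | 29.540 | 139.6 | 3.50 | 5.2 | 180.007 | 2.31 | 1.19 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 2008Q3 | 2008.0 | 3.0 | 13324.600 | 9267.7 | 1990.693 | 991.551 | 9838.3 | 216.889 | 1474.7 | 1.17 | 6.0 | 305.270 | -3.16 | 4.33 |
| 2008Q4 | 2008.0 | 4.0 | 13141.920 | 9195.3 | 1857.661 | 1007.273 | 9920.4 | 212.174 | 1576.5 | 0.12 | 6.9 | 305.952 | -8.79 | 8.91 |
| 2009Q1 | 2009.0 | 1.0 | 12925.410 | 9209.2 | 1558.494 | 996.287 | 9926.4 | 212.671 | 1592.8 | 0.22 | 8.1 | 306.547 | 0.94 | -0.71 |
| 2009Q2 | 2009.0 | 2.0 | 12901.504 | 9189.0 | 1456.678 | 1023.528 | 10077.5 | 214.469 | 1653.6 | 0.18 | 9.2 | 307.226 | 3.37 | -3.19 |
| 2009Q3 | 2009.0 | 3.0 | 12990.341 | 9256.0 | 1486.398 | 1044.088 | 10040.6 | 216.385 | 1673.9 | 0.12 | 9.6 | 308.013 | 3.56 | -3.44 |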

203 rows × 14 columns
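Recent pandas releases (2.2 and later) deprecate the year=/quarter= keywords of the PeriodIndex constructor in favor of PeriodIndex.from_fields. A version-independent sketch builds 'YYYYQn' strings instead. The short lists below are stand-ins for data.year and data.quarter, which are stored as floats in macrodata.csv and therefore need casting to int:

```python
import pandas as pd

# Hypothetical stand-ins for data.year / data.quarter (floats in the CSV)
years = pd.Series([1959.0, 1959.0, 1960.0])
quarters = pd.Series([1.0, 2.0, 1.0])

labels = [f'{int(y)}Q{int(q)}' for y, q in zip(years, quarters)]
index = pd.PeriodIndex(labels, freq='Q-DEC')
print(index)
```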

period.asfreq( freq, how ) : converting a period to another frequency

Annual → monthly

p = pd.Period('2021', freq='A-MAY')
p, p.asfreq('M', how='start'), p.asfreq('M', how='end')  # A-MAY: fiscal year 2021 runs from 2020-06 to 2021-05
(Period('2021', 'A-MAY'), Period('2020-06', 'M'), Period('2021-05', 'M'))

Monthly → annual

p = pd.Period('May-2021', 'M')
p, p.asfreq('A-DEC')
(Period('2021-05', 'M'), Period('2021', 'A-DEC'))
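The same idea one level down: converting a monthly period to daily picks the first or last day depending on how, and converting a daily period back to monthly finds the month that contains it. A small sketch:

```python
import pandas as pd

p = pd.Period('2021-05', freq='M')
first_day = p.asfreq('D', how='start')  # 2021-05-01
last_day = p.asfreq('D', how='end')     # 2021-05-31
print(first_day, last_day)

# Going back to the lower frequency recovers the containing month
print(last_day.asfreq('M'))             # 2021-05
```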

Annual → monthly and daily

rng = pd.period_range('2006', '2009', freq='A-DEC')
ts = pd.Series(np.random.randn(len(rng)), index=rng)
ts, ts.asfreq('M', how='start'), ts.asfreq('D', how='start')
(2006   -0.954789
2007 2.045184
2008 -0.367045
2009 -0.672529
Freq: A-DEC, dtype: float64,
2006-01 -0.954789
2007-01 2.045184
2008-01 -0.367045
2009-01 -0.672529
Freq: M, dtype: float64,
2006-01-01 -0.954789
2007-01-01 2.045184
2008-01-01 -0.367045
2009-01-01 -0.672529
Freq: D, dtype: float64)

Quarterly periods

p = pd.Period('2021Q1', freq='Q-DEC')  # Q-DEC: the year ends in December, so 2021Q1 is 2021-01 to 2021-03
p, p.asfreq('D', 'start'), p.asfreq('D', 'end')
(Period('2021Q1', 'Q-DEC'),
Period('2021-01-01', 'D'),
Period('2021-03-31', 'D'))
p = pd.Period('2021Q1', freq='Q-JUN')  # Q-JUN: the fiscal year ends in June, so 2021Q1 is 2020-07 to 2020-09
p, p.asfreq('D', 'start'), p.asfreq('D', 'end')
(Period('2021Q1', 'Q-JUN'),
Period('2020-07-01', 'D'),
Period('2020-09-30', 'D'))
p = pd.Period('2021Q4', freq='Q-JUN')  # Q-JUN: 2021Q4 is 2021-04 to 2021-06
p, p.asfreq('D', 'start'), p.asfreq('D', 'end')
(Period('2021Q4', 'Q-JUN'),
Period('2021-04-01', 'D'),
Period('2021-06-30', 'D'))
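For fiscal-year quarters, the period's year attribute is the calendar year the quarter falls in, while qyear is the fiscal year. A sketch with the Q-JUN quarter from above:

```python
import pandas as pd

p = pd.Period('2021Q1', freq='Q-JUN')  # fiscal year ending in June

print(p.qyear, p.year)       # fiscal year 2021, calendar year 2020
print(p.start_time)          # 2020-07-01 00:00:00
print(p.asfreq('D', 'end'))  # 2020-09-30
```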

Get the timestamp for 4 PM on the second-to-last business day of the quarter:

from pandas.tseries.offsets import Hour
# last business day of the quarter, minus one -> first minute of that day -> plus 16 hours
p4pm = (p.asfreq('B', 'end') - 1).asfreq('T', 'start') + Hour(16)
p4pm, p4pm.to_timestamp()
(Period('2021-06-29 16:00', 'T'), Timestamp('2021-06-29 16:00:00'))
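The same timestamp can be reached with timestamps and business-day offsets instead of period frequency conversions. A sketch, assuming that "business day" means Monday to Friday with no holiday calendar, as with the 'B' frequency:

```python
import pandas as pd
from pandas.tseries.offsets import BDay

p = pd.Period('2021Q4', freq='Q-JUN')

quarter_end = p.end_time.normalize()         # midnight of 2021-06-30
last_bday = BDay().rollback(quarter_end)     # unchanged if it is already a weekday
second_last = last_bday - BDay(1)            # one business day earlier
p4pm = second_last + pd.Timedelta(hours=16)  # 4 PM on that day
print(p4pm)
```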

Arithmetic on quarterly period ranges

rng = pd.period_range('2019Q3', '2021Q4', freq='Q-DEC')
ts = pd.Series(np.arange(len(rng)), index=rng)
ts
2019Q3    0
2019Q4 1
2020Q1 2
2020Q2 3
2020Q3 4
2020Q4 5
2021Q1 6
2021Q2 7
2021Q3 8
2021Q4 9
Freq: Q-DEC, dtype: int32
new_rng = rng.asfreq('D', 'end').asfreq('T', 'start') + 16 * 60  # last day of each quarter, 16 * 60 minutes after midnight
ts.index = new_rng.to_timestamp()
ts
2019-09-30 16:00:00    0
2019-12-31 16:00:00 1
2020-03-31 16:00:00 2
2020-06-30 16:00:00 3
2020-09-30 16:00:00 4
2020-12-31 16:00:00 5
2021-03-31 16:00:00 6
2021-06-30 16:00:00 7
2021-09-30 16:00:00 8
2021-12-31 16:00:00 9
Freq: Q-DEC, dtype: int32

ts.to_period( freq, copy ) : timestamps → periods

rng = pd.date_range('2021-01-01', periods=3, freq='M')
ts = pd.Series(np.random.randn(3), index=rng)
ts, ts.to_period()
(2021-01-31    0.799601
2021-02-28 -0.747543
2021-03-31 -0.333042
Freq: M, dtype: float64,
2021-01 0.799601
2021-02 -0.747543
2021-03 -0.333042
Freq: M, dtype: float64)
rng = pd.date_range('1/29/2021', periods=6, freq='D')
ts2 = pd.Series(np.random.randn(6), index=rng)
ts2, ts2.to_period('M')
(2021-01-29   -0.917420
2021-01-30 -1.667312
2021-01-31 0.370422
2021-02-01 -1.841920
2021-02-02 0.067595
2021-02-03 0.539212
Freq: D, dtype: float64,
2021-01 -0.917420
2021-01 -1.667312
2021-01 0.370422
2021-02 -1.841920
2021-02 0.067595
2021-02 0.539212
Freq: M, dtype: float64)
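Because to_period('M') maps several days onto the same month, the resulting index can contain duplicates; grouping on the index level collapses them. A sketch with deterministic values in place of the random ones:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('1/29/2021', periods=6, freq='D')
ts2 = pd.Series(np.arange(6), index=rng)

by_month = ts2.to_period('M').groupby(level=0).sum()
print(by_month)  # 2021-01 -> 0+1+2, 2021-02 -> 3+4+5
```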

ts.to_timestamp( freq, how, copy ) : periods → timestamps

rng = pd.period_range('1/29/2021', periods=2, freq='M')
ts3 = pd.Series(np.random.randn(2), index=rng)
ts3, ts3.to_timestamp()
(2021-01   -2.029517
2021-02 -0.154121
Freq: M, dtype: float64,
2021-01-01 -2.029517
2021-02-01 -0.154121
dtype: float64)
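to_timestamp places each period at its start by default; how='end' uses the end of the period instead. Depending on the pandas version, the end timestamp may carry the last nanosecond of the day, so this sketch normalizes to midnight before comparing:

```python
import pandas as pd

rng = pd.period_range('1/29/2021', periods=2, freq='M')

starts = rng.to_timestamp()                     # 2021-01-01, 2021-02-01
ends = rng.to_timestamp(how='end').normalize()  # 2021-01-31, 2021-02-28
print(starts)
print(ends)
```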

Resampling and frequency conversion

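ts.resample( rule, axis, closed, label, convention, kind, on, level, ... ) (older versions also accepted fill_method and limit; these were removed, and filling is now done by calling methods such as .ffill() or .asfreq() on the resampler)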

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.ones(len(rng)), index=rng)
ts
2000-01-01    1.0
2000-01-02 1.0
2000-01-03 1.0
2000-01-04 1.0
2000-01-05 1.0
...
2000-04-05 1.0
2000-04-06 1.0
2000-04-07 1.0
2000-04-08 1.0
2000-04-09 1.0
Freq: D, Length: 100, dtype: float64
ts.resample('M').sum()
2000-01-31    31.0
2000-02-29 29.0
2000-03-31 31.0
2000-04-30 9.0
Freq: M, dtype: float64
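Resampling to 'MS' (month start) produces the same totals but labels each month by its first day. A sketch on the same 100-day series:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.ones(len(rng)), index=rng)

by_month_start = ts.resample('MS').sum()
print(by_month_start)
```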

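kind : aggregate to periods ( 'period' ) or timestamps ( 'timestamp' ). This keyword is deprecated as of pandas 2.2; converting the resulting index with .to_period() or .to_timestamp() is the suggested replacement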

ts.resample('M', kind='period').sum()
2000-01    31.0
2000-02 29.0
2000-03 31.0
2000-04 9.0
Freq: M, dtype: float64
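An alternative that avoids the kind keyword is to resample with timestamps and then convert the index to monthly periods. A sketch on the same 100-day series:

```python
import numpy as np
import pandas as pd

rng = pd.date_range('2000-01-01', periods=100, freq='D')
ts = pd.Series(np.ones(len(rng)), index=rng)

monthly = ts.resample('MS').sum()
monthly.index = monthly.index.to_period('M')  # timestamps -> monthly periods
print(monthly)
```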

closed : when downsampling, which end of each bin is closed: 'right' gives (start, end], 'left' gives [start, end)

rng2 = pd.date_range('2000-01-01', periods=7, freq='T')
ts2 = pd.Series(np.arange(7), index=rng2)
ts2
2000-01-01 00:00:00    0
2000-01-01 00:01:00 1
2000-01-01 00:02:00 2
2000-01-01 00:03:00 3
2000-01-01 00:04:00 4
2000-01-01 00:05:00 5
2000-01-01 00:06:00 6
Freq: T, dtype: int32
ts2.resample('5T', closed='right').sum() # ( , ]
1999-12-31 23:55:00     0
2000-01-01 00:00:00 15
2000-01-01 00:05:00 6
Freq: 5T, dtype: int32
ts2.resample('5T', closed='left').sum() # [ , )
2000-01-01 00:00:00    10
2000-01-01 00:05:00 11
Freq: 5T, dtype: int32
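A sketch that recomputes the closed='left' bins by hand, to make the half-open intervals concrete. It uses the '5min' alias, which is equivalent to '5T':

```python
import numpy as np
import pandas as pd

rng2 = pd.date_range('2000-01-01', periods=7, freq='min')
ts2 = pd.Series(np.arange(7), index=rng2)

left = ts2.resample('5min', closed='left').sum()

# Bins [00:00, 00:05) and [00:05, 00:10), computed manually with boolean masks
t0 = pd.Timestamp('2000-01-01 00:00')
cut = t0 + pd.Timedelta(minutes=5)
first = ts2[(ts2.index >= t0) & (ts2.index < cut)].sum()
second = ts2[ts2.index >= cut].sum()
print(list(left), [first, second])
```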

label : when downsampling, whether each aggregated value is labeled with the bin's 'right' or 'left' edge

ts2.resample('5T', closed='right', label='right').sum()
2000-01-01 00:00:00     0
2000-01-01 00:05:00 15
2000-01-01 00:10:00 6
Freq: 5T, dtype: int32
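label only changes which edge names each bin, not which values fall into it. A sketch that keeps closed='left' and switches the label:

```python
import numpy as np
import pandas as pd

rng2 = pd.date_range('2000-01-01', periods=7, freq='min')
ts2 = pd.Series(np.arange(7), index=rng2)

left_labels = ts2.resample('5min', closed='left', label='left').sum()
right_labels = ts2.resample('5min', closed='left', label='right').sum()
print(left_labels)
print(right_labels)
```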

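convention : when upsampling periods, whether each low-frequency period's value is placed at the 'start' or the 'end' of its span in the higher frequency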

frame = pd.DataFrame(np.random.randn(2, 4),
index=pd.date_range('1/1/2000', periods=2,
freq='M'),
columns=['Colorado', 'Texas', 'New York', 'Ohio'])
frame
|  | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-31 | 1.025930 | -0.974227 | 0.842722 | 1.017311 |
| 2000-02-29 | 1.171868 | -1.525096 | -0.159692 | 0.148648 |
df_month = frame.resample('M', kind='period').sum()
df_month
|  | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01 | 1.025930 | -0.974227 | 0.842722 | 1.017311 |
| 2000-02 | 1.171868 | -1.525096 | -0.159692 | 0.148648 |
df_daily = df_month.resample('D', convention='start').asfreq()
df_daily.head()
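|  | Colorado | Texas | New York | Ohio |
|---|---|---|---|---|
| 2000-01-01 | 1.02593 | -0.974227 | 0.842722 | 1.017311 |
| 2000-01-02 | NaN | NaN | NaN | NaN |
| 2000-01-03 | NaN | NaN | NaN | NaN |
| 2000-01-04 | NaN | NaN | NaN | NaN |
| 2000-01-05 | NaN | NaN | NaN | NaN |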
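asfreq leaves the new rows empty. To carry each month's values forward instead, call ffill on the resampler. A sketch with month-start timestamps; the two-column frame and its random values are placeholders:

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame(np.random.randn(2, 2),
                     index=pd.date_range('2000-01-01', periods=2, freq='MS'),
                     columns=['Colorado', 'Texas'])

daily = frame.resample('D').ffill()  # each day repeats the most recent month's row
print(daily.head())
```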