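pandas DataFrame rolling windows (from the official documentation)

(1) DataFrame rolling windows

Provides rolling window calculations, which can also be used on time series (time and date indexed) data.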

DataFrame.rolling(window, min_periods=None, center=False, win_type=None, on=None, axis=0, closed=None, method='single')

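Parameters:

  • window : int, offset, or BaseIndexer subclass
    Size of the moving window. If an integer, it is the fixed number of observations used for each window. If an offset (a pandas time offset such as '2s' or '2D'), it is the time period of each window, and each window's size varies with the number of observations that fall in that period; this is only valid for datetimelike indexes. If a BaseIndexer subclass, the window boundaries are computed by that object's get_window_bounds method.
  • min_periods : int, default None
    Minimum number of observations in a window required to produce a value; otherwise the result is NaN. For a window specified by an offset, min_periods defaults to 1; for an integer window, it defaults to the window size.
  • center : bool, default False
    If False, each result is labeled with the right edge (last index) of its window; if True, it is labeled with the center of the window.
  • win_type : str, default None
    Weighting of the observations in the window. If None, all points are evenly weighted. If a string, it must be the name of a valid scipy.signal window function.
  • on : str, optional
    For a DataFrame, the column label or index level on which to calculate the rolling window, rather than the DataFrame's index.
  • axis : int or str, default 0
    If 0 or 'index', roll across the rows; if 1 or 'columns', roll across the columns.
  • closed : str, default None
    'right': the first point in the window is excluded from the calculation; 'left': the last point in the window is excluded; 'both': no point in the window is excluded; 'neither': both the first and the last points are excluded. The default (None) behaves as 'right'.
  • Examples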

    Rolling sum with a window length of 2 observations

    >>> import pandas as pd
    >>> import numpy as np
    >>> df = pd.DataFrame({'B':[0,1,2,np.nan,4]})
    >>> df
         B
    0  0.0
    1  1.0
    2  2.0
    3  NaN
    4  4.0
    >>> df.rolling(2).sum()
         B
    0  NaN
    1  1.0
    2  3.0
    3  NaN
    4  NaN
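The NaN results follow from min_periods: for an integer window it defaults to the window size, and NaN values do not count as observations. The plain-Python sketch below (no pandas; `rolling_sum` is a hypothetical name) applies that rule to a trailing window of fixed length. It illustrates the semantics and is not pandas' own implementation; the second call mirrors the min_periods=1 example further down.

```python
import math

def rolling_sum(values, window, min_periods=None):
    """Trailing rolling sum over a fixed number of observations.

    NaN values are skipped; a result is produced only when the window holds
    at least `min_periods` non-NaN values (defaulting to `window`).
    """
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        # The window for label i covers positions i - window + 1 .. i
        chunk = values[max(0, i - window + 1): i + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out

data = [0.0, 1.0, 2.0, math.nan, 4.0]
print(rolling_sum(data, 2))                 # [nan, 1.0, 3.0, nan, nan]
print(rolling_sum(data, 2, min_periods=1))  # [0.0, 1.0, 3.0, 2.0, 4.0]
```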
    

    Rolling sum with a window span of 2 seconds

    >>> df_time = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]},
    ...                        index=[pd.Timestamp('20130101 09:00:00'),
    ...                               pd.Timestamp('20130101 09:00:02'),
    ...                               pd.Timestamp('20130101 09:00:03'),
    ...                               pd.Timestamp('20130101 09:00:05'),
    ...                               pd.Timestamp('20130101 09:00:06')])
    >>> df_time
                           B
    2013-01-01 09:00:00  0.0
    2013-01-01 09:00:02  1.0
    2013-01-01 09:00:03  2.0
    2013-01-01 09:00:05  NaN
    2013-01-01 09:00:06  4.0
    
    >>> df_time.rolling('2s').sum()
                           B
    2013-01-01 09:00:00  0.0
    2013-01-01 09:00:02  1.0
    2013-01-01 09:00:03  3.0
    2013-01-01 09:00:05  NaN
    2013-01-01 09:00:06  4.0
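With an offset such as '2s', each label t gets the observations whose timestamps fall in a 2-second interval ending at t. The closed parameter decides whether the two ends of that interval are included, and the default behaves as 'right', i.e. (t - 2s, t]. The plain-Python sketch below applies that interval rule to the same data for each closed option. It uses hypothetical helper names, assumes sorted timestamps, and scans every point for every label (O(n²)) for clarity rather than speed.

```python
import math
from datetime import datetime, timedelta

def in_window(u, start, end, closed):
    """True if timestamp u lies between start and end, with the endpoints
    included or excluded according to `closed`."""
    left = u >= start if closed in ("left", "both") else u > start
    right = u <= end if closed in ("right", "both") else u < end
    return left and right

def time_rolling_sum(times, values, span, closed="right", min_periods=1):
    """Rolling sum over a time span ending at each timestamp."""
    out = []
    for t in times:
        valid = [v for u, v in zip(times, values)
                 if in_window(u, t - span, t, closed) and not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out

base = datetime(2013, 1, 1, 9, 0, 0)
times = [base + timedelta(seconds=s) for s in (0, 2, 3, 5, 6)]
values = [0.0, 1.0, 2.0, math.nan, 4.0]
span = timedelta(seconds=2)

for closed in ("right", "both", "left", "neither"):
    print(closed, time_rolling_sum(times, values, span, closed))
# right [0.0, 1.0, 3.0, nan, 4.0]
# both [0.0, 1.0, 3.0, 2.0, 4.0]
# left [nan, 0.0, 1.0, 2.0, nan]
# neither [nan, nan, 1.0, nan, nan]
```

The 'right' row reproduces the pandas output above: at 09:00:05 the window (09:00:03, 09:00:05] contains only the NaN, so the result is NaN.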
    

    Rolling sum with forward-looking windows of 2 observations (rows i and i+1)

    # Create a forward-looking window indexer
    >>> indexer = pd.api.indexers.FixedForwardWindowIndexer(window_size=2)
    >>> df = pd.DataFrame({'B': [0, 1, 2, np.nan, 4]})
    >>> df.rolling(window=indexer,min_periods=1).sum()
         B
    0  1.0
    1  3.0
    2  2.0
    3  4.0
    4  4.0
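A BaseIndexer subclass such as FixedForwardWindowIndexer works by computing, for every row, the start and end positions of its window. The aggregation then runs over each slice. The sketch below imitates that two-step idea in plain Python. The function names are hypothetical, and pandas' real get_window_bounds method takes extra arguments and returns NumPy arrays, so treat this only as a model of the behavior.

```python
import math

def forward_bounds(num_values, window_size):
    """Start/end (end exclusive) positions of forward-looking windows.

    Window i covers positions i .. i + window_size - 1, truncated at the
    end of the data.
    """
    start = list(range(num_values))
    end = [min(i + window_size, num_values) for i in range(num_values)]
    return start, end

def sum_from_bounds(values, start, end, min_periods):
    """Sum each values[start:end] slice, skipping NaN, subject to min_periods."""
    out = []
    for s, e in zip(start, end):
        valid = [v for v in values[s:e] if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out

data = [0.0, 1.0, 2.0, math.nan, 4.0]
start, end = forward_bounds(len(data), 2)
print(start, end)                                        # [0, 1, 2, 3, 4] [2, 3, 4, 5, 5]
print(sum_from_bounds(data, start, end, min_periods=1))  # [1.0, 3.0, 2.0, 4.0, 4.0]
```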
    

    Rolling sum with a window length of 2 observations, but requiring only 1 observation to compute a value

    >>> df.rolling(2,min_periods=1).sum()
         B
    0  0.0
    1  1.0
    2  3.0
    3  2.0
    4  4.0
    

    Rolling sum with the result labeled at the center of each window

    >>> df.rolling(3, min_periods=1, center=True).sum()
         B
    0  1.0
    1  3.0
    2  3.0
    3  6.0
    4  4.0
    >>> df.rolling(3, min_periods=1, center=False).sum()
         B
    0  0.0
    1  1.0
    2  3.0
    3  3.0
    4  6.0
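For an odd window, center=True makes the window extend equally on both sides of the label: with window 3, label i covers rows i-1, i and i+1. The plain-Python sketch below (hypothetical function name) reproduces the center=True output above. It deliberately rejects even window sizes, because how pandas splits an even window around the label is not modeled here.

```python
import math

def centered_rolling_sum(values, window, min_periods=1):
    """Rolling sum labeled at the window center (odd window sizes only).

    Label i covers positions i - half .. i + half, clipped to the data.
    """
    if window % 2 == 0:
        raise ValueError("this sketch only handles odd window sizes")
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        valid = [v for v in chunk if not math.isnan(v)]
        out.append(sum(valid) if len(valid) >= min_periods else math.nan)
    return out

data = [0.0, 1.0, 2.0, math.nan, 4.0]
print(centered_rolling_sum(data, 3))  # [1.0, 3.0, 3.0, 6.0, 4.0]
```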
    

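    Rolling sum with a Gaussian-weighted window (requires SciPy; the window's std parameter is passed to the aggregation method)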

    >>> df.rolling(2,win_type='gaussian').sum(std=3)
              B
    0       NaN
    1  0.986207
    2  2.958621
    3       NaN
    4       NaN
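win_type='gaussian' multiplies each value in the window by a Gaussian weight before aggregating. With a window of length 2 and std=3, both weights equal exp(-0.5 * (0.5 / 3)**2) ≈ 0.986207, so row 1 is 0.986207 × (0 + 1) and row 2 is 0.986207 × (1 + 2). The sketch below recomputes the output above in plain Python. It assumes the formula documented for scipy.signal.windows.gaussian, and the function names are hypothetical.

```python
import math

def gaussian_weights(m, std):
    """Gaussian window of length m, using the formula documented for
    scipy.signal.windows.gaussian: exp(-0.5 * ((n - (m - 1) / 2) / std) ** 2)."""
    return [math.exp(-0.5 * ((n - (m - 1) / 2) / std) ** 2) for n in range(m)]

def weighted_rolling_sum(values, weights):
    """Trailing weighted sum; the first len(weights) - 1 rows are NaN, and a
    NaN inside a window propagates through the arithmetic."""
    m = len(weights)
    out = []
    for i in range(len(values)):
        if i < m - 1:
            out.append(math.nan)  # window not yet full
            continue
        window = values[i - m + 1: i + 1]
        out.append(sum(w * v for w, v in zip(weights, window)))
    return out

w = gaussian_weights(2, 3)
print([round(x, 6) for x in w])  # [0.986207, 0.986207]
data = [0.0, 1.0, 2.0, math.nan, 4.0]
print([round(x, 6) for x in weighted_rolling_sum(data, w)])
# [nan, 0.986207, 2.958621, nan, nan]
```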
    

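(2) Windowing operations in pandas

Generally, a rolling window is formed by looking back the length of the window from the current observation. Iterating over a Rolling object yields each window in turn: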

    >>> import pandas as pd
    >>> s = pd.Series(range(5))
    >>> s
    0    0
    1    1
    2    2
    3    3
    4    4
    dtype: int64
    
    # Iterating yields one window per observation, 5 windows in total
    >>> for window in s.rolling(window=2):
    ...     print(window)
    ...
    0    0
    dtype: int64
    0    0
    1    1
    dtype: int64
    1    1
    2    2
    dtype: int64
    2    2
    3    3
    dtype: int64
    3    3
    4    4
    dtype: int64
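The output shows that the first window holds a single value, because there is nothing before position 0 to look back at. A plain-Python generator with the same look-back rule looks like this; the name `iter_windows` is hypothetical, and it returns positions alongside the values in place of a Series index.

```python
def iter_windows(values, window):
    """Yield (positions, values) for each trailing window; early windows are
    shorter than `window` because there is less history to look back on."""
    for i in range(len(values)):
        lo = max(0, i - window + 1)
        yield list(range(lo, i + 1)), values[lo: i + 1]

s = [0, 1, 2, 3, 4]
for positions, vals in iter_windows(s, 2):
    print(positions, vals)
# [0] [0]
# [0, 1] [0, 1]
# [1, 2] [1, 2]
# [2, 3] [2, 3]
# [3, 4] [3, 4]
```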
    

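pandas supports four kinds of windowing operations:

1. Rolling window: a generic fixed- or variable-size sliding window over the values
2. Weighted window: a weighted, non-rectangular window supplied by the scipy.signal library
3. Expanding window: an accumulating window over the values
4. Exponentially Weighted window: an accumulating, exponentially weighted window over the values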
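To see how three of these window types differ, the plain-Python sketch below computes a mean over the same values with a fixed rolling window, an expanding window, and an exponentially weighted window. The EWM formula assumes the adjusted form (weights (1 - alpha)**age divided by their sum), which is what pandas' ewm uses with its default adjust=True. The scipy-based weighted window is left out, and all function names are hypothetical.

```python
import math

def rolling_mean(xs, window):
    """Rolling window: mean of the last `window` values (NaN until full)."""
    return [sum(xs[i - window + 1: i + 1]) / window if i >= window - 1 else math.nan
            for i in range(len(xs))]

def expanding_mean(xs):
    """Expanding window: mean of every value seen so far."""
    return [sum(xs[: i + 1]) / (i + 1) for i in range(len(xs))]

def ewm_mean(xs, alpha):
    """Exponentially weighted window: weight (1 - alpha) ** age for each past
    value, normalized by the total weight (the adjust=True form)."""
    out = []
    for i in range(len(xs)):
        weights = [(1 - alpha) ** (i - j) for j in range(i + 1)]
        out.append(sum(w * x for w, x in zip(weights, xs)) / sum(weights))
    return out

xs = [0, 1, 2, 3, 4]
print(rolling_mean(xs, 2))                        # [nan, 0.5, 1.5, 2.5, 3.5]
print(expanding_mean(xs))                         # [0.0, 0.5, 1.0, 1.5, 2.0]
print([round(v, 4) for v in ewm_mean(xs, 0.5)])   # [0.0, 0.6667, 1.4286, 2.2667, 3.1613]
```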

The rolling window also supports time-based windows, for example a 2-day window on a daily series:
    >>> s = pd.Series(range(5),index = pd.date_range('2020-01-01',periods=5,freq='1D'))
    >>> s
    2020-01-01    0
    2020-01-02    1
    2020-01-03    2
    2020-01-04    3
    2020-01-05    4
    Freq: D, dtype: int64
    >>> s.rolling(window='2D').sum()
    2020-01-01    0.0
    2020-01-02    1.0
    2020-01-03    3.0
    2020-01-04    5.0
    2020-01-05    7.0
    Freq: D, dtype: float64
    

Some window types also support grouping first and then applying the window operation within each group:

    >>> df = pd.DataFrame({'A':['a', 'b', 'a', 'b', 'a'],'B':range(5)})
    >>> df
       A  B
    0  a  0
    1  b  1
    2  a  2
    3  b  3
    4  a  4
    >>> df.groupby('A').expanding().sum()
           B
    A       
    a 0  0.0
      2  2.0
      4  6.0
    b 1  1.0
      3  4.0
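Grouping first means the expanding window restarts for every group, while each result keeps the row's original index, which is why the output above has an (A, original index) MultiIndex. A plain-Python sketch of the same computation follows; the function name is hypothetical, and the result is a dict of lists rather than a DataFrame.

```python
from collections import defaultdict

def grouped_expanding_sum(keys, values):
    """Expanding (cumulative) sum computed separately within each group.

    Returns {group: [(original_position, running_sum), ...]}, mirroring the
    (group, original index) layout of groupby(...).expanding().sum().
    """
    running = defaultdict(float)
    result = defaultdict(list)
    for pos, (k, v) in enumerate(zip(keys, values)):
        running[k] += v  # the window for group k keeps growing
        result[k].append((pos, running[k]))
    return dict(result)

print(grouped_expanding_sum(['a', 'b', 'a', 'b', 'a'], range(5)))
# {'a': [(0, 0.0), (2, 2.0), (4, 6.0)], 'b': [(1, 1.0), (3, 4.0)]}
```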
    

Rolling window

    >>> times = ['2020-01-01', '2020-01-03', '2020-01-04', '2020-01-05', '2020-01-29']
    >>> s = pd.Series(range(5),index = pd.DatetimeIndex(times))
    >>> s
    2020-01-01    0
    2020-01-03    1
    2020-01-04    2
    2020-01-05    3
    2020-01-29    4
    dtype: int64
    
    # Window of 2 observations
    >>> s.rolling(2).sum()
    2020-01-01    NaN
    2020-01-03    1.0
    2020-01-04    3.0
    2020-01-05    5.0
    2020-01-29    7.0
    dtype: float64
    
    # Window spanning 2 days
    >>> s.rolling('2D').sum()
    2020-01-01    0.0
    2020-01-03    1.0
    2020-01-04    3.0
    2020-01-05    5.0
    2020-01-29    4.0
    dtype: float64
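The '2D' result differs from the 2-observation result because a time-based window holds a variable number of points: only those in (t - 2 days, t]. On 2020-01-29 that is a single point, so the sum is just 4. Because the index is sorted, the start of each window can be found with a binary search. The plain-Python sketch below (hypothetical function name; dates assumed sorted and distinct) returns both the window sizes and the sums.

```python
from bisect import bisect_right
from datetime import date, timedelta

def time_window_sums(days, values, span):
    """Right-closed time windows (t - span, t] over sorted, distinct dates.

    Returns (counts, sums): how many observations fall in each window, and
    their sum. bisect_right finds the first position strictly after t - span.
    """
    counts, sums = [], []
    for i, t in enumerate(days):
        start = bisect_right(days, t - span)
        counts.append(i + 1 - start)
        sums.append(sum(values[start: i + 1]))
    return counts, sums

days = [date(2020, 1, 1), date(2020, 1, 3), date(2020, 1, 4),
        date(2020, 1, 5), date(2020, 1, 29)]
print(time_window_sums(days, list(range(5)), timedelta(days=2)))
# ([1, 1, 2, 2, 1], [0, 1, 3, 5, 4])
```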
    

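Centering windows

By default, each result is labeled with the last index of its window; center=True labels it with the index at the center of the window instead.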

    >>> s = pd.Series(range(10))
    >>> s.rolling(window=5).mean()
    0    NaN
    1    NaN
    2    NaN
    3    NaN
    4    2.0
    5    3.0
    6    4.0
    7    5.0
    8    6.0
    9    7.0
    dtype: float64
    >>> s.rolling(window=5, center=True).mean()
    0    NaN
    1    NaN
    2    2.0
    3    3.0
    4    4.0
    5    5.0
    6    6.0
    7    7.0
    8    NaN
    9    NaN
    dtype: float64
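Comparing the two outputs, the centered result is the trailing result moved up by two rows (window // 2 for a window of 5). The plain-Python sketch below builds the centered mean that way. The names are hypothetical, and the shortcut only holds for odd windows when min_periods equals the window size, so partial windows at the edges stay NaN.

```python
import math

def trailing_mean(xs, window):
    """Mean labeled at the last position of each full window, NaN otherwise."""
    return [sum(xs[i - window + 1: i + 1]) / window if i >= window - 1 else math.nan
            for i in range(len(xs))]

def centered_mean(xs, window):
    """Centered mean for an odd window: the trailing result shifted back by
    window // 2 positions, with NaN where no full window exists."""
    half = window // 2
    trailing = trailing_mean(xs, window)
    return [trailing[i + half] if i + half < len(xs) else math.nan
            for i in range(len(xs))]

print(centered_mean(list(range(10)), 5))
# [nan, nan, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, nan, nan]
```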
    

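Rolling apply

apply() computes a custom function on each window. The function must return a single value from its input; with raw=True it receives each window as a NumPy ndarray, and with raw=False (the default) as a Series.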

    >>> import numpy as np
    >>> def mad(x):
    ...     # Mean absolute deviation of the values in the window
    ...     return np.fabs(x - x.mean()).mean()
    ...
    >>> s = pd.Series(range(10))
    >>> s.rolling(window=4).apply(mad, raw=True)
    0    NaN
    1    NaN
    2    NaN
    3    1.0
    4    1.0
    5    1.0
    6    1.0
    7    1.0
    8    1.0
    9    1.0
    dtype: float64
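Conceptually, rolling apply slides the window along the data and calls the function once per window. The plain-Python sketch below reproduces the mad output above. `rolling_apply` is a hypothetical name, and the sketch ignores NaN handling and index labels.

```python
import math

def rolling_apply(values, window, func, min_periods=None):
    """Call func on each trailing window (a plain list, similar in spirit to
    raw=True) and collect one result per window. Windows shorter than
    min_periods (default: window) give NaN."""
    if min_periods is None:
        min_periods = window
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(func(chunk) if len(chunk) >= min_periods else math.nan)
    return out

def mad(xs):
    """Mean absolute deviation from the window mean."""
    m = sum(xs) / len(xs)
    return sum(abs(x - m) for x in xs) / len(xs)

print(rolling_apply(list(range(10)), 4, mad))
# [nan, nan, nan, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```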
    

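Weighted window

The win_type argument assigns weights to the values in each window. The name must be a scipy.signal window function (SciPy must be installed), and extra parameters of that window function, such as std for 'gaussian', are passed to the aggregation method.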

    >>> s = pd.Series(range(10))
    >>> s.rolling(window=5, win_type="gaussian").mean(std=0.1)
    0    NaN
    1    NaN
    2    NaN
    3    NaN
    4    2.0
    5    3.0
    6    4.0
    7    5.0
    8    6.0
    9    7.0
    dtype: float64
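The weighted result equals the plain rolling mean here because std=0.1 is tiny compared with the spacing of the window positions. The middle weight is 1 and the others are at most exp(-50), so each weighted mean is effectively the middle value of its window (row 4's window is [0, 1, 2, 3, 4], giving 2.0). The sketch below checks this in plain Python. It assumes the formula documented for scipy.signal.windows.gaussian and a weighted mean of sum(w * x) / sum(w); the function names are hypothetical.

```python
import math

def gaussian_weights(m, std):
    """Gaussian window of length m, using the formula documented for
    scipy.signal.windows.gaussian."""
    return [math.exp(-0.5 * ((n - (m - 1) / 2) / std) ** 2) for n in range(m)]

def weighted_mean(chunk, weights):
    """Weighted mean of one window: sum(w * x) / sum(w)."""
    return sum(w * x for w, x in zip(weights, chunk)) / sum(weights)

w = gaussian_weights(5, 0.1)
print(w[2], w[1])  # the middle weight is 1.0; its neighbours are about 2e-22
print(weighted_mean([0, 1, 2, 3, 4], w))  # 2.0
```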
    

Source: bujbujbiu

Shared and compiled by 物联沃 (IOTWORD)