• 不规则间隔时间序列转规则时间序列


    pandas对时间序列中缺失值进行线性插值_Johngo学长 (johngo689.com)

    一、问题背景

    以二氧化碳浓度数据集为例,该时间序列在某些时间点上会有缺失值。

    1. # 读取数据集
    2. df = pd.read_csv('二氧化碳浓度.csv')
    3. # 将字段Datetime数据类型转换为日期类型
    4. df['datetime'] = pd.to_datetime(df['datetime'], format="%Y-%m-%d %H:%M:%S")
    5. df

    value datetime 0 370 2022-04-13 00:11:00 1 375 2022-04-13 00:42:00 2 383 2022-04-13 01:11:00 3 391 2022-04-13 01:41:00 4 397 2022-04-13 02:11:00 … … … 583 369 2022-04-23 22:12:00 584 376 2022-04-23 22:41:00 585 379 2022-04-23 23:11:00 586 379 2022-04-23 23:12:00 587 378 2022-04-23 23:41:00

    588 rows × 2 columns

    该数据集是没个半小时一条数据,若想要填充成10分钟一条数据,就需要进行线性插值。

    二、pandas解决方案

    下面我们需要生成一个辅助dataframe,命名为helper, 包含所有日期。

    1. # pd.date_range时间序列
    2. # 间隔十分钟
    3. helper = pd.DataFrame({'datetime': pd.date_range(start=df['datetime'].min(), end=df['datetime'].max(),freq='600s')})
    4. # helper = pd.DataFrame(helper).set_index('datetime')#将时间列变为索引

    helper:

    datetime 0 2022-04-13 00:11:00 1 2022-04-13 00:21:00 2 2022-04-13 00:31:00 3 2022-04-13 00:41:00 4 2022-04-13 00:51:00 … … 1577 2022-04-23 23:01:00 1578 2022-04-23 23:11:00 1579 2022-04-23 23:21:00 1580 2022-04-23 23:31:00 1581 2022-04-23 23:41:00

    1582 rows × 1 columns

    然后再将helper和d join起来

    d = pd.merge(df, helper, on='datetime', how='outer').sort_values('datetime')
    

    最后一步就是插值了,直接用interpolate方法,method选择linear,线性插值。

    d['value'] = d['value'].interpolate(method='linear')
    

    value datetime 0 370.0 2022-04-13 00:11:00 588 NaN 2022-04-13 00:21:00 589 NaN 2022-04-13 00:31:00 590 NaN 2022-04-13 00:41:00 1 375.0 2022-04-13 00:42:00 … … … 585 379.0 2022-04-23 23:11:00 586 379.0 2022-04-23 23:12:00 1658 NaN 2022-04-23 23:21:00 1659 NaN 2022-04-23 23:31:00 587 378.0 2022-04-23 23:41:00

    1660 rows × 2 columns

    插值选择方法不止有线性,还可以是

    Original: pandas对时间序列中缺失值进行线性插值 - 王陸 - 博客园
    Author: 王陸
    Title: pandas对时间序列中缺失值进行线性插值

    原创文章受到原创版权保护。转载请注明出处:https://www.johngo689.com/8417/

    转载文章受原作者版权保护。转载请注明原作者出处!

    ======================================================================

    pandas插值--非规则时间序列 

    pandas插值--非规则时间序列 - Heywhale.com

    问题描述:现有一观测资料,时间间隔非均匀,如何将其插值到均匀的时间序列?

    这个问题可以参考:工作台 - Heywhale.com ,具体步骤如下

    • 生成目标时间的list,和当前index取交集
    • 将交集reindex为新的索引,填充缺测,调用插值函数
    • 再次reindex到目标时间list

    当然,这里也可以考虑利用xarray的插值函数

    原始数据

    In [1]:

    import pandas as pd 
    import xarray as xr 
    import numpy as np
    
    Warning: ecCodes 2.21.0 or higher is recommended. You are running version 2.14.1
    

    In [2]:

    df = pd.read_csv("/home/mw/input/pcp6426/interp2.csv", parse_dates=[0], index_col=0)
    df 
    

    Out[2]:

    data
    time
    2019-12-10 17:17:37.1406248960.255970
    2019-12-10 17:17:37.3398443520.364656
    2019-12-10 17:17:37.5390620160.206504
    2019-12-10 17:17:37.7382807040.491149
    2019-12-10 17:17:37.9374999040.037144
    ......
    2019-12-10 17:17:56.1406248960.400654
    2019-12-10 17:17:56.3398443520.382707
    2019-12-10 17:17:56.5390620160.094809
    2019-12-10 17:17:56.7382807040.049449
    2019-12-10 17:17:56.9374999040.793193

    100 rows × 1 columns

    转到xarray

    In [3]:

    ds = df.to_xarray() 
    ds
    

    Out[3]:

    xarray.Dataset

    • Dimensions:
      • time: 100
    • Coordinates:
      • time

        (time)

        datetime64[ns]

        2019-12-10T17:17:37.140624896 .....

    • Data variables:
      • data

        (time)

        float64

        0.256 0.3647 ... 0.04945 0.7932

        array([0.25596959, 0.36465587, 0.20650436, 0.49114901, 0.03714372,
               0.28040043, 0.70309464, 0.68119741, 0.05901183, 0.67645174,
               0.28849891, 0.50132343, 0.62840763, 0.95099851, 0.02626051,
               0.33692432, 0.46553236, 0.50861759, 0.74550601, 0.55250929,
               0.52566986, 0.52261307, 0.37321184, 0.79995134, 0.23094678,
               0.87915976, 0.70379002, 0.1751345 , 0.08580627, 0.48962096,
               0.85569395, 0.20317885, 0.80113267, 0.64351059, 0.06970162,
               0.3934217 , 0.44517141, 0.89346621, 0.55945748, 0.72797598,
               0.02390159, 0.20475986, 0.51505429, 0.82730583, 0.77451266,
               0.65776446, 0.6155744 , 0.71989889, 0.01017897, 0.38116152,
               0.9602366 , 0.86313496, 0.76415434, 0.85792074, 0.84286672,
               0.56732516, 0.58180219, 0.23357688, 0.77524271, 0.56301411,
               0.48070764, 0.19323841, 0.37067283, 0.14380672, 0.8256961 ,
               0.90791718, 0.31419928, 0.42323499, 0.58264084, 0.73124201,
               0.80804131, 0.47969966, 0.2311884 , 0.45588249, 0.72280821,
               0.64822314, 0.84239576, 0.14129363, 0.59488152, 0.62924007,
               0.93597915, 0.96099593, 0.63139743, 0.86937852, 0.59717561,
               0.06693927, 0.93089973, 0.45931833, 0.51508115, 0.40343549,
               0.22152076, 0.84937486, 0.30642194, 0.8782651 , 0.09930092,
               0.40065394, 0.38270678, 0.09480903, 0.04944922, 0.79319296])
    • Attributes: (0)

    插值

    In [4]:

    idx = pd.date_range('2019-12-10 17:17:37','2019-12-10 17:17:57',freq='0.2S')
    ds = ds.interp(time=idx)
    

    In [5]:

    ds
    

    Out[5]:

    xarray.Dataset

    • Dimensions:
      • time: 101
    • Coordinates:
      • time

        (time)

        datetime64[ns]

        2019-12-10T17:17:37 ... 2019-12-...

    • Data variables:
      • data

        (time)

        float64

        nan 0.2884 0.3169 ... 0.2799 nan

        array([       nan, 0.2883623 , 0.31690053, 0.29357285, 0.35049546,
               0.11199205, 0.40889926, 0.69618861, 0.49088025, 0.25029839,
               0.55474113, 0.35610155, 0.54140369, 0.7270833 , 0.66450917,
               0.12372359, 0.37699039, 0.47854235, 0.58107818, 0.68571448,
               0.54408908, 0.52471756, 0.47749996, 0.503745  , 0.62367034,
               0.43430757, 0.82452572, 0.54415751, 0.14781035, 0.21091047,
               0.60446731, 0.65241178, 0.38373663, 0.75291846, 0.46574115,
               0.17126078, 0.40954361, 0.58053824, 0.79129799, 0.61166552,
               0.50709006, 0.08024552, 0.29845618, 0.61056731, 0.81095019,
               0.7378858 , 0.64462072, 0.64707616, 0.5028063 , 0.12511157,
               0.562832  , 0.92998591, 0.83324683, 0.79283606, 0.85325691,
               0.75642237, 0.57183528, 0.47665228, 0.39926427, 0.70949302,
               0.53719249, 0.39115053, 0.2468164 , 0.30127792, 0.35506012,
               0.85149093, 0.72295248, 0.34712363, 0.47199483, 0.62867837,
               0.75533589, 0.70575096, 0.40465934, 0.29991892, 0.53857767,
               0.69940899, 0.70871497, 0.63069137, 0.28003931, 0.60552601,
               0.72547187, 0.94377278, 0.86147056, 0.70419224, 0.78504843,
               0.43402571, 0.3244329 , 0.78850129, 0.47637533, 0.48049266,
               0.34746164, 0.40864556, 0.68542511, 0.48134011, 0.63693735,
               0.19202507, 0.39530499, 0.29577334, 0.08093415, 0.27986546,
                      nan])
    • Attributes: (0)

    参考:Interpolating data

    转回pandas

    In [6]:

    df = ds.to_dataframe()
    df
    

    Out[6]:

    data
    time
    2019-12-10 17:17:37.000NaN
    2019-12-10 17:17:37.2000.288362
    2019-12-10 17:17:37.4000.316901
    2019-12-10 17:17:37.6000.293573
    2019-12-10 17:17:37.8000.350495
    ......
    2019-12-10 17:17:56.2000.395305
    2019-12-10 17:17:56.4000.295773
    2019-12-10 17:17:56.6000.080934
    2019-12-10 17:17:56.8000.279865
    2019-12-10 17:17:57.000NaN

    101 rows × 1 columns

     

  • 相关阅读:
    【NVMe2.0b 14-8】Set Features(下篇)
    如何让Android平台像IPC一样实现GB28181前端设备接入
    如何在vscode中添加less插件
    基于JAVA-在线考试系统-计算机毕业设计源码+系统+数据库+lw文档+部署
    高德地图JSAPI 2.0使用Java代码代替Nginx进行反向代理
    Unity之ShaderGraph如何实现积雪效果
    银行笔试题 java笔试题
    在 VSCode 中使用 PlantUML
    window屏幕录制
    Day11.2:标签的使用
  • 原文地址:https://blog.csdn.net/lizz2276/article/details/126344748