• Python和Pandas对事件数据的处理:以电动汽车充电数据为例


    1、数据

    电动汽车的充电数据形式如下 

    订单号充电开始时间充电完成时间订单/时段总充电量(KWh)尖时电量峰时电量平时电量谷时电量
    2023020105000026122023-02-01 00:03:262023-02-01 00:40:5228.4410.0000.0000.00028.441
    2023020105000045702023-02-01 00:10:282023-02-01 01:01:3545.3190.0000.0000.00045.319
    2023020105000047282023-02-01 00:10:322023-02-01 01:27:1357.3120.0000.0000.00057.312
    2023020105000049592023-02-01 00:11:322023-02-01 01:16:4525.4920.0000.0000.00025.492
    2023020105000069692023-02-01 00:22:592023-02-01 01:36:3844.8890.0000.0000.00044.889
    2023020105000070762023-02-01 00:24:132023-02-01 00:40:539.1310.0000.0000.0009.131
    2023020105000070822023-02-01 00:24:352023-02-01 00:40:518.1870.0000.0000.0008.187
    2023020105000078652023-02-01 00:29:272023-02-01 01:31:3933.8940.0000.0000.00033.894
    2023020105000086632023-02-01 00:35:202023-02-01 01:20:3128.2400.0000.0000.00028.240
    2023020105000087552023-02-01 00:35:502023-02-01 01:32:0431.7520.0000.0000.00031.752

     2、目标任务

    以5分钟为一个时段,需要知道每天每个时段的总充电量,最终结果如下所示

    3、数据处理

    Python代码在《excel_pandas.py》中,具体如下

    1. import pandas as pd
    2. import time
    3. import os
    4. start_time = time.time() # 开始计时
    5. carDF = pd.read_excel(io='./carChageTime.xlsx')# 读取xlsx(相对路径)
    6. # 将时间字符串转换为时间戳
    7. carDF['start_time'] = pd.to_datetime(carDF['充电开始时间'])
    8. carDF['end_time'] = pd.to_datetime(carDF['充电完成时间'])
    9. # 建立空表,以5分钟为间隔
    10. start_index_time = pd.to_datetime('2023-02-01')
    11. end_index_time = pd.to_datetime('2023-07-31')
    12. power_index = pd.date_range(start=start_index_time, end=end_index_time, freq='5min')
    13. power_excel = pd.DataFrame(data=0, columns=['power'], index=power_index)
    14. pass
    15. start_index_bill = 0
    16. end_index_bill = 52000
    17. for index in range(start_index_bill, end_index_bill): # 260
    18. print(start_index_bill, '/', index, '/', end_index_bill)
    19. count = 0
    20. num_5minutes = (carDF['end_time'][index] - carDF['start_time'][index]) // pd.Timedelta("5T")
    21. if num_5minutes == 0: # 0不能做除数
    22. power_average = carDF['订单/时段总充电量(KWh)'][index] # 将每一个订单的总电量均分
    23. # print('num_5minutes=0', 'index=', index, 'power=', power_average)
    24. # print(carDF['start_time'][index], carDF['end_time'][index])
    25. for x in power_index:
    26. if carDF['start_time'][index] <= x:
    27. power_excel.loc[x] = power_excel.loc[x] + power_average
    28. break
    29. else:
    30. power_average = carDF['订单/时段总充电量(KWh)'][index] / num_5minutes
    31. for x in power_index:
    32. if carDF['start_time'][index] <= x <= carDF['end_time'][index]:
    33. count = count + 1
    34. if count > num_5minutes:
    35. break
    36. else:
    37. power_excel.loc[x] = power_excel.loc[x] + power_average
    38. pass
    39. # 保存结果
    40. realTime = time.strftime('%Y%m%d%H%M%S', time.localtime())
    41. file_name = os.path.basename(__file__)[0:-3]
    42. power_excel.to_csv('{0}_{1}_{2}_{3}.csv'.format(file_name, realTime, start_index_bill, end_index_bill))
    43. # 打印程序的运行时间
    44. end_time = time.time()
    45. print("end_time-start_time={}".format(end_time - start_time))

    最终生成一个《excel_pandas_slice_20230918194213_0_52000.csv》,内容如下

    4、进一步处理

     Python代码在《csv_to_excel.py》中,具体如下

    1. import pandas as pd
    2. import time
    3. import os
    4. start_time = time.time() # 开始计时
    5. df_RNN = pd.read_csv('excel_pandas_slice_20230918194213_0_52000.csv') # 读取xlsx(相对路径)
    6. # 行名
    7. start_index_time = pd.to_datetime('2023-02-01 00:05:00') # Timestamp('2023-02-01 00:00:00')
    8. end_index_time = pd.to_datetime('2023-02-02 00:00:00') # Timestamp('2023-07-31 00:00:00')
    9. power_index = pd.date_range(start=start_index_time, end=end_index_time, freq='5min')
    10. # 列名
    11. start_column_time = pd.to_datetime('2023-02-01') # Timestamp('2023-02-01 00:00:00')
    12. end_column_time = pd.to_datetime('2023-7-31') # Timestamp('2023-07-31 00:00:00')
    13. power_column = pd.date_range(start=start_column_time, end=end_column_time, freq='D')
    14. power_excel = pd.DataFrame(data=0, columns=power_column, index=power_index)
    15. # num_5minutes用于记录一天被划分为了多少个时间段
    16. # 这里以5分钟为一个时间段,所以num_5minutes=288
    17. t_step = pd.Timedelta("5T") # 时间间隔5minutes
    18. num_5minutes = (end_index_time - start_index_time) // t_step + 1 # 288
    19. pass
    20. for i in range(df_RNN.shape[0] // num_5minutes):
    21. power_excel.loc[:, power_column[i]] = list(df_RNN['power'][i * num_5minutes:(i + 1) * num_5minutes])
    22. # 保存结果
    23. realTime = time.strftime('%Y%m%d%H%M%S', time.localtime())
    24. file_name = os.path.basename(__file__)[0:-3]
    25. power_excel.to_excel('{0}_{1}.xlsx'.format(file_name, realTime))
    26. # 打印程序的运行时间
    27. end_time = time.time()
    28. print("end_time-start_time={}".format(end_time - start_time))

     最终生成一个《excel_to_csv_20230918203118.xlsx》,内容如下

     这里在excel对《excel_to_csv_20230918203118.xlsx》的索引进行了设置单元格式处理

    5、绘图 

    绘图参考

    EXCEL如何在一个图上画多条曲线-百度经验 (baidu.com)

    参考另一篇博客

    Python模块之pandas中时间戳之间的相互转换_飞由于度的博客-CSDN博客

  • 相关阅读:
    flowable,199boge,进阶,06完成
    面向对象17:异常机制
    PSP - 蛋白质序列提取 Transformer 蛋白质语言模型 ESM2 特征
    Ubuntu虚拟机安装教程
    数据结构与算法复习:第十弹
    2022.8.9考试独特的投标拍卖--800题解
    网络安全(黑客)自学
    五魔方、二阶五魔方
    FPGA图像处理(一):边缘检测
    图书《人工智能怎么学》官方课件正式发布
  • 原文地址:https://blog.csdn.net/qq_35629563/article/details/133012428