• 量化交易系列【2】Pandas数据筛选及处理相关操作


    示例中使用的‘000001.XSHE.csv’文件,已上传至csdn资源中,可直接下载

    import pandas as pd
    
    • 1
    # 读取CSV文件
    df = pd.read_csv('./000001.XSHE.csv')
    df = df[:5]  # 取前5行数据
    df
    
    • 1
    • 2
    • 3
    • 4
    dateopenclosehighlowvolumemoney
    02015/1/59.9810.0010.179.744580990374.565388e+09
    12015/1/69.909.8510.239.713469524963.453446e+09
    22015/1/79.729.679.889.552722744012.634796e+09
    32015/1/89.689.349.729.302254455022.128003e+09
    42015/1/99.309.429.919.194017364193.835378e+09
    df['code'] = '000001.XSHE'  # 添加股票代码列
    
    • 1

    筛选数据

    # 筛选指定股票代码数据
    # 选取某一只股票
    df[df['code']=='000001.XSHE']
    # 选取多只股票,用isin后面接列表
    df[df['code'].isin(['000001.XSHE','000002.XSHE'])]
    
    • 1
    • 2
    • 3
    • 4
    • 5
    # 筛选收盘价大于9的数据
    df[df['close'] >= 9.0]
    
    • 1
    • 2
    dateopenclosehighlowvolumemoneycode
    02015/1/59.9810.0010.179.744580990374.565388e+09000001.XSHE
    12015/1/69.909.8510.239.713469524963.453446e+09000001.XSHE
    22015/1/79.729.679.889.552722744012.634796e+09000001.XSHE
    32015/1/89.689.349.729.302254455022.128003e+09000001.XSHE
    42015/1/99.309.429.919.194017364193.835378e+09000001.XSHE
    # 筛选日期在2015/1/5与2015/1/7之间的数据
    df['date'] = pd.to_datetime(df['date'])   #将字符串的日期转换为日期类型
    df[(df.date>='2015-01-05') & (df.date<='2015-01-07')]
    
    • 1
    • 2
    • 3
    dateopenclosehighlowvolumemoneycode
    02015-01-059.9810.0010.179.744580990374.565388e+09000001.XSHE
    12015-01-069.909.8510.239.713469524963.453446e+09000001.XSHE
    22015-01-079.729.679.889.552722744012.634796e+09000001.XSHE

    处理缺失值

    删除空值dropna

    # how='any'表示只要有一个空值该行就删掉,how='all'所有列全为空值才删掉
    df.dropna(how='any') 
    df.dropna(subset=['涨跌幅''MACD死叉’],how='all')  # 表示 '涨跌幅','MACD死叉’这两个值均为空就删除改行
    
    • 1
    • 2
    • 3

    补全缺失值fillna

    # 将缺失值给固定值,下面是将所有缺失值补位0
    df.fillna(value='0')  
    
    # 通过其他列的值补充该列的缺失值,下面示例:用收盘价来补充'MACD金叉死叉'空值
    df['MACD金叉死叉'].fillna(value=df['收盘价'], inplace=True)
    
    # 向上寻找最近的一个非空值来补充该位置forward fill
    df.fillna(method='ffill')
    
    # 向下寻找最近的一个非空值来补充该位置backward fill
    df.fillna(method='bfill')
    
    # 判断缺失值
    df[df['涨跌幅'].notnull()]  #找出'涨跌幅'不为null的数据
    df[df['涨跌幅'].isnull()]   #找出'涨跌幅'为null的数据
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8
    • 9
    • 10
    • 11
    • 12
    • 13
    • 14
    • 15

    排序函数

    # 重置索引,新的index为0,1,2,3...
    df.reset_index(inplace=True)  
    
    # 按照某一列进行排序, by指按照什么进行排序
    df.sort_values(by=['交易日期'],ascending=True)  #
    
    #  按照多列进行排序
    df.sort_values(by=['股票名称''交易日期'], ascending=[True, True])  #
    
    • 1
    • 2
    • 3
    • 4
    • 5
    • 6
    • 7
    • 8

    合并操作

    df1 = df.iloc[0:2]
    df1
    
    • 1
    • 2
    dateopenclosehighlowvolumemoney
    02015/1/59.9810.0010.179.744580990374.565388e+09
    12015/1/69.909.8510.239.713469524963.453446e+09
    df2 = df.iloc[3:5]
    df2
    
    • 1
    • 2
    dateopenclosehighlowvolumemoney
    32015/1/89.689.349.729.302254455022.128003e+09
    42015/1/99.309.429.919.194017364193.835378e+09
    # 将df1,df2上下拼接,ignore_index=True,表示去除原有的Index,重新用0,1,2.....代替
    df3 = df1.append(df2,ignore_index=True)
    df3
    
    • 1
    • 2
    • 3
    dateopenclosehighlowvolumemoney
    02015/1/59.9810.0010.179.744580990374.565388e+09
    12015/1/69.909.8510.239.713469524963.453446e+09
    22015/1/89.689.349.729.302254455022.128003e+09
    32015/1/99.309.429.919.194017364193.835378e+09

    去除重复数据

    df3.append(df2)
    
    • 1
    dateopenclosehighlowvolumemoney
    02015/1/59.9810.0010.179.744580990374.565388e+09
    12015/1/69.909.8510.239.713469524963.453446e+09
    22015/1/89.689.349.729.302254455022.128003e+09
    32015/1/99.309.429.919.194017364193.835378e+09
    32015/1/89.689.349.729.302254455022.128003e+09
    42015/1/99.309.429.919.194017364193.835378e+09
    # 去除['close','open']列相同的数据,如果不写subset,表示两行所有数据均相同才会去除
    # keep参数表示保留哪一行的数据:last表示最后一行,first表示第一行
    df3.drop_duplicates(subset=['close','open'],keep='last',inplace=True)
    df3
    
    • 1
    • 2
    • 3
    • 4
    dateopenclosehighlowvolumemoney
    02015/1/59.9810.0010.179.744580990374.565388e+09
    12015/1/69.909.8510.239.713469524963.453446e+09
    22015/1/89.689.349.729.302254455022.128003e+09
    32015/1/99.309.429.919.194017364193.835378e+09
  • 相关阅读:
    django的运行命令
    微服务【Ribbon负载均衡&源码解析&饥饿加载】第2章
    Playcanvas后处理-辉光bloom
    Docker部署selenium程序-从本地到云服务器-详细步骤
    SpringBoot整合dubbo(三)
    QT-事件循环机制
    【距离注意残差网络:超分】
    20张图说清楚 IP 协议
    Python自动化测试selenium指定截图文件名方法
    冠军斩获10万奖金!首届“域见杯”医检AI开发者大赛精彩落幕
  • 原文地址:https://blog.csdn.net/qq_42589613/article/details/127408431