• Building PaddlePaddle on the Sophgo RISC-V general-purpose cloud (@openKylin archive notes)


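    These notes record an attempt to build PaddlePaddle in the Sophgo cloud RISC-V environment.

    First, a summary of the steps:

    Quick steps to build and install PaddlePaddle

    Download the PaddlePaddle source (GitHub - PaddlePaddle/Paddle): git clone https://github.com/paddlepaddle/paddle

    Modify the code following the PR "Adding a compile option to Paddle for Risc-V" (PaddlePaddle/Paddle@d3db383 on GitHub).

    Then build:

    cmake ../ -DWITH_GPU=OFF -DWITH_RISCV=ON
    make -j 128 TARGET=RISCV64_GENERIC

    (Note: it is best to run these two commands separately so that you can see whether cmake reports an error, especially after editing the CMake configuration files. Otherwise a change may silently fail to take effect and you end up rebuilding for nothing.)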
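    Running the two commands separately makes it possible to inspect cmake's output before committing to a build that takes hours. The following is a minimal sketch, not part of the original workflow, of how that inspection could be automated. The helper name unused_cmake_variables and the SAMPLE text are illustrative assumptions; the sample is modeled on the "Manually-specified variables were not used by the project" warning that cmake printed during this build.

```python
import re

# Illustrative cmake output, modeled on the warning seen in this build.
SAMPLE = """WITH_DLNNE:
-- Configuring done
-- Generating done
CMake Warning:
  Manually-specified variables were not used by the project:

    WITH_RISCV

"""


def unused_cmake_variables(cmake_output):
    """Return the variable names listed under cmake's 'not used' warning."""
    names = []
    in_warning = False
    for line in cmake_output.splitlines():
        text = line.strip()
        if "Manually-specified variables were not used by the project" in text:
            in_warning = True
            continue
        if not in_warning:
            continue
        if not text:
            # A blank line after at least one name ends the list.
            if names:
                in_warning = False
            continue
        if re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", text):
            names.append(text)
        else:
            in_warning = False
    return names


if __name__ == "__main__":
    ignored = unused_cmake_variables(SAMPLE)
    if ignored:
        print("cmake ignored:", ", ".join(ignored), "- fix this before running make")
```

    If the returned list is non-empty, the option was ignored and starting make would only repeat the earlier, non-RISC-V build.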

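    After the build finishes, install the wheel:

    pip install paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl -i https://mirror.baidu.com/pypi/simple

    Final step: register libatomic as a dependency of libpaddle.so with the command: patchelf --add-needed libatomic.so.1 /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so

    With that, you can happily use PaddlePaddle for AI and machine learning in a RISC-V environment!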

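    Detailed build and installation process

    First, fetch the source with git:

    git clone https://github.com/paddlepaddle/paddle

    Set up the environment:

    pip3 install protobuf

    A few other libraries are needed as well; for details, see my notes on building PyTorch: "Building PyTorch in the Sophgo RISC-V general-purpose cloud development space @openKylin archive notes" (CSDN blog).

    Then enter the paddle directory and build: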

    mkdir build

    cd build

    cmake ../

    Wow, apart from the protobuf problem, it seemed to finish in one go!

    Automatic code generation for paddle/fluid/primitive succeed.
    Automatic code generation for decomp interface succeed.
    WITH_DLNNE:
    -- Configuring done
    -- Generating done
    -- Build files have been written to: /root/github/paddle/build

    I spoke too soon: that was only the configure step, and the actual build still has to run:

    root@863c89a419ec:~/github/paddle/build# make

    make -j 16

    make -j 8 TARGET=RISCV64_GENERIC -dw 

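    If you hit an error, deal with it according to the message. For example, if a third-party library fails to build, delete that library's directory and then run the following in the third_party directory: git submodule update --init --recursive

    Then run make again.

    I was getting sleepy, so I left it to build.

    It failed with an error.

    From an issue on the official repository, I found the cmake command given as:

    cmake ../ -DWITH_GPU=OFF -WITH_RISCV=ON

    and the make command as: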

    make -j 16 TARGET=RISCV64_GENERIC

    The cmake command turned out to be wrong.

    It contains a typo (the -D prefix is missing before WITH_RISCV); it should be:

    cmake ../ -DWITH_GPU=OFF -DWITH_RISCV=ON
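
    Because a missing -D is easy to overlook, a small pre-flight check on the argument list can catch it before cmake runs. This is only a sketch under a narrow assumption: the hypothetical helper suspicious_cmake_args flags upper-case NAME=VALUE arguments that do not begin with -D, and knows nothing else about cmake's command line.

```python
import re

# Matches arguments such as "WITH_RISCV=ON" or "-WITH_RISCV=ON".
_NAME_VALUE = re.compile(r"-?[A-Z][A-Z0-9_]*=\S*")


def suspicious_cmake_args(args):
    """Return NAME=VALUE style arguments that lack the -D prefix."""
    flagged = []
    for arg in args:
        if arg.startswith("-D"):
            continue  # a proper cache definition
        if _NAME_VALUE.fullmatch(arg):
            flagged.append(arg)
    return flagged


if __name__ == "__main__":
    bad = ["../", "-DWITH_GPU=OFF", "-WITH_RISCV=ON"]
    good = ["../", "-DWITH_GPU=OFF", "-DWITH_RISCV=ON"]
    print(suspicious_cmake_args(bad))
    print(suspicious_cmake_args(good))
```

    With the command taken from the issue, the check reports -WITH_RISCV=ON; with the corrected command it reports nothing.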

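    Then run make again.

    The Makefile has no RISCV option, and I was not sure whether cmake itself handles it, so I ran it to find out. (I later learned that the RISCV option comes from a dedicated PR that has not been merged yet, so the code has to be modified by hand.)

    When cmake finished, it warned that WITH_RISCV had not taken effect.

    Manually adding RISC-V support to PaddlePaddle

    Edit the CMake build file (CMakeLists.txt) and add the following after the WITH_ARM block: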

    if(WITH_RISCV)
      set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fPIC")
      set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC")
      set(WITH_XBYAK
          OFF
          CACHE STRING "Disable XBYAK when compiling WITH_RISCV=ON." FORCE)
      set(WITH_MKL
          OFF
          CACHE STRING "Disable MKL when compiling WITH_RISCV=ON." FORCE)
      set(WITH_AVX
          OFF
          CACHE STRING "Disable AVX when compiling WITH_AVX=OFF." FORCE)
      add_definitions(-DPADDLE_WITH_RISCV)
    endif()

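    File to modify: cmake/flags.cmake

    After the ARM condition, add: AND NOT WITH_RISCV

    This is also where the -m64 flag is set. With this condition added, there is no need to delete the -m64 line by hand.

    File to modify: paddle/fluid/operators/search_compute.h

    Four places need to change; add defined(PADDLE_WITH_RISCV) at each of them: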

    #pragma once
    #if !defined(PADDLE_WITH_ARM) && !defined(PADDLE_WITH_SW) && \
        !defined(PADDLE_WITH_MIPS) && !defined(PADDLE_WITH_LOONGARCH) && \
        !defined(PADDLE_WITH_RISCV)
    #include
    #endif
    #include
    @@ -103,7 +104,8 @@ void call_gemm_batched(const framework::ExecutionContext& ctx,
    }
    #if !defined(PADDLE_WITH_ARM) && !defined(PADDLE_WITH_SW) && \
        !defined(PADDLE_WITH_MIPS) && !defined(PADDLE_WITH_LOONGARCH) && \
        !defined(PADDLE_WITH_RISCV)
    #define __m256x __m256
    @@ -144,7 +146,8 @@ inline void axpy(const T* x, T* y, size_t len, const T alpha) {
    _mm256_mul_px(mm_alpha, _mm256_load_px(x + jjj))));
    }
    #elif defined(PADDLE_WITH_ARM) || defined(PADDLE_WITH_SW) || \
        defined(PADDLE_WITH_MIPS) || defined(PADDLE_WITH_LOONGARCH) || \
        defined(PADDLE_WITH_RISCV)
    PADDLE_THROW(platform::errors::Unimplemented("axpy is not supported"));
    #else
    lll = len & ~SSE_CUT_LEN_MASK;
    @@ -174,7 +177,8 @@ inline void axpy_noadd(const T* x, T* y, size_t len, const T alpha) {
    _mm256_store_px(y + jjj, _mm256_mul_px(mm_alpha, _mm256_load_px(x + jjj)));
    }
    #elif defined(PADDLE_WITH_ARM) || defined(PADDLE_WITH_SW) || \
        defined(PADDLE_WITH_MIPS) || defined(PADDLE_WITH_LOONGARCH) || \
        defined(PADDLE_WITH_RISCV)
    PADDLE_THROW(platform::errors::Unimplemented("axpy_noadd is not supported"));
    #else
    lll = len & ~SSE_CUT_LEN_MASK;

    File to modify: paddle/fluid/platform/denormal.cc

    Add RISCV to the #if on line 33 so that it reads:

    #if !defined(GCC_WITHOUT_INTRINSICS) && !defined(PADDLE_WITH_ARM) && \
        !defined(PADDLE_WITH_SW) && !defined(PADDLE_WITH_MIPS) && \
        !defined(_WIN32) && !defined(PADDLE_WITH_LOONGARCH) && \
        !defined(PADDLE_WITH_RISCV)
    #define DENORM_USE_INTRINSICS

    File to modify: paddle/phi/backends/cpu/cpu_info.cc

    } else {
    #if !defined(WITH_NV_JETSON) && !defined(PADDLE_WITH_ARM) && \
        !defined(PADDLE_WITH_SW) && !defined(PADDLE_WITH_MIPS) && \
        !defined(PADDLE_WITH_LOONGARCH) && !defined(PADDLE_WITH_RISCV)
    std::array reg;
    cpuid(reg.data(), 0);
    int nIds = reg[0];

    File to modify: paddle/phi/backends/cpu/cpu_info.h

    #define cpuid(reg, x) __cpuidex(reg, x, 0)
    #else
    #if !defined(WITH_NV_JETSON) && !defined(PADDLE_WITH_ARM) && \
        !defined(PADDLE_WITH_SW) && !defined(PADDLE_WITH_MIPS) && \
        !defined(PADDLE_WITH_LOONGARCH) && !defined(PADDLE_WITH_RISCV)
    #include
    inline void cpuid(int reg[4], int x) {

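    It felt like the finish line was close.

    Rerun cmake and make, hoping not to hit the "pmmintrin.h" not-found error again.

    It turned out I had missed one file.

    File to modify: cmake/third_party.cmake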

    if(WITH_POCKETFFT)
      include(external/pocketfft)
      list(APPEND third_party_deps extern_pocketfft)
      add_definitions(-DPADDLE_WITH_POCKETFFT)
      if(WITH_RISCV)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-maybe-uninitialized") # Warnings in pocketfft_hdronly.h
      endif()
    endif()

    The build started at 11:00 and finished at 15:37, taking 4 hours 37 minutes!

    And there it is, sitting quietly in dist/:

    root@863c89a419ec:~/github/paddle/build/python# ls dist/
    paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl
    root@863c89a419ec:~/github/paddle/build/python# pip3 install dist/paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl -i https://mirror.baidu.com/pypi/simple
    Processing ./dist/paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl

    Install:

    pip install paddlepaddle-0.0.0-cp38-cp38-linux_riscv64.whl

    After installation, import paddle fails with: ImportError: /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so: undefined symbol: __atomic_exchange_1

    I applied the changes suggested in issue #62037 on PaddlePaddle/Paddle (GitHub), "import paddle fails after building in the Sophgo cloud RISC-V environment", but the error remained.

    I deleted the build directory, recreated it, reran cmake and make, and changed -j to 128.

    The problem persisted.

    Fixing the import paddle error

    Then I remembered hitting the same error when installing PyTorch, which led me to the fix:

    All it takes is registering libatomic with patchelf: `patchelf --add-needed libatomic.so.1 /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so`

    If patchelf is not installed, install it with apt install patchelf.
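
    To check whether the patch is needed, or has already been applied, one option is to list the shared-library dependencies recorded in libpaddle.so, for example from the output of readelf -d. The sketch below parses such output and builds the patchelf command only when libatomic.so.1 is missing. The function names are hypothetical, and SAMPLE is illustrative text in the format readelf prints, not output captured from this machine.

```python
import re

# Illustrative `readelf -d libpaddle.so` excerpt (not captured from the real build).
SAMPLE = (
    " 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]\n"
    " 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]\n"
)


def needed_libraries(readelf_dynamic_output):
    """Extract library names from the (NEEDED) entries of `readelf -d` output."""
    pattern = r"\(NEEDED\)\s+Shared library:\s*\[([^\]]+)\]"
    return re.findall(pattern, readelf_dynamic_output)


def patchelf_command_if_missing(readelf_dynamic_output, lib_path,
                                needed="libatomic.so.1"):
    """Return the patchelf argv to run, or None if `needed` is already listed."""
    if needed in needed_libraries(readelf_dynamic_output):
        return None
    return ["patchelf", "--add-needed", needed, lib_path]


if __name__ == "__main__":
    path = "/usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so"
    print(patchelf_command_if_missing(SAMPLE, path))
```

    In real use the text would come from running readelf -d on the installed libpaddle.so, and the returned list could be passed to subprocess.run.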

    At last PaddlePaddle works:

    root@863c89a419ec:~# python3
    Python 3.8.2 (default, Jan 18 2024, 07:05:37)
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import paddle
    >>> paddle.utils.run_check()
    Running verify PaddlePaddle program ...
    I0228 07:37:38.344522 279426 program_interpreter.cc:220] New Executor is Running.
    I0228 07:37:38.475822 279426 interpreter_util.cc:652] Standalone Executor is Used.
    PaddlePaddle works well on 1 CPU.
    PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

    Summary:

    The build now succeeds.

    import paddle works and run_check() passes. PaddlePaddle has been built successfully and is ready to use! Continuing in the same session, a simple tensor addition also works:

    >>> x = paddle.randn((2,3))
    >>> y = paddle.randn((2,3))
    >>> z = x+y
    >>> z
    Tensor(shape=[2, 3], dtype=float32, place=Place(cpu), stop_gradient=True,
           [[ 0.70946050, -0.07831943, -0.60504013],
            [-2.55267453, -0.59097183, 2.08291411]])

    Debugging

    Error: Could NOT find PY_google.protobuf (missing: PY_GOOGLE.PROTOBUF)

    Could NOT find PY_google.protobuf (missing: PY_GOOGLE.PROTOBUF)

    Install protobuf:

    pip3 install protobuf

    Build error after a target reaches 100%

    [ 97%] Building CXX object gloo/CMakeFiles/gloo.dir/transport/tcp/unbound_buffer.cc.o
    [100%] Linking CXX static library libgloo.a
    [100%] Built target gloo
    [  3%] Performing install step for 'extern_gloo'
    [  3%] Completed 'extern_gloo'
    [  3%] Built target extern_gloo
    make: *** [Makefile:136: all] Error 2

    Rebuilding again, the build failed at protobuf:

    -- Installing: /root/github/paddle/build/third_party/install/protobuf/lib/cmake/protobuf/protobuf-config.cmake
    [  4%] Completed 'extern_protobuf'
    [  4%] Built target extern_protobuf
    make: *** [Makefile:136: all] Error 2

    Rebuilding once more, it failed here:

    Error: cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64

    cc1: error: requested ABI requires '-march' to subsume the 'D' extension
    cc1: error: ABI requires '-march=rv64'
    make[4]: *** [Makefile:737: isamin.o] Error 1
    cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64
    make[4]: *** [Makefile:611: sasum.o] Error 1
    cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64
    cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64
    cc1: error: requested ABI requires '-march' to subsume the 'D' extension
    cc1: error: requested ABI requires '-march' to subsume the 'D' extension
    cc1: error: ABI requires '-march=rv64'
    cc1: error: ABI requires '-march=rv64'
    cc1: error: requested ABI requires '-march' to subsume the 'D' extension
    cc1: error: ABI requires '-march=rv64'
    make[4]: *** [Makefile:647: snrm2.o] Error 1
    make[4]: *** [Makefile:629: ssum.o] Error 1
    make[4]: *** [Makefile:701: smax.o] Error 1
    make[4]: *** [Makefile:665: samax.o] Error 1
    make[4]: *** [Makefile:755: ismax.o] Error 1
    make[4]: *** [Makefile:773: sdsdot.o] Error 1
    make[3]: *** [Makefile:164: libs] Error 1
    make[2]: *** [CMakeFiles/extern_openblas.dir/build.make:86: third_party/openblas/src/extern_openblas-stamp/extern_openblas-build] Error 2
    make[1]: *** [CMakeFiles/Makefile2:4001: CMakeFiles/extern_openblas.dir/all] Error 2
    [  5%] Built target eager_python_c_codegen
    [  5%] Built target op_map_codegen
    [  5%] Built target eager_codegen
    make: *** [Makefile:136: all] Error 2
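
    With a parallel build, the same compiler error is repeated many times and interleaved with make's own messages, which makes the root cause hard to spot. The following sketch (the helper name is hypothetical) counts the distinct "error:" messages in a log; the sample lines are copied from the output above. Here the dominant message points at the OpenBLAS sub-build being given a non-RISC-V -march value, which seems consistent with the need to pass TARGET=RISCV64_GENERIC to make.

```python
from collections import Counter

# A few lines copied from the failing OpenBLAS build log above.
SAMPLE_LOG = """cc1: error: requested ABI requires '-march' to subsume the 'D' extension
cc1: error: ABI requires '-march=rv64'
make[4]: *** [Makefile:737: isamin.o] Error 1
cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64
make[4]: *** [Makefile:611: sasum.o] Error 1
cc1: error: '-march=loongson3a': ISA string must begin with rv32 or rv64
"""


def summarize_compiler_errors(log):
    """Count distinct compiler 'error:' messages, most frequent first."""
    marker = "error: "
    counts = Counter()
    for line in log.splitlines():
        idx = line.find(marker)
        if idx != -1:
            # Keep only the message text after the first "error: ".
            counts[line[idx + len(marker):].strip()] += 1
    return counts.most_common()


if __name__ == "__main__":
    for message, count in summarize_compiler_errors(SAMPLE_LOG):
        print(count, message)
```

    The match is case-sensitive, so make's own "Error 1" lines are deliberately left out of the count.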

    I started over, following this guide:

    "How to build Paddle on the RISC-V platform" (Zhihu)

    Build error:

    Live child 0x2af978fd70 (paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen) PID 115530
    Reaping winning child 0x2af978fd70 PID 115530
    Removing child 0x2af978fd70 PID 115530 from chain.
    Considering target file 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build'.
     File 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build' does not exist.
      Considering target file 'eager_codegen'.
       File 'eager_codegen' does not exist.
        Considering target file 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen'.
        File 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen' was considered already.
        Considering target file 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build.make'.
        File 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build.make' was considered already.
       Finished prerequisites of target file 'eager_codegen'.
      Must remake target 'eager_codegen'.
      Successfully remade target file 'eager_codegen'.
     Finished prerequisites of target file 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build'.
    Must remake target 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build'.
    Successfully remade target file 'paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/build'.
    make[2]: Leaving directory '/root/github/paddle/build'
    Reaping winning child 0x2b00eb0510 PID 115266
    Live child 0x2b00eb0510 (paddle/fluid/eager/auto_code_generator/generator/CMakeFiles/eager_codegen.dir/all) PID 115532
    [  4%] Built target eager_codegen
    Reaping winning child 0x2b00eb0510 PID 115532
    Removing child 0x2b00eb0510 PID 115532 from chain.
    make[1]: Leaving directory '/root/github/paddle/build'
    Reaping losing child 0x2ad05e7c50 PID 115118
    make: *** [Makefile:136: all] Error 2
    Removing child 0x2ad05e7c50 PID 115118 from chain.
    make: Leaving directory '/root/github/paddle/build'

    Delete the eigen3 directory under paddle/build/third_party, then run: git submodule update --init --recursive

    At first this seemed to solve the problem, but it did not: the build still failed at 4%:

    [  4%] Built target eager_codegen
    Reaping winning child 0x2b167efa70 PID 117670
    Removing child 0x2b167efa70 PID 117670 from chain.
    make[1]: Leaving directory '/root/github/paddle/build'
    Reaping losing child 0x2b0332dc20 PID 117206
    make: *** [Makefile:136: all] Error 2
    Removing child 0x2b0332dc20 PID 117206 from chain.
    make: Leaving directory '/root/github/paddle/build'

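    After resetting the build and configuring with cmake ../ -DWITH_GPU=OFF -WITH_RISCV=ON, the build fails

    Error: c++: error: unrecognized command line option '-m64'

    According to the guide, one should "trace how the makefile is generated, which eventually leads to this piece of logic in the Paddle.cmake file", and then remove -m64 from that file.

    I could not find a file by that name at first, but the logic turned out to be here:

    cmake/flags.cmake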

      if(NOT WITH_NV_JETSON
         AND NOT WITH_ARM
         AND NOT WITH_RISCV
         AND NOT WITH_SW
         AND NOT WITH_MIPS
         AND NOT WITH_LOONGARCH)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -m64")
      endif()

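    Later, I modified the source to add a RISCV option, which makes the code support RISC-V and at the same time disables -m64.

    Passing -DWITH_RISCV to cmake reports that it has no effect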

    WITH_DLNNE:
    -- Configuring done
    -- Generating done
    CMake Warning:
      Manually-specified variables were not used by the project:

        WITH_RISCV

    The fix is to modify PaddlePaddle's build files as in the PR on the official repository.

    Build error: pmmintrin.h not found

    [  7%] Building CXX object paddle/fluid/platform/CMakeFiles/denormal.dir/denormal.cc.o
    /root/github/paddle/paddle/fluid/platform/denormal.cc:38:10: fatal error: pmmintrin.h: No such file or directory
       38 | #include <pmmintrin.h>
          |          ^~~~~~~~~~~~~
    compilation terminated.
    make[2]: *** [paddle/fluid/platform/CMakeFiles/denormal.dir/build.make:76: paddle/fluid/platform/CMakeFiles/denormal.dir/denormal.cc.o] Error 1
    make[1]: *** [CMakeFiles/Makefile2:4787: paddle/fluid/platform/CMakeFiles/denormal.dir/all] Error 2
    make[1]: *** Waiting for unfinished jobs....

    This corresponds to the file paddle/fluid/platform/denormal.cc, which has now been modified.

    After building and installing, import paddle fails:

    root@863c89a419ec:~/github/paddle/build# python3
    Python 3.8.2 (default, Jan 18 2024, 07:05:37)
    [GCC 9.3.0] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import paddle
    Error: Can not import paddle core while this file exists: /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so
    Traceback (most recent call last):
      File "/usr/local/lib/python3.8/dist-packages/paddle/base/core.py", line 267, in <module>
        from . import libpaddle
    ImportError: /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so: undefined symbol: __atomic_exchange_1

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python3.8/dist-packages/paddle/__init__.py", line 30, in <module>
        from .base import core  # noqa: F401
      File "/usr/local/lib/python3.8/dist-packages/paddle/base/__init__.py", line 38, in <module>
        from . import (  # noqa: F401
      File "/usr/local/lib/python3.8/dist-packages/paddle/base/backward.py", line 25, in <module>
        from . import core, framework, log_helper, unique_name
      File "/usr/local/lib/python3.8/dist-packages/paddle/base/core.py", line 377, in <module>
        if not avx_supported() and libpaddle.is_compiled_with_avx():
    NameError: name 'libpaddle' is not defined

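    From reading around, some people attribute this to an outdated patchelf. I upgraded patchelf and rebuilt, but the problem persisted:

    https://github.com/PaddlePaddle/Paddle/issues/51536

    I filed an issue: https://github.com/PaddlePaddle/Paddle/issues/62037

    I modified the files as suggested in the issue.

    At this point the error was still unresolved.

    Fixing the undefined symbol: __atomic_exchange_1 error

    All it takes is registering libatomic with patchelf: patchelf --add-needed libatomic.so.1 /usr/local/lib/python3.8/dist-packages/paddle/base/libpaddle.so

    If patchelf is not installed, install it with apt install patchelf.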

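    Checking the number of cores on the Sophgo cloud

    • Method 1: use the lscpu command
    • Method 2: use the cat /proc/cpuinfo command
    • Method 3: use the nproc command

    The Sophgo cloud instance has 128 cores, so I changed the parameter to -j 128.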
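
    Instead of hard-coding the job count, it can be derived from the machine. The sketch below is one way to do that from Python, assuming a Linux host; usable_cpu_count and make_command are hypothetical helper names. On Linux, os.sched_getaffinity(0) returns the set of CPUs the current process may run on, and os.cpu_count() serves as a fallback elsewhere.

```python
import os


def usable_cpu_count():
    """Number of CPUs this process may use (falls back to os.cpu_count())."""
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


def make_command(jobs=None, target="RISCV64_GENERIC"):
    """Build the make argv used in this article, with -j set from the CPU count."""
    if jobs is None:
        jobs = usable_cpu_count()
    return ["make", "-j", str(jobs), "TARGET=" + target]


if __name__ == "__main__":
    print(" ".join(make_command()))
```

    Memory is a separate constraint: a very high -j value can exhaust RAM on large C++ translation units, so passing a smaller jobs value explicitly may sometimes be the safer choice.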

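    Reference: issue on the official repository

    "Error building paddle on a RISC-V chip" · Issue #61770 · PaddlePaddle/Paddle · GitHub

    Reference: the PR on the official repository that adds RISC-V support

    Adding a compile option to Paddle for Risc-V · PaddlePaddle/Paddle@d3db383 · GitHub

    Where to apply for and use the Sophgo cloud RISC-V environment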

    https://cloud.sophgo.com/?lang=CN

  • Original article: https://blog.csdn.net/skywalk8163/article/details/136264198