    Using The CuDLA API To Run A TensorRT Engine

    Table Of Contents

    • Description
    • How does this sample work?
      • TensorRT API layers and ops
    • Prerequisites
    • Running the sample
      • Sample --help options
    • Additional resources
    • License
    • Changelog
    • Known issues

    Description

    This sample, sampleCudla, uses the TensorRT API to construct a network containing a single ElementWise layer and builds an engine from it. The engine runs in DLA standalone mode using the cuDLA runtime. To do that, the sample uses cuDLA APIs for engine conversion, cuDLA runtime preparation, and inference.

    How does this sample work?

    After the network is constructed, the built engine data is loaded as a cuDLA module. The input and output tensors are then allocated and registered with cuDLA. Once the input tensors have been copied from CPU to GPU, the cuDLA task is submitted and executed. The sample then waits for the stream operations to finish and copies the output buffer back to the CPU, where it is verified for correctness.

    Specifically:

    • The single-layered network is built by TensorRT.
    • cudlaCreateDevice is called to create a DLA device.
    • cudlaModuleLoadFromMemory is called to load the engine memory for DLA use.
    • cudaMalloc and cudlaMemRegister are called to first allocate memory on the GPU and then register the resulting CUDA pointer with the DLA.
    • cudlaModuleGetAttributes is called to get module attributes from the loaded module.
    • cudlaSubmitTask is called to submit the inference task.
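The call sequence listed above can be sketched roughly as follows. This is a minimal illustration and not the sample's actual source: the helper name `runOnDla`, the `HAVE_CUDLA` guard, the use of device 0 with the `CUDLA_CUDA_DLA` flag, and the assumption that both inputs and the output occupy the same number of bytes are all choices made for this sketch. A complete program would query the real tensor sizes from the module's tensor descriptors. The cuDLA path is compiled only when the cuDLA and CUDA runtime headers are present; elsewhere the helper simply returns false.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

#if defined(__has_include)
#if __has_include(<cudla.h>) && __has_include(<cuda_runtime.h>)
#define HAVE_CUDLA 1  // assumption: guard name invented for this sketch
#include <cuda_runtime.h>
#include <cudla.h>
#endif
#endif

// Hypothetical helper: runs a two-input, one-output DLA loadable once.
// Assumes all three tensors occupy `bytes` bytes in the DLA's layout.
bool runOnDla(const std::vector<uint8_t>& loadable, const void* in0,
              const void* in1, void* out, size_t bytes)
{
    if (loadable.empty() || !in0 || !in1 || !out || bytes == 0) return false;
#ifdef HAVE_CUDLA
    cudlaDevHandle dev = nullptr;
    cudlaModule module = nullptr;
    cudaStream_t stream = nullptr;
    void* gpu[3] = {nullptr, nullptr, nullptr};      // in0, in1, out on the GPU
    uint64_t* dla[3] = {nullptr, nullptr, nullptr};  // same buffers as seen by the DLA

    // Create the DLA device, load the engine memory, and create a CUDA stream.
    bool ok = cudlaCreateDevice(0, &dev, CUDLA_CUDA_DLA) == cudlaSuccess
        && cudlaModuleLoadFromMemory(dev, loadable.data(), loadable.size(),
                                     &module, 0) == cudlaSuccess
        && cudaStreamCreate(&stream) == cudaSuccess;

    // Confirm the module really has two inputs and one output.
    cudlaModuleAttribute attr;
    ok = ok && cudlaModuleGetAttributes(module, CUDLA_NUM_INPUT_TENSORS, &attr) == cudlaSuccess
            && attr.numInputTensors == 2;
    ok = ok && cudlaModuleGetAttributes(module, CUDLA_NUM_OUTPUT_TENSORS, &attr) == cudlaSuccess
            && attr.numOutputTensors == 1;

    // Allocate on the GPU, then register each CUDA pointer with the DLA.
    for (int i = 0; ok && i < 3; ++i) {
        ok = cudaMalloc(&gpu[i], bytes) == cudaSuccess
            && cudlaMemRegister(dev, static_cast<uint64_t*>(gpu[i]), bytes,
                                &dla[i], 0) == cudlaSuccess;
    }

    // Copy inputs from CPU to GPU on the stream.
    ok = ok && cudaMemcpyAsync(gpu[0], in0, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess
            && cudaMemcpyAsync(gpu[1], in1, bytes, cudaMemcpyHostToDevice, stream) == cudaSuccess;

    if (ok) {
        cudlaTask task{};
        task.moduleHandle = module;
        task.inputTensor = dla;       // first two registered pointers
        task.numInputTensors = 2;
        task.outputTensor = dla + 2;  // third registered pointer
        task.numOutputTensors = 1;
        task.waitEvents = nullptr;
        task.signalEvents = nullptr;
        ok = cudlaSubmitTask(dev, &task, 1, stream, 0) == cudlaSuccess;
    }

    // Copy the result back and wait for all stream work to finish.
    ok = ok && cudaMemcpyAsync(out, gpu[2], bytes, cudaMemcpyDeviceToHost, stream) == cudaSuccess
            && cudaStreamSynchronize(stream) == cudaSuccess;

    // Release everything that was created, in reverse order.
    for (int i = 0; i < 3; ++i) {
        if (dla[i]) cudlaMemUnregister(dev, dla[i]);
        if (gpu[i]) cudaFree(gpu[i]);
    }
    if (stream) cudaStreamDestroy(stream);
    if (module) cudlaModuleUnload(module, 0);
    if (dev) cudlaDestroyDevice(dev);
    return ok;
#else
    return false;  // cuDLA headers not available on this platform
#endif
}
```

Submitting the task on the same stream as the copies lets stream ordering guarantee that the inputs arrive before the DLA runs and that the output copy starts only after the task completes.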

    TensorRT API layers and ops

    In this sample, the ElementWise layer is used. For more information, see the TensorRT Developer Guide: Layers documentation.
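To illustrate the correctness check on the CPU, the sketch below computes a reference result for an element-wise operation and compares it with a tolerance. It assumes the operation is a sum, which this README does not state, and the function names `referenceSum` and `matches` as well as the tolerance value are hypothetical. A tolerance is used because DLA computes in reduced precision, so exact equality with a float reference may not hold.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: element-wise sum of two equally sized tensors.
// Returns an empty vector when the sizes differ.
std::vector<float> referenceSum(const std::vector<float>& a,
                                const std::vector<float>& b)
{
    if (a.size() != b.size()) return {};
    std::vector<float> out(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        out[i] = a[i] + b[i];
    }
    return out;
}

// True when both vectors have the same length and every element of `actual`
// is within `tol` of the corresponding element of `expected`.
bool matches(const std::vector<float>& actual,
             const std::vector<float>& expected, float tol)
{
    if (actual.size() != expected.size()) return false;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (std::fabs(actual[i] - expected[i]) > tol) return false;
    }
    return true;
}
```

For example, the buffer copied back from the device could be passed as `actual` and `referenceSum(in0, in1)` as `expected`.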

    Prerequisites

    This sample must be compiled with the macro ENABLE_DLA=1. Otherwise, it prints the following error message and quits:

    Unsupported platform, please make sure it is running on aarch64, QNX or android.
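A compile-time guard of this kind might look like the sketch below. It is an illustration only: the function names `dlaEnabled` and `runSample` and the returned exit codes are assumptions, and the actual sample may be structured differently. Only the macro name and the error text come from this README.

```cpp
#include <iostream>

// Default to a non-DLA build when the macro is not passed on the command line.
#ifndef ENABLE_DLA
#define ENABLE_DLA 0
#endif

// Reports whether this binary was built with ENABLE_DLA=1.
bool dlaEnabled()
{
    return ENABLE_DLA != 0;
}

// Hypothetical entry point: returns 0 on a DLA build, otherwise prints the
// error message and returns 1 (the exit code is an assumption).
int runSample()
{
#if ENABLE_DLA
    // The engine build and cuDLA inference would go here.
    return 0;
#else
    std::cerr << "Unsupported platform, please make sure it is running on "
                 "aarch64, QNX or android."
              << std::endl;
    return 1;
#endif
}
```

Building with `-DENABLE_DLA=1` selects the first branch; any other build falls through to the error message.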

    Running the sample

    1. Compile this sample by running make in the <TensorRT root directory>/samples/sampleCudla directory. The binary named sample_cudla will be created in the <TensorRT root directory>/bin directory.
      cd <TensorRT root directory>/samples/sampleCudla
      make ENABLE_DLA=1

      Where `<TensorRT root directory>` is where you installed TensorRT.

    2. Run the sample to perform inference on DLA.
      ./sample_cudla

    3. Verify that the sample ran successfully. If it did, you should see output similar to the following:
      &&&& RUNNING TensorRT.sample_cudla # ./sample_cudla
      [I] [TRT]
      [I] [TRT] --------------- Layers running on DLA:
      [I] [TRT] [DlaLayer] {ForeignNode[(Unnamed Layer* 0) [ElementWise]]},
      [I] [TRT] --------------- Layers running on GPU:
      [I] [TRT]
      …(omit messages)
      &&&& PASSED TensorRT.sample_cudla

       The final `PASSED` line shows that the sample ran successfully.

    Sample --help options

    To see the full list of available options and their descriptions, use the ./sample_cudla -h command line option.

    Additional resources

    The following resources provide a deeper understanding of sampleCudla.

    Documentation

    • Introduction To NVIDIA’s TensorRT Samples
    • Working With TensorRT Using The C++ API
    • NVIDIA’s TensorRT Documentation Library
    • Developer Guide for cuDLA APIs

    License

    For terms and conditions for use, reproduction, and distribution, see the TensorRT Software License Agreement documentation.

    Changelog

    June 2022
    This is the first release of the README.md file.

    Known issues

    There are no known issues with this tool.

  • Original article: https://blog.csdn.net/u014647208/article/details/134326597