  • TensorRTx: an overview of the open-source code


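    TensorRTx converts common network models into TensorRT form: it aims to implement popular deep learning networks with the TensorRT network definition API. TensorRT ships with built-in parsers, including caffeparser, uffparser and onnxparser, but using them often runs into "unsupported operation or layer" problems, especially with state-of-the-art models that use new layer types.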

    So why not skip the parsers altogether? Building the whole network with only the TensorRT network definition API is not complicated.

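    Every TensorRTx model is first implemented in pytorch/mxnet/tensorflow. Its weights are then exported to a file named xxx.wts, and TensorRT loads those weights, defines the network and runs inference. Some of the pytorch implementations can be found in the author's repo Pytorchx; the rest come from popular open-source implementations.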
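
The export-then-load workflow above can be sketched in a few lines of Python. This is a minimal illustration, not the project's own exporter: the file layout used here (a first line with the tensor count, then one line per tensor holding its name, element count and big-endian float32 values as hex words) is an assumption, and the function names `write_wts` and `read_wts` are hypothetical. The ".wts file content format" tutorial listed below remains the authoritative description.

```python
import struct


def write_wts(path, tensors):
    """Write {name: flat list of floats} to a text file.

    ASSUMED layout (not confirmed by this article):
      line 1      -> number of tensors
      other lines -> <name> <element count> <hex float32> <hex float32> ...
    """
    with open(path, "w") as f:
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # Each value is packed as a big-endian float32 and written as 8 hex digits.
            words = " ".join(struct.pack(">f", float(v)).hex() for v in values)
            f.write(f"{name} {len(values)} {words}\n")


def read_wts(path):
    """Read a file produced by write_wts back into {name: list of floats}."""
    tensors = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            parts = f.readline().split()
            name, size = parts[0], int(parts[1])
            words = parts[2:2 + size]
            if len(words) != size:
                raise ValueError(f"tensor {name!r}: expected {size} values, got {len(words)}")
            tensors[name] = [struct.unpack(">f", bytes.fromhex(w))[0] for w in words]
    return tensors
```

In a real export the dictionary would come from the trained model (for pytorch, flattened entries of `model.state_dict()`), and the loading side would hand each array to the TensorRT layer that needs it. Values are stored as float32, so a round trip is exact only for numbers representable in that precision.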

    Updates

    • 19 Aug 2022. Dominic and sbmalik: Yolov3-tiny and Arcface support TRT8.
    • 6 Jul 2022. xiang-wuu: SuperPoint - Self-Supervised Interest Point Detection and Description, vSLAM related.
    • 26 May 2022. triple-Mu: YOLOv5 python script with CUDA Python API.
    • 23 May 2022. yhpark: Real-ESRGAN, Practical Algorithms for General Image/Video Restoration.
    • 19 May 2022. vjsrinivas: YOLOv3 TRT8 support and Python script.
    • 15 Mar 2022. sky_hole: Swin Transformer - Semantic Segmentation.
    • 19 Oct 2021. liuqi123123: added CUDA preprocessing for yolov5; preprocessing + inference is 3x faster when batchsize=8.
    • 18 Oct 2021. xupengao: YOLOv5 updated to v6.0, supporting n/s/m/l/x/n6/s6/m6/l6/x6.
    • 31 Aug 2021. FamousDirector: update retinaface to support TensorRT 8.0.
    • 27 Aug 2021. HaiyangPeng: add a python wrapper for hrnet segmentation.
    • 1 Jul 2021. freedenS: DE⫶TR: End-to-End Object Detection with Transformers. First Transformer model!
    • 10 Jun 2021. upczww: EfficientNet b0-b8 and l2.
    • 23 May 2021. SsisyphusTao: CenterNet DLA-34 with DCNv2 plugin.
    • 17 May 2021. ybw108: arcface LResNet100E-IR and MobileFaceNet.
    • 6 May 2021. makaveli10: scaled-yolov4 yolov4-csp.

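    Tutorials

    • Install the dependencies.
    • A quick start guide, using lenet5 as a demo.
    • The .wts file content format
    • Frequently Asked Questions (FAQ)
    • Migrating from TensorRT 4 to 7
    • How to run on multiple GPUs, using YOLOv4 as an example
    • Check whether your GPU supports FP16/INT8
    • How to compile and run on Windows
    • Deploying YOLOv4 with Triton Inference Server
    • From pytorch to trt, using hrnet as an example (in Chinese)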

    Tested environments

    1. TensorRT 7.x
    2. TensorRT 8.x (only some of the models support 8.x)

    How to run

    Each folder contains a README that explains how to run the model in it.

    Models

    The following models are implemented.

    Name | Description
    mlp | the very basic model for starters, properly documented
    lenet | the simplest, as a "hello world" of this project
    alexnet | easy to implement, all layers are supported in tensorrt
    googlenet | GoogLeNet (Inception v1)
    inception | Inception v3, v4
    mnasnet | MNASNet with depth multiplier of 0.5 from the paper
    mobilenet | MobileNet v2, v3-small, v3-large
    resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented
    senet | se-resnet50
    shufflenet | ShuffleNet v2 with 0.5x output channels
    squeezenet | SqueezeNet 1.1 model
    vgg | VGG 11-layer model
    yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3
    yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3
    yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3
    yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3
    yolov5 | yolov5 v1.0-v6.0, pytorch implementation from ultralytics/yolov5
    retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface
    arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface
    retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detects faces and the mask attribute
    dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch
    crnn | pytorch implementation from meijieru/crnn.pytorch
    ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020
    hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation
    psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet
    ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018
    unet | U-Net, pytorch implementation from milesial/Pytorch-UNet
    repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG
    lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch
    refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch
    densenet | DenseNet-121, from torchvision.models
    rcnn | FasterRCNN and MaskRCNN, model from detectron2
    tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019
    scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4
    centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet
    efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch
    detr | DE⫶TR, pytorch from facebookresearch/detr
    swin-transformer | Swin Transformer - Semantic Segmentation, only Swin-T is supported. The Pytorch implementation is microsoft/Swin-Transformer
    real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan
    superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork

    Model Zoo

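    The .wts files can be downloaded from the model zoo for a quick evaluation. Converting .wts from your own pytorch/mxnet/tensorflow model is still recommended, so that you can retrain your own models.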

    GoogleDrive | BaiduPan pwd: uvv2

    Tricky operations

    Some tricky operations encountered in these models have been solved, though better solutions may exist.

    Name | Description
    BatchNorm | Implemented by a scale layer, used in resnet, googlenet, mobilenet, etc.
    MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to solve ceil_mode=True, see googlenet.
    average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception.
    relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet.
    torch.chunk() | implement the 'chunk(2, dim=C)' by tensorrt plugin, see shufflenet.
    channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet.
    adaptive pool | use fixed input dimension, and use regular average pooling, see shufflenet.
    leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4.
    yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4.
    yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp.
    upsample | replaced by a deconvolution layer, see yolov3.
    hsigmoid | hard sigmoid is implemented as a plugin, hsigmoid and hswish are used in mobilenetv3
    retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface.
    mish | mish activation is implemented as a plugin, mish is used in yolov4
    prelu | mxnet's prelu activation with trainable gamma is implemented as a plugin, used in arcface
    HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0
    LSTM | Implemented pytorch nn.LSTM() with tensorrt api
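
Several rows in the table above are small arithmetic identities, and a plain-Python sketch can make them concrete. It checks the math only and does not call TensorRT. The function names are hypothetical, the BatchNorm folding uses the usual inference-time formula, and the hard sigmoid is assumed to be the MobileNetV3-style `relu6(x + 3) / 6`; the repository's plugins may differ in detail.

```python
import math


def fold_batchnorm(gamma, beta, mean, var, eps=1e-5):
    """Per-channel (scale, shift) so that scale * x + shift equals
    inference-time BatchNorm: gamma * (x - mean) / sqrt(var + eps) + beta."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def relu(x):
    return max(x, 0.0)


def relu6(x):
    # Identity quoted in the table: Relu6(x) = Relu(x) - Relu(x - 6)
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x):
    # ASSUMPTION: MobileNetV3-style definition, relu6(x + 3) / 6
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    # From the table: hard_swish = x * hard_sigmoid
    return x * hard_sigmoid(x)


def channel_shuffle(channels, groups):
    """Reorder channels as reshape(groups, n // groups) -> transpose -> flatten.
    The two reshapes around the transpose correspond to the two shuffle layers
    mentioned in the table."""
    if len(channels) % groups != 0:
        raise ValueError("channel count must be divisible by groups")
    per_group = len(channels) // groups
    return [channels[g * per_group + i] for i in range(per_group) for g in range(groups)]
```

For example, `channel_shuffle([0, 1, 2, 3, 4, 5], groups=2)` interleaves the two halves of the channel list, and `fold_batchnorm` yields the per-channel coefficients that a scale layer would apply in place of the original BatchNorm.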

    Speed benchmark

    Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS
    YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333
    YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2
    YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4
    YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5
    YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7
    YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9
    YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3
    YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142
    YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173
    YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190
    YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71
    YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43
    YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29
    YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142
    YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71
    YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40
    YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27
    RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90
    RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204
    RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417
    ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333
    CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000
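
To read the table in terms of latency and relative speedup, a couple of helper functions are enough. This is only arithmetic on the batch-size-1 figures listed above; the helper names are hypothetical, and nothing here measures a real engine.

```python
def frame_time_ms(fps):
    """Average time per frame in milliseconds for a given frames-per-second figure."""
    return 1000.0 / fps


def speedup(fps_fast, fps_base):
    """How many times faster fps_fast is than fps_base."""
    return fps_fast / fps_base


if __name__ == "__main__":
    # Batch-size-1 FPS values copied from the table above.
    yolov3_fp32, yolov3_int8 = 39.2, 71.4
    retinaface_fp32, retinaface_int8 = 90.0, 204.0

    print(f"YOLOv3 FP32: {frame_time_ms(yolov3_fp32):.2f} ms per frame")
    print(f"YOLOv3 INT8: {frame_time_ms(yolov3_int8):.2f} ms per frame")
    print(f"YOLOv3 INT8 vs FP32: {speedup(yolov3_int8, yolov3_fp32):.2f}x")
    print(f"RetinaFace(resnet50) INT8 vs FP32: {speedup(retinaface_int8, retinaface_fp32):.2f}x")
```

On these figures INT8 comes out at roughly 1.8x the FP32 throughput for YOLOv3 and roughly 2.3x for RetinaFace(resnet50), on the listed Xeon E5-2620/GTX1080 machine.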

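    Help wanted: if you have speed results of your own, please open an issue or a PR.

    Acknowledgments & contact

    Comments, questions and discussion are all welcome; please contact the author using the details below.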

    E-mail: wangxinyu_es@163.com

    WeChat ID: wangxinyu0375 (add the author on WeChat to join the tensorrtx discussion group; include the note "tensorrtx")

  • Original article: https://blog.csdn.net/quicmous/article/details/126750689