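TensorRTx implements popular deep learning networks with the TensorRT network definition API, so that common models can be converted to TensorRT. TensorRT has built-in parsers, including caffeparser, uffparser, onnxparser, etc. When using these parsers, we often run into "unsupported operations or layers" problems, especially with state-of-the-art models that use new types of layers.
So why not skip the parsers altogether? Building the whole network with nothing but the TensorRT network definition API is not complicated.
All TensorRTx models are first implemented in PyTorch/MXNet/TensorFlow, and a weights file xxx.wts is exported from them. TensorRT then loads the weights, defines the network, and runs inference. Some of the PyTorch implementations can be found in my repo Pytorchx; the rest come from popular open-source implementations.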
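
The weight-export step can be sketched as follows. This is a minimal illustration, not the project's exact script: it assumes a plain-text .wts layout with a first line holding the tensor count, followed by one `name length hex hex ...` line per tensor, where each value is a big-endian float32 written as hex. The function names `write_wts` and `read_wts` are hypothetical, and plain Python lists stand in for framework tensors.

```python
import struct


def write_wts(path, tensors):
    """Write {name: [float, ...]} to a text file in the assumed .wts layout."""
    with open(path, "w") as f:
        # First line: number of tensors in the file.
        f.write(f"{len(tensors)}\n")
        for name, values in tensors.items():
            # Each value becomes 8 hex characters (big-endian float32).
            hex_words = " ".join(struct.pack(">f", float(v)).hex() for v in values)
            f.write(f"{name} {len(values)} {hex_words}\n")


def read_wts(path):
    """Read the file back into {name: [float, ...]}."""
    tensors = {}
    with open(path) as f:
        count = int(f.readline())
        for _ in range(count):
            parts = f.readline().split()
            name, size = parts[0], int(parts[1])
            values = [struct.unpack(">f", bytes.fromhex(h))[0] for h in parts[2:]]
            if len(values) != size:
                raise ValueError(f"{name}: expected {size} values, got {len(values)}")
            tensors[name] = values
    return tensors
```

With a PyTorch model, one would typically flatten each entry of `state_dict()` into a list of floats before passing it to such a writer; the C++ side then reads the same file to fill the weights of each TensorRT layer.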
- 19 Aug 2022. Dominic and sbmalik: Yolov3-tiny and Arcface support TRT8.
- 6 Jul 2022. xiang-wuu: SuperPoint - Self-Supervised Interest Point Detection and Description, vSLAM related.
- 26 May 2022. triple-Mu: YOLOv5 python script with CUDA Python API.
- 23 May 2022. yhpark: Real-ESRGAN, Practical Algorithms for General Image/Video Restoration.
- 19 May 2022. vjsrinivas: YOLOv3 TRT8 support and Python script.
- 15 Mar 2022. sky_hole: Swin Transformer - Semantic Segmentation.
- 19 Oct 2021. liuqi123123 added cuda preprocessing for yolov5, preprocessing + inference is 3x faster when batchsize=8.
- 18 Oct 2021. xupengao: YOLOv5 updated to v6.0, supporting n/s/m/l/x/n6/s6/m6/l6/x6.
- 31 Aug 2021. FamousDirector: update retinaface to support TensorRT 8.0.
- 27 Aug 2021. HaiyangPeng: add a python wrapper for hrnet segmentation.
- 1 Jul 2021. freedenS: DE⫶TR: End-to-End Object Detection with Transformers. First Transformer model!
- 10 Jun 2021. upczww: EfficientNet b0-b8 and l2.
- 23 May 2021. SsisyphusTao: CenterNet DLA-34 with DCNv2 plugin.
- 17 May 2021. ybw108: arcface LResNet100E-IR and MobileFaceNet.
- 6 May 2021. makaveli10: scaled-yolov4 yolov4-csp.

Each folder contains a README that explains how to run the models in it.

The following models are implemented.
| Name | Description |
|---|---|
| mlp | the very basic model for starters, properly documented |
| lenet | the simplest, as a “hello world” of this project |
| alexnet | easy to implement, all layers are supported in tensorrt |
| googlenet | GoogLeNet (Inception v1) |
| inception | Inception v3, v4 |
| mnasnet | MNASNet with depth multiplier of 0.5 from the paper |
| mobilenet | MobileNet v2, v3-small, v3-large |
| resnet | resnet-18, resnet-50 and resnext50-32x4d are implemented |
| senet | se-resnet50 |
| shufflenet | ShuffleNet v2 with 0.5x output channels |
| squeezenet | SqueezeNet 1.1 model |
| vgg | VGG 11-layer model |
| yolov3-tiny | weights and pytorch implementation from ultralytics/yolov3 |
| yolov3 | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov3-spp | darknet-53, weights and pytorch implementation from ultralytics/yolov3 |
| yolov4 | CSPDarknet53, weights from AlexeyAB/darknet, pytorch implementation from ultralytics/yolov3 |
| yolov5 | yolov5 v1.0-v6.0, pytorch implementation from ultralytics/yolov5 |
| retinaface | resnet50 and mobilenet0.25, weights from biubug6/Pytorch_Retinaface |
| arcface | LResNet50E-IR, LResNet100E-IR and MobileFaceNet, weights from deepinsight/insightface |
| retinafaceAntiCov | mobilenet0.25, weights from deepinsight/insightface, retinaface anti-COVID-19, detect face and mask attribute |
| dbnet | Scene Text Detection, weights from BaofengZan/DBNet.pytorch |
| crnn | pytorch implementation from meijieru/crnn.pytorch |
| ufld | pytorch implementation from Ultra-Fast-Lane-Detection, ECCV2020 |
| hrnet | hrnet-image-classification and hrnet-semantic-segmentation, pytorch implementation from HRNet-Image-Classification and HRNet-Semantic-Segmentation |
| psenet | PSENet Text Detection, tensorflow implementation from liuheng92/tensorflow_PSENet |
| ibnnet | IBN-Net, pytorch implementation from XingangPan/IBN-Net, ECCV2018 |
| unet | U-Net, pytorch implementation from milesial/Pytorch-UNet |
| repvgg | RepVGG, pytorch implementation from DingXiaoH/RepVGG |
| lprnet | LPRNet, pytorch implementation from xuexingyu24/License_Plate_Detection_Pytorch |
| refinedet | RefineDet, pytorch implementation from luuuyi/RefineDet.PyTorch |
| densenet | DenseNet-121, from torchvision.models |
| rcnn | FasterRCNN and MaskRCNN, model from detectron2 |
| tsm | TSM: Temporal Shift Module for Efficient Video Understanding, ICCV2019 |
| scaled-yolov4 | yolov4-csp, pytorch from WongKinYiu/ScaledYOLOv4 |
| centernet | CenterNet DLA-34, pytorch from xingyizhou/CenterNet |
| efficientnet | EfficientNet b0-b8 and l2, pytorch from lukemelas/EfficientNet-PyTorch |
| detr | DE⫶TR, pytorch from facebookresearch/detr |
| swin-transformer | Swin Transformer - Semantic Segmentation, only Swin-T is supported. The Pytorch implementation is microsoft/Swin-Transformer |
| real-esrgan | Real-ESRGAN. The Pytorch implementation is real-esrgan |
| superpoint | SuperPoint. The Pytorch model is from magicleap/SuperPointPretrainedNetwork |
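The .wts files can be downloaded from the model zoo for quick evaluation. However, it is recommended to convert the .wts from your own pytorch/mxnet/tensorflow model, so that you can retrain your own models.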
GoogleDrive | BaiduPan pwd: uvv2
Some tricky operations encountered in these models have already been solved, though there may be better solutions.
| Name | Description |
|---|---|
| BatchNorm | implemented with a scale layer, used in resnet, googlenet, mobilenet, etc. |
| MaxPool2d(ceil_mode=True) | use a padding layer before maxpool to solve ceil_mode=True, see googlenet. |
| average pool with padding | use setAverageCountExcludesPadding() when necessary, see inception. |
| relu6 | use Relu6(x) = Relu(x) - Relu(x-6), see mobilenet. |
| torch.chunk() | implement the ‘chunk(2, dim=C)’ by tensorrt plugin, see shufflenet. |
| channel shuffle | use two shuffle layers to implement channel_shuffle, see shufflenet. |
| adaptive pool | use fixed input dimension, and use regular average pooling, see shufflenet. |
| leaky relu | I wrote a leaky relu plugin, but PRelu in NvInferPlugin.h can be used, see yolov3 in branch trt4. |
| yolo layer v1 | yolo layer is implemented as a plugin, see yolov3 in branch trt4. |
| yolo layer v2 | three yolo layers implemented in one plugin, see yolov3-spp. |
| upsample | replaced by a deconvolution layer, see yolov3. |
| hsigmoid | hard sigmoid is implemented as a plugin, hsigmoid and hswish are used in mobilenetv3 |
| retinaface output decode | implement a plugin to decode bbox, confidence and landmarks, see retinaface. |
| mish | mish activation is implemented as a plugin, mish is used in yolov4 |
| prelu | mxnet’s prelu activation with trainable gamma is implemented as a plugin, used in arcface |
| HardSwish | hard_swish = x * hard_sigmoid, used in yolov5 v3.0 |
| LSTM | Implemented pytorch nn.LSTM() with tensorrt api |
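
A few of the rewrites in the table above are easy to check numerically. The sketch below is plain Python for illustration only, not the project's C++ code. It folds BatchNorm inference parameters into the per-channel `scale` and `shift` that a scale layer computing `x * scale + shift` would need (power left at 1), verifies the relu6 identity, builds hard swish from it, and reproduces channel shuffle as reshape, transpose, flatten. The function names are hypothetical, and `hard_sigmoid(x) = relu6(x + 3) / 6` is an assumption following the PyTorch convention.

```python
import math


def bn_to_scale(gamma, beta, mean, var, eps=1e-5):
    """Fold BatchNorm (inference mode) into per-channel scale and shift."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    shift = [b - m * s for b, m, s in zip(beta, mean, scale)]
    return scale, shift


def relu(x):
    return max(x, 0.0)


def relu6(x):
    # Relu6(x) = Relu(x) - Relu(x - 6), as listed in the table.
    return relu(x) - relu(x - 6.0)


def hard_sigmoid(x):
    # Assumption: PyTorch-style definition.
    return relu6(x + 3.0) / 6.0


def hard_swish(x):
    # hard_swish = x * hard_sigmoid
    return x * hard_sigmoid(x)


def channel_shuffle(channels, groups):
    """View the channels as (groups, n/groups), transpose, then flatten."""
    per_group = len(channels) // groups
    return [channels[g * per_group + i]
            for i in range(per_group)
            for g in range(groups)]
```

In a TensorRT network the same shuffle is expressed with two shuffle layers: the first reshapes and transposes, the second reshapes back to the original channel layout.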

Speed benchmarks:

| Models | Device | BatchSize | Mode | Input Shape(HxW) | FPS |
|---|---|---|---|---|---|
| YOLOv3-tiny | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 333 |
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 39.2 |
| YOLOv3(darknet53) | Xeon E5-2620/GTX1080 | 1 | INT8 | 608x608 | 71.4 |
| YOLOv3-spp(darknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 38.5 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 35.7 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 40.9 |
| YOLOv4(CSPDarknet53) | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 41.3 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 4 | FP32 | 608x608 | 173 |
| YOLOv5-s v3.0 | Xeon E5-2620/GTX1080 | 8 | FP32 | 608x608 | 190 |
| YOLOv5-m v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 43 |
| YOLOv5-x v3.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 29 |
| YOLOv5-s v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 142 |
| YOLOv5-m v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 71 |
| YOLOv5-l v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 40 |
| YOLOv5-x v4.0 | Xeon E5-2620/GTX1080 | 1 | FP32 | 608x608 | 27 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 90 |
| RetinaFace(resnet50) | Xeon E5-2620/GTX1080 | 1 | INT8 | 480x640 | 204 |
| RetinaFace(mobilenet0.25) | Xeon E5-2620/GTX1080 | 1 | FP32 | 480x640 | 417 |
| ArcFace(LResNet50E-IR) | Xeon E5-2620/GTX1080 | 1 | FP32 | 112x112 | 333 |
| CRNN | Xeon E5-2620/GTX1080 | 1 | FP32 | 32x100 | 1000 |
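Help wanted: if you have speed results, please open an issue or a PR to add them.
Any comments, questions, and discussion are welcome. Please contact the author using the information below.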
E-mail: wangxinyu_es@163.com
WeChat ID: wangxinyu0375 (add the author on WeChat to join the tensorrtx discussion group; include the note "tensorrtx")