
Jetson Nano: Manually Compiling OpenCV with CUDA Acceleration and Deploying YOLOv5 with TensorRT Acceleration


References

Jetson Nano配置YOLOv5并实现FPS=25 (CSDN blog) — a detailed walkthrough of configuring YOLOv5 on the Jetson Nano with TensorRT acceleration to reach 25 FPS real-time detection: system updates, CUDA and PyTorch installation, setting up the YOLOv5 environment, enlarging the Nano's swap, building tensorrtx, and running detection on a USB camera. https://blog.csdn.net/carrymingteng/article/details/120978053

Jetson OpenCV 安装,支持cuda加速,已解决多个常见问题 (CSDN blog) — how to remove the stock OpenCV on a Jetson Xavier NX and compile a CUDA-enabled build: resolving dependency problems, installing the required libraries, downloading the source, compiling, and installing. https://blog.csdn.net/weixin_45306341/article/details/127926178

I have consolidated the content of the reference articles based on my own hands-on experience. This article was written from memory after performing the steps, so if you spot any errors please point them out in the comments.

1. Environment

Jetson Nano B01; system version: JetPack 4.6.6 [L4T 32.7.6]; Python 3.6

Enlarge the Nano's swap space

sudo gedit /etc/systemd/nvzramconfig.sh

Change
mem=$((("${totalmem}"/2/"${NRDEVICES}")*1024))
to
mem=$((("${totalmem}"*2/"${NRDEVICES}")*1024))

Reboot the Nano, then run

free -h

and you should see that swap has grown to 7.7G.

2. Setup Procedure

1. Install OpenCV with CUDA acceleration

Completely remove the OpenCV that ships with the system: it does not support CUDA acceleration and does not satisfy the dependency requirements of YOLOv5-6.0.

sudo apt purge libopencv*
sudo apt autoremove
sudo apt update
# afterwards, search the system's install directories for leftovers and delete them manually if any remain
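A quick way to hunt for leftovers is to ask dpkg and peek into the usual install directories (a sketch; adjust the paths as needed):

dpkg -l | grep -i opencv                  # any opencv packages still registered?
ls /usr/include/opencv* 2>/dev/null       # leftover headers
ls /usr/local/lib/libopencv* 2>/dev/null  # leftover libraries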

Install the dependencies

sudo apt install -y build-essential checkinstall cmake pkg-config yasm git gfortran
sudo apt update
sudo apt install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev
sudo apt install -y libjpeg8-dev libjasper-dev libpng12-dev libtiff5-dev libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine2-dev libv4l-dev
sudo apt install -y libgstreamer1.0-dev libgstreamer-plugins-base1.0-dev libgtk2.0-dev libtbb-dev libatlas-base-dev libfaac-dev libmp3lame-dev libtheora-dev libvorbis-dev libxvidcore-dev libopencore-amrnb-dev libopencore-amrwb-dev x264 v4l-utils
sudo apt install python-dev python-numpy libtbb2 libtbb-dev libjpeg-dev libpng-dev libtiff-dev libjasper-dev libdc1394-22-dev
sudo apt-get install -y libgtk-3-dev libqt5opengl5-dev libglew-dev
  • If something fails, switch apt sources first (the Tsinghua mirror is used here)
  • If apt reports it cannot locate the libjasper-dev package, install it with one of the following

Method 1

sudo add-apt-repository "deb http://mirrors.tuna.tsinghua.edu.cn/ubuntu-ports/ xenial main multiverse restricted universe"
sudo apt update
sudo apt install libjasper1 libjasper-dev

Method 2

sudo add-apt-repository "deb http://security.ubuntu.com/ubuntu xenial-security main"
sudo apt update
sudo apt install libjasper1 libjasper-dev

Download the latest release source of OpenCV and opencv_contrib from their GitHub repositories:
https://github.com/opencv/opencv/releases

https://github.com/opencv/opencv_contrib/releases
Note that the OpenCV and opencv_contrib versions must match; I used OpenCV 4.5.5 and opencv_contrib-4.5.5.

First extract opencv, then extract opencv_contrib into the opencv directory.
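For example, assuming the 4.5.5 archives are downloaded into the home directory (a sketch; the tag in the URL changes if you pick a different release):

wget -O opencv-4.5.5.zip https://github.com/opencv/opencv/archive/refs/tags/4.5.5.zip
wget -O opencv_contrib-4.5.5.zip https://github.com/opencv/opencv_contrib/archive/refs/tags/4.5.5.zip
unzip opencv-4.5.5.zip
unzip opencv_contrib-4.5.5.zip
mv opencv_contrib-4.5.5 opencv-4.5.5/    # put opencv_contrib inside the opencv source tree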

cd opencv-4.5.5
mkdir build
cd build

Then configure the build with CMake:

cmake -DCMAKE_BUILD_TYPE=Release \
-DCMAKE_INSTALL_PREFIX=/usr/local \
-DBUILD_opencv_python3=ON \
-DBUILD_opencv_python2=OFF \
-DWITH_FFMPEG=ON \
-DWITH_CUDA=ON \
-DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda \
-DCUDA_ARCH_BIN=5.3 \
-DCUDA_ARCH_PTX=5.3 \
-DENABLE_FAST_MATH=ON \
-DCUDA_FAST_MATH=ON \
-DWITH_CUBLAS=ON \
-DWITH_OPENGL=ON \
-DBUILD_TESTS=OFF \
-DBUILD_PERF_TESTS=OFF \
-DBUILD_EXAMPLES=OFF \
-DOPENCV_EXTRA_MODULES_PATH=../opencv_contrib-4.5.5/modules \
..

Python 2 is not needed, hence -DBUILD_opencv_python2=OFF.

Real-time camera processing is needed, hence -DWITH_FFMPEG=ON.

Make sure this path is correct: -DCUDA_TOOLKIT_ROOT_DIR=/usr/local/cuda.

-DCUDA_ARCH_BIN=5.3 and -DCUDA_ARCH_PTX=5.3: these values (5.3 is the compute capability of the Nano's Maxwell GPU) can be looked up as follows:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery

OpenGL acceleration is enabled with -DWITH_OPENGL=ON.
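Before building, it can be worth confirming that CMake really picked up CUDA; one way is to inspect the generated cache inside the build directory (a quick sketch):

grep -E 'WITH_CUDA|CUDA_ARCH_BIN' CMakeCache.txt    # both should reflect the values passed above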

Check the number of CPU cores and use as many as possible for the build:

cat /proc/stat | grep cpu[0-9] -c # count the hardware threads

Then build and install:

make -j4    # build on four threads
sudo make install
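If import cv2 later fails to find the freshly installed build, refreshing the shared-library cache is a common first step (a standard post-install step, not from the original article):

sudo ldconfig    # rebuild the linker cache so the new OpenCV libraries are found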

Installation complete.

Verify the installation

Run the following in python3:

import cv2
cv2.__version__

Check CUDA acceleration with jtop (installed via the jetson-stats package):

jtop

Switch to the INFO page; you should see CUDA: YES.

At this point, OpenCV with CUDA acceleration is installed.
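Besides jtop, you can also confirm from Python that the CUDA modules were compiled in (a quick sketch, assuming the build above succeeded):

python3 -c "import cv2; print(cv2.__version__)"                       # should print 4.5.5
python3 -c "import cv2; print(cv2.cuda.getCudaEnabledDeviceCount())"  # > 0 means OpenCV sees the GPU
python3 -c "import cv2; print('CUDA' in cv2.getBuildInformation())"   # True if built with CUDA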

2. Configure CUDA

Open the file:

sudo gedit ~/.bashrc

Append the following at the end:

export CUDA_HOME=/usr/local/cuda-10.2
export LD_LIBRARY_PATH=/usr/local/cuda-10.2/lib64:$LD_LIBRARY_PATH
export PATH=/usr/local/cuda-10.2/bin:$PATH

Then run:

source ~/.bashrc
nvcc -V    # if the setup worked, this prints the CUDA version

3. Install PyTorch (1.10.0, cp36, aarch64) and the matching torchvision (0.11.1)

Find the PyTorch and torchvision packages yourself.

sudo apt-get update
sudo apt-get install python3-pip libopenblas-base libopenmpi-dev
pip3 install Cython
pip3 install numpy
pip3 install torch-1.10.0-cp36-cp36m-linux_aarch64.whl
# run the line above from the directory containing the wheel file
sudo apt-get install libjpeg-dev zlib1g-dev libpython3-dev libavcodec-dev libavformat-dev libswscale-dev
# extract torchvision, then:
cd torchvision    # or open a terminal directly inside that folder
export BUILD_VERSION=0.11.1
python3 setup.py install --user    # this takes a while
# verify that torch and torchvision installed correctly
python3
import torch
print(torch.__version__)    # note: two underscores before and after version
# prints the version number if the install succeeded
import torchvision
print(torchvision.__version__)
# prints the version number if the install succeeded
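It is also worth checking that this PyTorch build can actually see the Nano's GPU (a quick sketch):

python3 -c "import torch; print(torch.cuda.is_available())"      # should print True
python3 -c "import torch; print(torch.cuda.get_device_name(0))"  # prints the Tegra GPU name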

4. Set up YOLOv5

Download YOLOv5 v6.0 from the official yolov5 repository (dependency constraints prevent newer versions from working), and place the downloaded yolov5n.pt weights in the yolov5 folder.

cd yolov5
pip3 install -r requirements.txt
# opencv, torch, and torchvision are already installed, so they can be commented out in requirements.txt
python3 -m pip list    # lists the packages installed for Python
# the following commands can be used to test yolov5:
python3 detect.py --source data/images/bus.jpg --weights yolov5n.pt --img 640    # image test
python3 detect.py --source video.mp4 --weights yolov5n.pt --img 640    # video test; supply your own video
python3 detect.py --source 0 --weights yolov5n.pt --img 640    # camera test

matplotlib may fail to install; find a compatible version and install it manually.

Other packages that fail to install automatically can likewise be installed manually; mind the version dependencies.

If running yolov5's detect.py produces "Illegal instruction (core dumped)", fix it as follows:

sudo gedit ~/.bashrc
# add at the end:
export OPENBLAS_CORETYPE=ARMV8
# save, close, then:
source ~/.bashrc

5. Accelerate YOLOv5 with tensorrtx

Download the required tensorrtx release (tensorrtx-yolov5-v6.0) from:

https://github.com/wang-xinyu/tensorrtx.git

Build

Copy yolov5/gen_wts.py from the downloaded tensorrtx project into the yolov5 directory set up above (note: not the yolov5 folder inside tensorrtx), then open a terminal there.

python3 gen_wts.py -w yolov5n.pt -o yolov5n.wts    # generates the wts file; put yolov5n.pt here first
cd ~/tensorrtx/yolov5/    # if downloaded manually, the folders may be named tensorrtx-master and yolov5-master
mkdir build
cd build
# copy the generated wts file into build
cmake ..
make -j4
sudo ./yolov5 -s yolov5n.wts yolov5n.engine n    # generate the engine file
sudo ./yolov5 -d yolov5n.engine ../samples/      # test on the sample images; output images are saved in build

If zidane.jpg shows missed detections, go up one directory, change CONF_THRESH in yolov5.cpp to 0.25, run make -j4 in build again, and re-run the command.

Using a USB camera

(1) Back up the yolov5.cpp file under tensorrtx/yolov5; you will need it again when converting a different model.

(2) Then replace the contents of yolov5.cpp with the following.

Lines 12 and 342 are the ones modified (the confidence threshold and the video source).

#include <iostream>
#include <chrono>
#include <cmath>
#include "cuda_utils.h"
#include "logging.h"
#include "common.hpp"
#include "utils.h"
#include "calibrator.h"

#define USE_FP32  // set USE_INT8 or USE_FP16 or USE_FP32
#define DEVICE 0  // GPU id
#define NMS_THRESH 0.4    // 0.4
#define CONF_THRESH 0.25  // confidence threshold; the default 0.5 performed poorly, 0.25 gave better results
#define BATCH_SIZE 1

// stuff we know about the network and the input/output blobs
static const int INPUT_H = Yolo::INPUT_H;
static const int INPUT_W = Yolo::INPUT_W;
static const int CLASS_NUM = Yolo::CLASS_NUM;
static const int OUTPUT_SIZE = Yolo::MAX_OUTPUT_BBOX_COUNT * sizeof(Yolo::Detection) / sizeof(float) + 1;  // we assume the yololayer outputs no more than MAX_OUTPUT_BBOX_COUNT boxes that conf >= 0.1
const char* INPUT_BLOB_NAME = "data";
const char* OUTPUT_BLOB_NAME = "prob";
static Logger gLogger;

char* my_classes[] = { "person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat", "traffic light",
    "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse", "sheep", "cow",
    "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee",
    "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair", "couch",
    "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote", "keyboard", "cell phone",
    "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase", "scissors", "teddy bear",
    "hair drier", "toothbrush" };

static int get_width(int x, float gw, int divisor = 8) {
    //return math.ceil(x / divisor) * divisor
    if (int(x * gw) % divisor == 0) {
        return int(x * gw);
    }
    return (int(x * gw / divisor) + 1) * divisor;
}

static int get_depth(int x, float gd) {
    if (x == 1) {
        return 1;
    } else {
        return round(x * gd) > 1 ? round(x * gd) : 1;
    }
}

ICudaEngine* build_engine(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto bottleneck_CSP2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *bottleneck_CSP2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto bottleneck_csp4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *bottleneck_csp4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto bottleneck_csp6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *bottleneck_csp6->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.7");
    auto spp8 = SPP(network, weightMap, *conv7->getOutput(0), get_width(1024, gw), get_width(1024, gw), 5, 9, 13, "model.8");

    /* ------ yolov5 head ------ */
    auto bottleneck_csp9 = C3(network, weightMap, *spp8->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.9");
    auto conv10 = convBlock(network, weightMap, *bottleneck_csp9->getOutput(0), get_width(512, gw), 1, 1, 1, "model.10");

    auto upsample11 = network->addResize(*conv10->getOutput(0));
    assert(upsample11);
    upsample11->setResizeMode(ResizeMode::kNEAREST);
    upsample11->setOutputDimensions(bottleneck_csp6->getOutput(0)->getDimensions());

    ITensor* inputTensors12[] = { upsample11->getOutput(0), bottleneck_csp6->getOutput(0) };
    auto cat12 = network->addConcatenation(inputTensors12, 2);
    auto bottleneck_csp13 = C3(network, weightMap, *cat12->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.13");
    auto conv14 = convBlock(network, weightMap, *bottleneck_csp13->getOutput(0), get_width(256, gw), 1, 1, 1, "model.14");

    auto upsample15 = network->addResize(*conv14->getOutput(0));
    assert(upsample15);
    upsample15->setResizeMode(ResizeMode::kNEAREST);
    upsample15->setOutputDimensions(bottleneck_csp4->getOutput(0)->getDimensions());

    ITensor* inputTensors16[] = { upsample15->getOutput(0), bottleneck_csp4->getOutput(0) };
    auto cat16 = network->addConcatenation(inputTensors16, 2);
    auto bottleneck_csp17 = C3(network, weightMap, *cat16->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.17");

    // yolo layer 0
    IConvolutionLayer* det0 = network->addConvolutionNd(*bottleneck_csp17->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.0.weight"], weightMap["model.24.m.0.bias"]);
    auto conv18 = convBlock(network, weightMap, *bottleneck_csp17->getOutput(0), get_width(256, gw), 3, 2, 1, "model.18");
    ITensor* inputTensors19[] = { conv18->getOutput(0), conv14->getOutput(0) };
    auto cat19 = network->addConcatenation(inputTensors19, 2);
    auto bottleneck_csp20 = C3(network, weightMap, *cat19->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.20");
    // yolo layer 1
    IConvolutionLayer* det1 = network->addConvolutionNd(*bottleneck_csp20->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.1.weight"], weightMap["model.24.m.1.bias"]);
    auto conv21 = convBlock(network, weightMap, *bottleneck_csp20->getOutput(0), get_width(512, gw), 3, 2, 1, "model.21");
    ITensor* inputTensors22[] = { conv21->getOutput(0), conv10->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors22, 2);
    auto bottleneck_csp23 = C3(network, weightMap, *cat22->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.23");
    IConvolutionLayer* det2 = network->addConvolutionNd(*bottleneck_csp23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.24.m.2.weight"], weightMap["model.24.m.2.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.24", std::vector<IConvolutionLayer*>{det0, det1, det2});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap) {
        free((void*)(mem.second.values));
    }

    return engine;
}

ICudaEngine* build_engine_p6(unsigned int maxBatchSize, IBuilder* builder, IBuilderConfig* config, DataType dt, float& gd, float& gw, std::string& wts_name) {
    INetworkDefinition* network = builder->createNetworkV2(0U);

    // Create input tensor of shape {3, INPUT_H, INPUT_W} with name INPUT_BLOB_NAME
    ITensor* data = network->addInput(INPUT_BLOB_NAME, dt, Dims3{ 3, INPUT_H, INPUT_W });
    assert(data);

    std::map<std::string, Weights> weightMap = loadWeights(wts_name);

    /* ------ yolov5 backbone------ */
    auto focus0 = focus(network, weightMap, *data, 3, get_width(64, gw), 3, "model.0");
    auto conv1 = convBlock(network, weightMap, *focus0->getOutput(0), get_width(128, gw), 3, 2, 1, "model.1");
    auto c3_2 = C3(network, weightMap, *conv1->getOutput(0), get_width(128, gw), get_width(128, gw), get_depth(3, gd), true, 1, 0.5, "model.2");
    auto conv3 = convBlock(network, weightMap, *c3_2->getOutput(0), get_width(256, gw), 3, 2, 1, "model.3");
    auto c3_4 = C3(network, weightMap, *conv3->getOutput(0), get_width(256, gw), get_width(256, gw), get_depth(9, gd), true, 1, 0.5, "model.4");
    auto conv5 = convBlock(network, weightMap, *c3_4->getOutput(0), get_width(512, gw), 3, 2, 1, "model.5");
    auto c3_6 = C3(network, weightMap, *conv5->getOutput(0), get_width(512, gw), get_width(512, gw), get_depth(9, gd), true, 1, 0.5, "model.6");
    auto conv7 = convBlock(network, weightMap, *c3_6->getOutput(0), get_width(768, gw), 3, 2, 1, "model.7");
    auto c3_8 = C3(network, weightMap, *conv7->getOutput(0), get_width(768, gw), get_width(768, gw), get_depth(3, gd), true, 1, 0.5, "model.8");
    auto conv9 = convBlock(network, weightMap, *c3_8->getOutput(0), get_width(1024, gw), 3, 2, 1, "model.9");
    auto spp10 = SPP(network, weightMap, *conv9->getOutput(0), get_width(1024, gw), get_width(1024, gw), 3, 5, 7, "model.10");
    auto c3_11 = C3(network, weightMap, *spp10->getOutput(0), get_width(1024, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.11");

    /* ------ yolov5 head ------ */
    auto conv12 = convBlock(network, weightMap, *c3_11->getOutput(0), get_width(768, gw), 1, 1, 1, "model.12");
    auto upsample13 = network->addResize(*conv12->getOutput(0));
    assert(upsample13);
    upsample13->setResizeMode(ResizeMode::kNEAREST);
    upsample13->setOutputDimensions(c3_8->getOutput(0)->getDimensions());
    ITensor* inputTensors14[] = { upsample13->getOutput(0), c3_8->getOutput(0) };
    auto cat14 = network->addConcatenation(inputTensors14, 2);
    auto c3_15 = C3(network, weightMap, *cat14->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.15");

    auto conv16 = convBlock(network, weightMap, *c3_15->getOutput(0), get_width(512, gw), 1, 1, 1, "model.16");
    auto upsample17 = network->addResize(*conv16->getOutput(0));
    assert(upsample17);
    upsample17->setResizeMode(ResizeMode::kNEAREST);
    upsample17->setOutputDimensions(c3_6->getOutput(0)->getDimensions());
    ITensor* inputTensors18[] = { upsample17->getOutput(0), c3_6->getOutput(0) };
    auto cat18 = network->addConcatenation(inputTensors18, 2);
    auto c3_19 = C3(network, weightMap, *cat18->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.19");

    auto conv20 = convBlock(network, weightMap, *c3_19->getOutput(0), get_width(256, gw), 1, 1, 1, "model.20");
    auto upsample21 = network->addResize(*conv20->getOutput(0));
    assert(upsample21);
    upsample21->setResizeMode(ResizeMode::kNEAREST);
    upsample21->setOutputDimensions(c3_4->getOutput(0)->getDimensions());
    ITensor* inputTensors21[] = { upsample21->getOutput(0), c3_4->getOutput(0) };
    auto cat22 = network->addConcatenation(inputTensors21, 2);
    auto c3_23 = C3(network, weightMap, *cat22->getOutput(0), get_width(512, gw), get_width(256, gw), get_depth(3, gd), false, 1, 0.5, "model.23");

    auto conv24 = convBlock(network, weightMap, *c3_23->getOutput(0), get_width(256, gw), 3, 2, 1, "model.24");
    ITensor* inputTensors25[] = { conv24->getOutput(0), conv20->getOutput(0) };
    auto cat25 = network->addConcatenation(inputTensors25, 2);
    auto c3_26 = C3(network, weightMap, *cat25->getOutput(0), get_width(1024, gw), get_width(512, gw), get_depth(3, gd), false, 1, 0.5, "model.26");

    auto conv27 = convBlock(network, weightMap, *c3_26->getOutput(0), get_width(512, gw), 3, 2, 1, "model.27");
    ITensor* inputTensors28[] = { conv27->getOutput(0), conv16->getOutput(0) };
    auto cat28 = network->addConcatenation(inputTensors28, 2);
    auto c3_29 = C3(network, weightMap, *cat28->getOutput(0), get_width(1536, gw), get_width(768, gw), get_depth(3, gd), false, 1, 0.5, "model.29");

    auto conv30 = convBlock(network, weightMap, *c3_29->getOutput(0), get_width(768, gw), 3, 2, 1, "model.30");
    ITensor* inputTensors31[] = { conv30->getOutput(0), conv12->getOutput(0) };
    auto cat31 = network->addConcatenation(inputTensors31, 2);
    auto c3_32 = C3(network, weightMap, *cat31->getOutput(0), get_width(2048, gw), get_width(1024, gw), get_depth(3, gd), false, 1, 0.5, "model.32");

    /* ------ detect ------ */
    IConvolutionLayer* det0 = network->addConvolutionNd(*c3_23->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.0.weight"], weightMap["model.33.m.0.bias"]);
    IConvolutionLayer* det1 = network->addConvolutionNd(*c3_26->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.1.weight"], weightMap["model.33.m.1.bias"]);
    IConvolutionLayer* det2 = network->addConvolutionNd(*c3_29->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.2.weight"], weightMap["model.33.m.2.bias"]);
    IConvolutionLayer* det3 = network->addConvolutionNd(*c3_32->getOutput(0), 3 * (Yolo::CLASS_NUM + 5), DimsHW{ 1, 1 }, weightMap["model.33.m.3.weight"], weightMap["model.33.m.3.bias"]);

    auto yolo = addYoLoLayer(network, weightMap, "model.33", std::vector<IConvolutionLayer*>{det0, det1, det2, det3});
    yolo->getOutput(0)->setName(OUTPUT_BLOB_NAME);
    network->markOutput(*yolo->getOutput(0));

    // Build engine
    builder->setMaxBatchSize(maxBatchSize);
    config->setMaxWorkspaceSize(16 * (1 << 20));  // 16MB
#if defined(USE_FP16)
    config->setFlag(BuilderFlag::kFP16);
#elif defined(USE_INT8)
    std::cout << "Your platform support int8: " << (builder->platformHasFastInt8() ? "true" : "false") << std::endl;
    assert(builder->platformHasFastInt8());
    config->setFlag(BuilderFlag::kINT8);
    Int8EntropyCalibrator2* calibrator = new Int8EntropyCalibrator2(1, INPUT_W, INPUT_H, "./coco_calib/", "int8calib.table", INPUT_BLOB_NAME);
    config->setInt8Calibrator(calibrator);
#endif

    std::cout << "Building engine, please wait for a while..." << std::endl;
    ICudaEngine* engine = builder->buildEngineWithConfig(*network, *config);
    std::cout << "Build engine successfully!" << std::endl;

    // Don't need the network any more
    network->destroy();

    // Release host memory
    for (auto& mem : weightMap) {
        free((void*)(mem.second.values));
    }

    return engine;
}

void APIToModel(unsigned int maxBatchSize, IHostMemory** modelStream, float& gd, float& gw, std::string& wts_name) {
    // Create builder
    IBuilder* builder = createInferBuilder(gLogger);
    IBuilderConfig* config = builder->createBuilderConfig();

    // Create model to populate the network, then set the outputs and create an engine
    ICudaEngine* engine = build_engine(maxBatchSize, builder, config, DataType::kFLOAT, gd, gw, wts_name);
    assert(engine != nullptr);

    // Serialize the engine
    (*modelStream) = engine->serialize();

    // Close everything down
    engine->destroy();
    builder->destroy();
    config->destroy();
}

void doInference(IExecutionContext& context, cudaStream_t& stream, void** buffers, float* input, float* output, int batchSize) {
    // DMA input batch data to device, infer on the batch asynchronously, and DMA output back to host
    CUDA_CHECK(cudaMemcpyAsync(buffers[0], input, batchSize * 3 * INPUT_H * INPUT_W * sizeof(float), cudaMemcpyHostToDevice, stream));
    context.enqueue(batchSize, buffers, stream, nullptr);
    CUDA_CHECK(cudaMemcpyAsync(output, buffers[1], batchSize * OUTPUT_SIZE * sizeof(float), cudaMemcpyDeviceToHost, stream));
    cudaStreamSynchronize(stream);
}

bool parse_args(int argc, char** argv, std::string& engine) {
    if (argc < 3) return false;
    if (std::string(argv[1]) == "-v" && argc == 3) {
        engine = std::string(argv[2]);
    } else {
        return false;
    }
    return true;
}

int main(int argc, char** argv) {
    cudaSetDevice(DEVICE);

    //std::string wts_name = "";
    std::string engine_name = "";
    //float gd = 0.0f, gw = 0.0f;
    //std::string img_dir;

    if (!parse_args(argc, argv, engine_name)) {
        std::cerr << "arguments not right!" << std::endl;
        std::cerr << "./yolov5 -v [.engine] // run inference with camera" << std::endl;
        return -1;
    }

    std::ifstream file(engine_name, std::ios::binary);
    if (!file.good()) {
        std::cerr << " read " << engine_name << " error! " << std::endl;
        return -1;
    }
    char* trtModelStream{ nullptr };
    size_t size = 0;
    file.seekg(0, file.end);
    size = file.tellg();
    file.seekg(0, file.beg);
    trtModelStream = new char[size];
    assert(trtModelStream);
    file.read(trtModelStream, size);
    file.close();

    // prepare input data ---------------------------
    static float data[BATCH_SIZE * 3 * INPUT_H * INPUT_W];
    //for (int i = 0; i < 3 * INPUT_H * INPUT_W; i++)
    //    data[i] = 1.0;
    static float prob[BATCH_SIZE * OUTPUT_SIZE];
    IRuntime* runtime = createInferRuntime(gLogger);
    assert(runtime != nullptr);
    ICudaEngine* engine = runtime->deserializeCudaEngine(trtModelStream, size);
    assert(engine != nullptr);
    IExecutionContext* context = engine->createExecutionContext();
    assert(context != nullptr);
    delete[] trtModelStream;
    assert(engine->getNbBindings() == 2);
    void* buffers[2];
    // In order to bind the buffers, we need to know the names of the input and output tensors.
    // Note that indices are guaranteed to be less than IEngine::getNbBindings()
    const int inputIndex = engine->getBindingIndex(INPUT_BLOB_NAME);
    const int outputIndex = engine->getBindingIndex(OUTPUT_BLOB_NAME);
    assert(inputIndex == 0);
    assert(outputIndex == 1);
    // Create GPU buffers on device
    CUDA_CHECK(cudaMalloc(&buffers[inputIndex], BATCH_SIZE * 3 * INPUT_H * INPUT_W * sizeof(float)));
    CUDA_CHECK(cudaMalloc(&buffers[outputIndex], BATCH_SIZE * OUTPUT_SIZE * sizeof(float)));
    // Create stream
    cudaStream_t stream;
    CUDA_CHECK(cudaStreamCreate(&stream));

    cv::VideoCapture capture("/home/cao-yolox/yolov5/tensorrtx-master/yolov5/samples/1.mp4");  // change to the video or image you want to detect (full path); to use a camera, pass 0 instead (no quotes)
    //cv::VideoCapture capture("../overpass.mp4");
    //int fourcc = cv::VideoWriter::fourcc('M', 'J', 'P', 'G');
    //capture.set(cv::CAP_PROP_FOURCC, fourcc);
    if (!capture.isOpened()) {
        std::cout << "Error opening video stream or file" << std::endl;
        return -1;
    }

    int key;
    int fcount = 0;
    while (1) {
        cv::Mat frame;
        capture >> frame;
        if (frame.empty()) {
            std::cout << "Fail to read image from camera!" << std::endl;
            break;
        }
        fcount++;
        //if (fcount < BATCH_SIZE && f + 1 != (int)file_names.size()) continue;
        for (int b = 0; b < fcount; b++) {
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            cv::Mat img = frame;
            if (img.empty()) continue;
            cv::Mat pr_img = preprocess_img(img, INPUT_W, INPUT_H); // letterbox BGR to RGB
            int i = 0;
            for (int row = 0; row < INPUT_H; ++row) {
                uchar* uc_pixel = pr_img.data + row * pr_img.step;
                for (int col = 0; col < INPUT_W; ++col) {
                    data[b * 3 * INPUT_H * INPUT_W + i] = (float)uc_pixel[2] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + INPUT_H * INPUT_W] = (float)uc_pixel[1] / 255.0;
                    data[b * 3 * INPUT_H * INPUT_W + i + 2 * INPUT_H * INPUT_W] = (float)uc_pixel[0] / 255.0;
                    uc_pixel += 3;
                    ++i;
                }
            }
        }

        // Run inference
        auto start = std::chrono::system_clock::now();
        doInference(*context, stream, buffers, data, prob, BATCH_SIZE);
        auto end = std::chrono::system_clock::now();
        //std::cout << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        int fps = 1000.0 / std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count();
        std::vector<std::vector<Yolo::Detection>> batch_res(fcount);
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            nms(res, &prob[b * OUTPUT_SIZE], CONF_THRESH, NMS_THRESH);
        }
        for (int b = 0; b < fcount; b++) {
            auto& res = batch_res[b];
            //std::cout << res.size() << std::endl;
            //cv::Mat img = cv::imread(img_dir + "/" + file_names[f - fcount + 1 + b]);
            for (size_t j = 0; j < res.size(); j++) {
                cv::Rect r = get_rect(frame, res[j].bbox);
                cv::rectangle(frame, r, cv::Scalar(0x27, 0xC1, 0x36), 2);
                std::string label = my_classes[(int)res[j].class_id];
                cv::putText(frame, label, cv::Point(r.x, r.y - 1), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(0xFF, 0xFF, 0xFF), 2);
            }
            std::string fps_text = "FPS: " + std::to_string(fps);
            cv::putText(frame, fps_text, cv::Point(11, 80), cv::FONT_HERSHEY_PLAIN, 3, cv::Scalar(0, 0, 255), 2, cv::LINE_AA);
        }
        cv::imshow("yolov5", frame);
        key = cv::waitKey(1);
        if (key == 'q') {
            break;
        }
        fcount = 0;
    }

    capture.release();
    // Release stream and buffers
    cudaStreamDestroy(stream);
    CUDA_CHECK(cudaFree(buffers[inputIndex]));
    CUDA_CHECK(cudaFree(buffers[outputIndex]));
    // Destroy the engine
    context->destroy();
    engine->destroy();
    runtime->destroy();
    return 0;
}

Rebuild

Go into build and run make again. Note that any modification to yolov5.cpp requires a re-make.
Run

sudo ./yolov5 -v yolov5n.engine    # plug in the camera beforehand
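If no window appears or the stream fails to open, first check that the system actually sees the USB camera (a quick check; v4l-utils was installed with the dependencies earlier):

ls /dev/video*            # the camera should show up as /dev/video0 or similar
v4l2-ctl --list-devices   # lists the detected video devices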

Problem: "Failed to load module 'canberra-gtk-module'" appears.
Solution:

sudo apt-get install libcanberra-gtk-module

6. Running your own trained model with tensorrtx acceleration

Repeat the steps from part 5: generate the wts file, move it into the build folder, and build with the stock yolov5.cpp to produce the engine file; after that you can swap yolov5.cpp back to the camera version above so that video or camera frames can be processed. A sketch of the workflow follows.
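As a sketch of that workflow, with my_model.pt standing in for your own weights (all file names here are placeholders):

# if your class count differs from COCO's 80, update CLASS_NUM in yololayer.h before building
python3 gen_wts.py -w my_model.pt -o my_model.wts    # run inside the yolov5 repo
cp my_model.wts ~/tensorrtx/yolov5/build/
cd ~/tensorrtx/yolov5/build
make -j4                                             # with the stock yolov5.cpp
sudo ./yolov5 -s my_model.wts my_model.engine n      # trailing letter must match the trained model size (n/s/m/l/x)
# now swap in the camera version of yolov5.cpp, run make -j4 again, and launch:
sudo ./yolov5 -v my_model.engine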