My NVIDIA Developer Journey - 极智AI | Implementing the TensorRT elementWise Layer in Python and C++
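Follow my WeChat official account [极智视界] for more of my notes.
Hi everyone, I'm 极智视界. This post shows how to build a TensorRT elementWise layer in Python and C++.
An elementWise operator is an op that is applied element by element to its inputs. It covers a rich set of computations between elements, such as addition, element-wise multiplication, subtraction, and taking the maximum or minimum. This post looks at the TensorRT implementation in two parts: Python and C++.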
Contents
- 1 Building the elementWise Layer with TensorRT in Python
- 2 Building the elementWise Layer with TensorRT in C++
1 Building the elementWise Layer with TensorRT in Python
Here is the interface:
elementWise_Layer = network.add_elementwise(input0, input1, trt.ElementWiseOperation)
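The first two arguments are straightforward: they are the two input tensors of the operation. The third argument selects the specific elementWise operation, and there are many to choose from: SUM, PROD, MAX, MIN, SUB, DIV, POW, FLOOR_DIV, AND, OR, XOR, EQUAL, GREATER, and LESS, all members of trt.ElementWiseOperation.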
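To make the operation modes concrete, the sketch below pairs each mode name with a plain-Python reference function and applies it to matching elements of two flat lists. It only illustrates the intended semantics for float and bool inputs and is not TensorRT code; `REFERENCE_OPS` and `elementwise` are hypothetical names introduced here.

```python
import operator

# Hypothetical reference table: ElementWiseOperation name -> Python equivalent.
# Arithmetic entries assume float inputs; AND/OR/XOR assume bool inputs.
REFERENCE_OPS = {
    "SUM": operator.add,
    "PROD": operator.mul,
    "MAX": max,
    "MIN": min,
    "SUB": operator.sub,
    "DIV": operator.truediv,
    "POW": operator.pow,
    "FLOOR_DIV": operator.floordiv,
    "AND": operator.and_,
    "OR": operator.or_,
    "XOR": operator.xor,
    "EQUAL": operator.eq,
    "GREATER": operator.gt,
    "LESS": operator.lt,
}


def elementwise(a, b, op_name):
    """Apply the named binary op to matching elements of two equal-length lists."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same number of elements")
    op = REFERENCE_OPS[op_name]
    return [op(x, y) for x, y in zip(a, b)]


a = [1.0, 4.0, 9.0]
b = [2.0, 2.0, 2.0]
print(elementwise(a, b, "SUM"))        # [3.0, 6.0, 11.0]
print(elementwise(a, b, "FLOOR_DIV"))  # [0.0, 2.0, 4.0]
print(elementwise(a, b, "GREATER"))    # [False, True, True]
```

Note that the comparison modes produce boolean results, while the logical modes expect boolean inputs.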
The sample below builds an elementWise layer with the TensorRT Python API:
import numpy as np
from cuda import cudart
import tensorrt as trt

nIn, cIn, hIn, wIn = 1, 3, 4, 5  # input tensor shape, NCHW
data0 = np.full([nIn, cIn, hIn, wIn], 1, dtype=np.float32)  # first input, all ones
data1 = np.full([nIn, cIn, hIn, wIn], 2, dtype=np.float32)  # second input, all twos

np.set_printoptions(precision=8, linewidth=200, suppress=True)
cudart.cudaDeviceSynchronize()

logger = trt.Logger(trt.Logger.ERROR)  # create the logger
builder = trt.Builder(logger)  # create the builder
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))  # create the network
config = builder.create_builder_config()  # create the builder config
inputT0 = network.add_input('input0', trt.DataType.FLOAT, (nIn, cIn, hIn, wIn))
inputT1 = network.add_input('input1', trt.DataType.FLOAT, (nIn, cIn, hIn, wIn))

# add the elementwise layer (the inputs are the ITensor objects returned by add_input)
elementwiseLayer = network.add_elementwise(inputT0, inputT1, trt.ElementWiseOperation.SUM)

network.mark_output(elementwiseLayer.get_output(0))  # mark the output
engineString = builder.build_serialized_network(network, config)
engine = trt.Runtime(logger).deserialize_cuda_engine(engineString)  # deserialize the engine
context = engine.create_execution_context()  # create the execution context
_, stream = cudart.cudaStreamCreate()

inputH0 = np.ascontiguousarray(data0.reshape(-1))
inputH1 = np.ascontiguousarray(data1.reshape(-1))
# binding 2 is the output; bindings 0 and 1 are the inputs (TensorRT 8.x binding-index API)
outputH0 = np.empty(context.get_binding_shape(2), dtype=trt.nptype(engine.get_binding_dtype(2)))
_, inputD0 = cudart.cudaMallocAsync(inputH0.nbytes, stream)
_, inputD1 = cudart.cudaMallocAsync(inputH1.nbytes, stream)
_, outputD0 = cudart.cudaMallocAsync(outputH0.nbytes, stream)

cudart.cudaMemcpyAsync(inputD0, inputH0.ctypes.data, inputH0.nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
cudart.cudaMemcpyAsync(inputD1, inputH1.ctypes.data, inputH1.nbytes, cudart.cudaMemcpyKind.cudaMemcpyHostToDevice, stream)
context.execute_async_v2([int(inputD0), int(inputD1), int(outputD0)], stream)
cudart.cudaMemcpyAsync(outputH0.ctypes.data, outputD0, outputH0.nbytes, cudart.cudaMemcpyKind.cudaMemcpyDeviceToHost, stream)
cudart.cudaStreamSynchronize(stream)

print("outputH0:", outputH0.shape)
print(outputH0)

cudart.cudaStreamDestroy(stream)
cudart.cudaFree(inputD0)
cudart.cudaFree(inputD1)  # the original sample leaked this buffer
cudart.cudaFree(outputD0)
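Both input tensors above have shape (1, 3, 4, 5); data0 is filled with 1 and data1 with 2.
The two inputs are added element by element, so the output tensor also has shape (1, 3, 4, 5), and every element equals 3.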
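One way to sanity-check the result on the host is to compute the expected sum without TensorRT and compare it with the buffer copied back from the device. The sketch below assumes the values used in the sample (inputs filled with 1 and 2). `output_from_device` is a hypothetical stand-in for `outputH0.reshape(-1).tolist()`, filled in by hand here so the snippet runs without a GPU, and `all_close` is a helper written for this post.

```python
import math

n, c, h, w = 1, 3, 4, 5          # same NCHW shape as the sample
count = n * c * h * w            # 60 elements in total

data0 = [1.0] * count            # flattened stand-in for the all-ones input
data1 = [2.0] * count            # flattened stand-in for the all-twos input

# Host-side reference for ElementWiseOperation.SUM.
expected = [x + y for x, y in zip(data0, data1)]

# Hypothetical stand-in for outputH0.reshape(-1).tolist() from the sample.
output_from_device = [3.0] * count


def all_close(actual, reference, tol=1e-6):
    """Return True when both flat lists match element by element within tol."""
    return len(actual) == len(reference) and all(
        math.isclose(x, y, abs_tol=tol) for x, y in zip(actual, reference)
    )


print(count, all_close(output_from_device, expected))  # 60 True
```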
2 Building the elementWise Layer with TensorRT in C++
Now for the C++ implementation. First, the interface:
//!
//! \brief Add an elementwise layer to the network.
//!
//! \param input1 The first input tensor to the layer.
//! \param input2 The second input tensor to the layer.
//! \param op The binary operation that the layer applies.
//!
//! The input tensors must have the same rank.
//! For each dimension, their lengths must match, or one of them must be one.
//! In the latter case, the tensor is broadcast along that axis.
//!
//! The output tensor has the same rank as the inputs.
//! For each dimension, its length is the maximum of the lengths of the
//! corresponding input dimension.
//!
//! \see IElementWiseLayer
//! \warning For shape tensors, ElementWiseOperation::kPOW is not a valid op.
//!
//! \return The new elementwise layer, or nullptr if it could not be created.
//!
IElementWiseLayer* addElementWise(ITensor& input1, ITensor& input2, ElementWiseOperation op) noexcept
{
    return mImpl->addElementWise(input1, input2, op);
}
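The broadcasting rule stated in the comment above can be sketched in a few lines of Python. `broadcast_shape` is a hypothetical helper written for this post, not part of the TensorRT API. It simply restates the rule (same rank; per dimension, equal lengths or one of them is 1; output length is the maximum) and assumes static, non-zero dimensions.

```python
def broadcast_shape(shape0, shape1):
    """Return the elementwise output shape, or raise if the inputs are incompatible.

    Assumes static, non-zero dimensions (no dynamic -1 entries).
    """
    if len(shape0) != len(shape1):
        raise ValueError(f"rank mismatch: {len(shape0)} vs {len(shape1)}")
    out = []
    for axis, (d0, d1) in enumerate(zip(shape0, shape1)):
        # Lengths must match, or one of them must be 1 (that input is broadcast).
        if d0 != d1 and d0 != 1 and d1 != 1:
            raise ValueError(f"axis {axis}: lengths {d0} and {d1} cannot be broadcast")
        out.append(max(d0, d1))
    return tuple(out)


print(broadcast_shape((1, 3, 4, 5), (1, 3, 4, 5)))  # (1, 3, 4, 5)
print(broadcast_shape((1, 3, 4, 5), (1, 3, 1, 1)))  # (1, 3, 4, 5)
print(broadcast_shape((1, 1, 4, 1), (2, 3, 1, 5)))  # (2, 3, 4, 5)
```

Because the ranks must match, a per-channel tensor of shape (3,) would first have to be reshaped to (1, 3, 1, 1) before it can be combined with an NCHW tensor.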
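The method and its arguments map directly onto the Python version, so there is not much more to say. How do you actually build the elementWise layer in C++? See below: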
auto mode = ElementWiseOperation::kSUM;
if (eleMode == "SUM") {  // select the mode
    mode = ElementWiseOperation::kSUM;
}
else if (eleMode == "PROD") {
    mode = ElementWiseOperation::kPROD;
}
else if (eleMode == "MAX") {
    mode = ElementWiseOperation::kMAX;
}
else if (eleMode == "MIN") {
    mode = ElementWiseOperation::kMIN;
}
else if (eleMode == "SUB") {
    mode = ElementWiseOperation::kSUB;
}
else if (eleMode == "POW") {
    mode = ElementWiseOperation::kPOW;
}
else if (eleMode == "FLOOR_DIV") {
    mode = ElementWiseOperation::kFLOOR_DIV;
}
else if (eleMode == "AND") {
    mode = ElementWiseOperation::kAND;
}
else if (eleMode == "OR") {
    mode = ElementWiseOperation::kOR;
}
else if (eleMode == "XOR") {
    mode = ElementWiseOperation::kXOR;
}
else if (eleMode == "EQUAL") {
    mode = ElementWiseOperation::kEQUAL;
}
else if (eleMode == "GREATER") {
    mode = ElementWiseOperation::kGREATER;
}
else if (eleMode == "LESS") {
    mode = ElementWiseOperation::kLESS;
}

// build the elementWise layer
auto elementWise_Layer = m_network->addElementWise(*Layers[input0], *Layers[input1], mode);

// register the layer output under the layer name
Layers[layerName] = elementWise_Layer->getOutput(0);
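The if/else chain above silently keeps kSUM when `eleMode` does not match any branch. A table-driven lookup that rejects unknown names is one alternative. The sketch below shows the idea in Python with a stand-in `Enum`; the same pattern would map onto a `std::unordered_map` in C++. `ElementWiseMode` and `parse_mode` are hypothetical names, and the enum only imitates the member names of TensorRT's `ElementWiseOperation`.

```python
from enum import Enum, auto


class ElementWiseMode(Enum):
    """Stand-in for TensorRT's ElementWiseOperation (names only, values arbitrary)."""
    SUM = auto()
    PROD = auto()
    MAX = auto()
    MIN = auto()
    SUB = auto()
    DIV = auto()
    POW = auto()
    FLOOR_DIV = auto()
    AND = auto()
    OR = auto()
    XOR = auto()
    EQUAL = auto()
    GREATER = auto()
    LESS = auto()


def parse_mode(name):
    """Map a config string to a mode, raising instead of defaulting to SUM."""
    try:
        return ElementWiseMode[name.strip().upper()]
    except KeyError:
        valid = ", ".join(m.name for m in ElementWiseMode)
        raise ValueError(
            f"unknown elementWise mode {name!r}; expected one of: {valid}"
        ) from None


print(parse_mode("prod"))  # ElementWiseMode.PROD
```

Failing loudly on a misspelled mode is usually easier to debug than a network that quietly computes a sum.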
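That is all it takes to build the elementWise layer with the TensorRT C++ API. Note that the C++ snippet only shows how a single layer is built and is less complete than the Python sample; for building a whole network, refer to the Python code.
That wraps up how to implement the TensorRT elementWise layer in Python and C++. I hope it helps a little with your learning.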