> 文档中心 > 如何调用一个具有动态维度的tensorrt engine

如何调用一个具有动态维度的tensorrt engine

调用具体动态维度模型engine时如果没有指定维度,会导致报类似这样的错误:

[TRT] Parameter check failed at: engine.cpp::resolveslots::1227, condition: allInputDimensionsSpecified(routine)

在python代码里,在调用engine推理前做这样的设置即可:

context.set_binding_shape(0, (BATCH, 3, INPUT_H, INPUT_W))

在C++代码里改如何设置,很少有文章提及,关于如何调用有动态维度的模型,一般都是举的python代码的例子,我查了一下TensorRT的头文件NvInferRuntime.h里的代码才知道,C++代码里应该调用IExecutionContext类型的实例的setBindingDimensions(int bindingIndex, Dims dimensions)方法。

总体思路是:拿到一个对维度未知的模型engine文件后,首先读入文件内容并做deserialize获得engine:

ARNet::ARNet(std::string engine_file,    std::string shape_file, std::string input_name,    std::string output_name)    : mEngine(nullptr) {  samplesCommon::OnnxSampleParams params;  params.inputTensorNames.push_back(input_name.c_str());  params.outputTensorNames.push_back(output_name.c_str());  params.int8 = false;  params.fp16 = true;  mParams = params;  std::string se_path = engine_file;  //"arnet_b1_fp16.engine";  if (access(se_path.c_str(), 4) != -1) {    nvinfer1::IRuntime* runtime = nvinfer1::createInferRuntime(gLogger);    std::ifstream fin(se_path);    std::string cached_engine = "";    while (fin.peek() != EOF) {      std::stringstream buffer;      buffer << fin.rdbuf();      cached_engine.append(buffer.str());    }    fin.close();    mEngine = std::shared_ptr( runtime->deserializeCudaEngine(cached_engine.data(),    cached_engine.size(), nullptr), samplesCommon::InferDeleter());    if(!mEngine){      std::cout <<"Deserialize from "<< se_path <<" failed!!! rebuild enigne ..." << std::endl;      build();      return;    }...

然后调用getBindingDimensions()查看engine的输入输出维度(如果知道维度就不用):

for (int i = 0; i getNbBindings(); i++){     nvinfer1::Dims dims = mEngine->getBindingDimensions(i);     printf("index %d, dims: (");     for (int d = 0; d < dims.nbDims; d ++)     {  if (d < dims.nbDims -1)printf("%d,", dims.d[d]);  else      printf("%d", dims.d[d]);     }     printf(")\n");}

在调用context->executeV2()做推理前把维度值为-1的动态维度值替换成具体的维度并调用context->setBindingDimensions()设置具体维度,然后在数据填入input buffer准备好后调用context->executeV2()做推理即可:

samplesCommon::BufferManager buffers(mEngine);  ...  auto context = SampleUniquePtr(      mEngine->createExecutionContext());  if (!context) {    return -1;  }  context->setOptimizationProfile(0);  nvinfer1::Dims dims5;   dims5.d[0] = 1;    // replace dynamic batch size with 1  dims5.d[1] = mInputDims.d[1];  dims5.d[2] = mInputDims.d[2];  dims5.d[3] = mInputDims.d[3];  dims5.d[4] = mInputDims.d[4];  dims5.nbDims = 5;  context->setBindingDimensions(0, dims5);  ...}int ARNet::infer(nvinfer1::IExecutionContext* context, samplesCommon::BufferManager& buffers, vector* p_cvImgs, int nClasses) {    ...  if (!processInput(buffers, p_cvImgs)) {    return -2;  }  buffers.copyInputToDevice();  bool status = context->executeV2(buffers.getDeviceBindings().data());  if (!status) {    return -3;  }  buffers.copyOutputToHost();  vector result = processOutput(buffers,nClasses);  ...}

关于getBindingDimensions()和setBindingDimensions()等重要的API的说明参见/usr/include/aarch64-linux-gnu/NvInferRuntime.h里的相关代码:

class ICudaEngine{public:    //!    //! \brief Get the number of binding indices.    //!    //! There are separate binding indices for each optimization profile.    //! This method returns the total over all profiles.    //! If the engine has been built for K profiles, the first getNbBindings() / K bindings are used by profile    //! number 0, the following getNbBindings() / K bindings are used by profile number 1 etc.    //!    //! \see getBindingIndex();    //!    virtual int getNbBindings() const noexcept = 0;    //!    //! \brief Retrieve the binding index for a named tensor.    //!    //! IExecutionContext::enqueue() and IExecutionContext::execute() require an array of buffers.    //!    //! Engine bindings map from tensor names to indices in this array.    //! Binding indices are assigned at engine build time, and take values in the range [0 ... n-1] where n is the total number of inputs and outputs.    //!    //! To get the binding index of the name in an optimization profile with index k > 0,    //! mangle the name by appending " [profile k]", as described for method getBindingName().    //!    //! \param name The tensor name.    //! \return The binding index for the named tensor, or -1 if the name is not found.    //!    //! \see getNbBindings() getBindingName()    //!    virtual int getBindingIndex(const char* name) const noexcept = 0;    //!    //! \brief Retrieve the name corresponding to a binding index.    //!    //! This is the reverse mapping to that provided by getBindingIndex().    //!    //! For optimization profiles with an index k > 0, the name is mangled by appending    //! " [profile k]", with k written in decimal.  For example, if the tensor in the    //! INetworkDefinition had the name "foo", and bindingIndex refers to that tensor in the    //! optimization profile with index 3, getBindingName returns "foo [profile 3]".    //!    //! \param bindingIndex The binding index.    //! \return The name corresponding to the index, or nullptr if the index is out of range.    //!    //! \see getBindingIndex()    //!    virtual const char* getBindingName(int bindingIndex) const noexcept = 0;    //!    //! \brief Determine whether a binding is an input binding.    //!    //! \param bindingIndex The binding index.    //! \return True if the index corresponds to an input binding and the index is in range.    //!    //! \see getBindingIndex()    //!    virtual bool bindingIsInput(int bindingIndex) const noexcept = 0;    //!    //! \brief Get the dimensions of a binding.    //!    //! \param bindingIndex The binding index.    //! \return The dimensions of the binding if the index is in range, otherwise Dims().    //!  Has -1 for any dimension that varies within the optimization profile.    //!    //! For example, suppose an INetworkDefinition has an input with shape [-1,-1]    //! that becomes a binding b in the engine.  If the associated optimization profile    //! specifies that b has minimum dimensions as [6,9] and maximum dimensions [7,9],    //! getBindingDimensions(b) returns [-1,9], despite the second dimension being    //! dynamic in the INetworkDefinition.    //!    //! Because each optimization profile has separate bindings, the returned value can    //! differ across profiles. Consider another binding b' for the same network input,    //! but for another optimization profile.  If that other profile specifies minimum    //! dimensions [5,8] and maximum dimensions [5,9], getBindingDimensions(b') returns [5,-1].    //!    //! \see getBindingIndex()    //!    virtual Dims getBindingDimensions(int bindingIndex) const noexcept = 0;    //!    //! \brief Determine the required data type for a buffer from its binding index.    //!    //! \param bindingIndex The binding index.    //! \return The type of the data in the buffer.    //!    //! \see getBindingIndex()    //!    virtual DataType getBindingDataType(int bindingIndex) const noexcept = 0;...class IExecutionContext{public:    //!    //! \brief Synchronously execute inference on a batch.    //!    //! This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex()    //! \param batchSize The batch size. This is at most the value supplied when the engine was built.    //! \param bindings An array of pointers to input and output buffers for the network.    //!    //! \return True if execution succeeded.    //!    //! \see ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()    //!    virtual bool execute(int batchSize, void** bindings) noexcept = 0;    //!    //! \brief Asynchronously execute inference on a batch.    //!    //! This method requires an array of input and output buffers. The mapping from tensor names to indices can be queried using ICudaEngine::getBindingIndex()    //! \param batchSize The batch size. This is at most the value supplied when the engine was built.    //! \param bindings An array of pointers to input and output buffers for the network.    //! \param stream A cuda stream on which the inference kernels will be enqueued    //! \param inputConsumed An optional event which will be signaled when the input buffers can be refilled with new data    //!    //! \return True if the kernels were enqueued successfully.    //!    //! \see ICudaEngine::getBindingIndex() ICudaEngine::getMaxBatchSize()    //!    virtual bool enqueue(int batchSize, void** bindings, cudaStream_t stream, cudaEvent_t* inputConsumed) noexcept = 0;...    //!    //! \brief Select an optimization profile for the current context.    //!    //! \param profileIndex Index of the profile. It must lie between 0 and    //! getEngine().getNbOptimizationProfiles() - 1    //!    //! The selected profile will be used in subsequent calls to execute() or enqueue().    //!    //! If the associated CUDA engine has dynamic inputs, this method must be called at least once    //! with a unique profileIndex before calling execute or enqueue (i.e. the profile index    //! may not be in use by another execution context that has not been destroyed yet).    //! For the first execution context that is created for an engine, setOptimizationProfile(0)    //! is called implicitly.    //!    //! If the associated CUDA engine does not have inputs with dynamic shapes, this method need not be    //! called, in which case the default profile index of 0 will be used (this is particularly    //! the case for all safe engines).    //!    //! setOptimizationProfile() must be called before calling setBindingDimensions() and    //! setInputShapeBinding() for all dynamic input tensors or input shape tensors, which in    //! turn must be called before either execute() or enqueue().    //!    //! \return true if the call succeeded, else false (e.g. input out of range)    //!    //! \see ICudaEngine::getNbOptimizationProfiles()    virtual bool setOptimizationProfile(int profileIndex) noexcept = 0;...    //!    //! \brief Set the dynamic dimensions of a binding    //!    //! Requires the engine to be built without an implicit batch dimension.    //! The binding must be an input tensor, and all dimensions must be compatible with    //! the network definition (i.e. only the wildcard dimension -1 can be replaced with a    //! new dimension > 0). Furthermore, the dimensions must be in the valid range for the    //! currently selected optimization profile, and the corresponding engine must not be    //! safety-certified.    //!    //! This method will fail unless a valid optimization profile is defined for the current    //! execution context (getOptimizationProfile() must not be -1).    //!    //! For all dynamic non-output bindings (which have at least one wildcard dimension of -1),    //! this method needs to be called before either enqueue() or execute() may be called.    //! This can be checked using the method allInputDimensionsSpecified().    //!    //! \return false if an error occurs (e.g. index out of range), else true    //!    //! \see ICudaEngine::getBindingIndex    //!    virtual bool setBindingDimensions(int bindingIndex, Dims dimensions) noexcept = 0;...