代码收藏家 (Code Collector) · Tech Tutorials · 14 days ago

A Guide to Accelerating Model Deployment in Python with TensorRT

#Notes on the workflow I used to deploy a model with TensorRT

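Before installing TensorRT, one thing should be clear: it is best not to install it into an existing virtual environment. It can easily conflict with the libraries already there and break programs that used to work. What I did was clone my existing virtual environment and carry out all of the following steps in the new one.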

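My system setup before installing TensorRT:

OS: Windows 11

CUDA: 12.1 (there are many tutorials on installing CUDA and cuDNN; install the versions that suit your machine)

Python: 3.9 (in the cloned virtual environment)

PyTorch: 2.5.1 (in the cloned virtual environment; optional)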
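Before going further, it can help to print the versions that are actually active in the cloned environment. The sketch below is a minimal example, and `describe_environment` is a hypothetical helper. PyTorch is treated as optional, as in the list above. Note that `torch.version.cuda` reports the CUDA version PyTorch was built against, which is not necessarily the toolkit installed under Program Files.

```python
import platform
import sys


def describe_environment():
    """Collect the versions this guide depends on (PyTorch is optional)."""
    info = {
        "os": platform.platform(),
        "python": ".".join(map(str, sys.version_info[:3])),
    }
    try:
        import torch  # optional: only present if installed in the cloned env
    except ImportError:
        info["torch"] = None
        info["torch_cuda"] = None
    else:
        info["torch"] = torch.__version__
        # CUDA version PyTorch was built with; None for CPU-only builds
        info["torch_cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```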

Once these are settled, you can start installing TensorRT.

1. Download the installation package

Go to the official page, TensorRT Download | NVIDIA Developer. Because my system is Windows 11, I could only choose TensorRT 10. On Windows 10 you can also choose TensorRT 8, TensorRT 7, and so on.

2. Install TensorRT

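I read many tutorials for this step, and their instructions differ. I am not sure which step is the one that actually made it work, so I have listed all of them here. Doing all of them should not cause any problems.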

(1) Extract the downloaded ZIP file.

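(2) Copy files from the extracted folder into C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1, following the source and destination paths in the table below. Note: copying into this folder requires administrator privileges. Replace the CUDA version (v12.1 here) with the one you actually installed.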

No.	Files to copy	Source	Destination
1	…\TensorRT-10.2.0.19\bin\trtexec.exe	…\TensorRT-10.2.0.19\bin	C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\bin
2	All files under …\TensorRT-10.2.0.19\include	…\TensorRT-10.2.0.19\include	C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\include
3	All .lib files under …\TensorRT-10.2.0.19\lib	…\TensorRT-10.2.0.19\lib	C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib\x64
4	All .dll files under …\TensorRT-10.2.0.19\lib	…\TensorRT-10.2.0.19\lib	C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\lib
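The copy table above can also be scripted. The sketch below is minimal and assumes the folder layout shown in the table. `plan_copies` and `apply_plan` are hypothetical helpers. The sketch only mirrors the table and does not decide whether every copy is really needed. Writing into Program Files still requires a terminal started as administrator.

```python
import shutil
from pathlib import Path


def plan_copies(trt_root, cuda_root):
    """Return (source_file, destination_dir) pairs mirroring the copy table."""
    trt_root, cuda_root = Path(trt_root), Path(cuda_root)
    plan = [(trt_root / "bin" / "trtexec.exe", cuda_root / "bin")]
    # Row 2: all files under include (subfolders, if any, are skipped in this sketch)
    plan += [(p, cuda_root / "include")
             for p in sorted((trt_root / "include").glob("*")) if p.is_file()]
    # Row 3: .lib files go to lib\x64
    plan += [(p, cuda_root / "lib" / "x64") for p in sorted((trt_root / "lib").glob("*.lib"))]
    # Row 4: .dll files go to lib
    plan += [(p, cuda_root / "lib") for p in sorted((trt_root / "lib").glob("*.dll"))]
    return plan


def apply_plan(plan, dry_run=True):
    """Print every copy; only perform it when dry_run is False."""
    for src, dst_dir in plan:
        if not src.exists():
            print(f"missing: {src}")
            continue
        print(f"{src} -> {dst_dir}")
        if not dry_run:
            dst_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst_dir / src.name)
```

Run `apply_plan(plan_copies(r"path\to\TensorRT-10.2.0.19", r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1"))` first to review the list. Then pass `dry_run=False`. The first path is a placeholder for your own extraction folder.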

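(3) Set the environment variables. In my Path variable, the last entry is the path of the extracted ZIP folder; change it to match your own location. After adding it, click OK in each dialog to confirm.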
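To confirm the environment-variable change from Python, the sketch below checks whether a folder is an entry of PATH. It assumes the DLLs you need live in the TensorRT `lib` folder, as in the copy table. `is_on_path` and `expose_dlls` are hypothetical helpers. A terminal that was opened before you edited the variables will not see the new PATH. `os.add_dll_directory` is a real function from Python 3.8 on, but it exists only on Windows, so the helper does nothing elsewhere.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of the PATH string."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normcase(os.path.normpath(directory))
    for entry in path_value.split(os.pathsep):
        if entry and os.path.normcase(os.path.normpath(entry)) == target:
            return True
    return False


def expose_dlls(directory):
    """On Windows (Python 3.8+), let this process load DLLs from `directory`.

    Returns the handle from os.add_dll_directory (call .close() to undo it),
    or None when the function is unavailable or the folder does not exist.
    """
    if hasattr(os, "add_dll_directory") and os.path.isdir(directory):
        return os.add_dll_directory(directory)
    return None
```

For example, `is_on_path(r"path\to\TensorRT-10.2.0.19\lib")` returns False until the new Path value is visible to the process. The path is a placeholder.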

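(4) Install the .whl file. Go to the folder that contains it, open a terminal there, and install it with pip. The wheels are in …\TensorRT-10.2.0.19\python. Pick the one that matches your Python version (cp39 for Python 3.9). The install command is pip install tensorrt-10.2.0-cp39-none-win_amd64.whl. Note: I activated the virtual environment first and then ran the installation.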
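Picking the wheel by hand is easy to get wrong when several cpXY tags sit in the same folder. The sketch below is minimal and assumes the wheel names follow the pattern shown above (`tensorrt-<version>-cp<XY>-...whl`). `pick_wheel` and `install_wheel` are hypothetical helpers. Calling pip through `sys.executable` installs into whichever interpreter runs the script. That matches the advice to activate the virtual environment first.

```python
import subprocess
import sys
from pathlib import Path


def pick_wheel(python_dir, version_info=None):
    """Return the tensorrt wheel whose cpXY tag matches the interpreter, or None."""
    vi = version_info or sys.version_info
    tag = f"cp{vi[0]}{vi[1]}"
    # "tensorrt-*.whl" skips the tensorrt_lean / tensorrt_dispatch wheels
    matches = sorted(p for p in Path(python_dir).glob("tensorrt-*.whl")
                     if f"-{tag}-" in p.name)
    return matches[0] if matches else None


def install_wheel(wheel):
    """Install `wheel` with the pip of the interpreter running this script."""
    subprocess.check_call([sys.executable, "-m", "pip", "install", str(wheel)])
```

For example, call `wheel = pick_wheel(r"path\to\TensorRT-10.2.0.19\python")`, and then call `install_wheel(wheel)` if the result is not None. The path is a placeholder.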

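(5) Test whether the installation succeeded, for example by importing tensorrt in Python. If there is no error, TensorRT was installed successfully.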
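The post does not say exactly which test was run. One common check is to import the `tensorrt` module and create a `Builder`, which forces the native libraries to load; using it here is an assumption. `check_tensorrt` is a hypothetical helper. `trt.Builder`, `trt.Logger` and `trt.__version__` are part of the TensorRT Python package.

```python
def check_tensorrt():
    """Return the TensorRT version string, or None if the import fails."""
    try:
        import tensorrt as trt
    except (ImportError, OSError) as exc:
        # ImportError: wheel not installed in this interpreter;
        # OSError (e.g. FileNotFoundError) may appear when a DLL cannot be found
        print("TensorRT import failed:", exc)
        return None
    # Creating a Builder makes the native nvinfer libraries load
    builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
    print("TensorRT", trt.__version__, "| builder created:", builder is not None)
    return trt.__version__
```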

3. Convert the ONNX file to a TensorRT engine.

trtexec.exe --onnx=mymodel.onnx --saveEngine=model.engine --fp16

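--onnx is followed by the path of the ONNX file

--saveEngine is followed by the path where the TensorRT engine file is saved

--fp16 enables FP16 precision (use it depending on your needs: accuracy drops slightly and speed improves, but some models produce errors in FP16)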

The conversion takes some time. When it succeeds, the engine is written to the path given by --saveEngine.
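If the conversion is part of a larger Python workflow, the same trtexec call can be issued with `subprocess`. The sketch below is minimal and uses only the three flags explained above. `build_trtexec_cmd` and `convert` are hypothetical helpers. The sketch assumes trtexec can be found on PATH.

```python
import shutil
import subprocess


def build_trtexec_cmd(onnx_path, engine_path, fp16=True, exe="trtexec.exe"):
    """Assemble the trtexec command line used in this step."""
    cmd = [exe, f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # faster, slightly less accurate, may fail for some models
    return cmd


def convert(onnx_path, engine_path, fp16=True):
    """Run trtexec; raises CalledProcessError if the conversion fails."""
    exe = shutil.which("trtexec")  # on Windows, PATHEXT lets this find trtexec.exe
    if exe is None:
        raise FileNotFoundError("trtexec not found on PATH")
    subprocess.run(build_trtexec_cmd(onnx_path, engine_path, fp16, exe), check=True)
```

For example, `convert("mymodel.onnx", "model.engine")` runs the same command as above.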

5. Install pycuda. It is needed later, when running inference with model.engine.

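Many pycuda installation guides say to search for it on a mirror site, but I could not find it there. In the end I went to the official site https://anaconda.org/. You do not need to log in; just open the address the site provides. I ran the command below directly in Anaconda Prompt. It automatically downloads a pycuda build that suits my environment.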

conda install conda-forge::pycuda
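After installing pycuda, a quick host-to-device round trip confirms that it can reach the GPU. The sketch uses the same three pycuda calls that the inference class below relies on: `mem_alloc`, `memcpy_htod` and `memcpy_dtoh`. `pycuda_roundtrip` is a hypothetical helper name.

```python
import numpy as np


def pycuda_roundtrip(n=16):
    """Copy an array host -> device -> host; None if pycuda or a GPU is unusable."""
    try:
        import pycuda.autoinit  # noqa: F401  creates a CUDA context on device 0
        import pycuda.driver as cuda
    except Exception as exc:  # ImportError, or a CUDA init error without a usable GPU
        print("pycuda not usable:", exc)
        return None
    host_in = np.arange(n, dtype=np.float32)
    host_out = np.empty_like(host_in)
    device_buf = cuda.mem_alloc(host_in.nbytes)  # device buffer of the same byte size
    cuda.memcpy_htod(device_buf, host_in)
    cuda.memcpy_dtoh(host_out, device_buf)
    return bool(np.array_equal(host_in, host_out))
```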

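6. (Optional) Installing pycuda changed the versions of some libraries I had installed earlier, and my pillow stopped working. So I added one more step: reinstalling pillow. With this version, the error went away.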

pip install pillow==11.1.0
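Installing pycuda changed other packages without warning here. Recording versions before and after an install makes such changes visible. The sketch below is minimal and uses the standard-library `importlib.metadata`. `snapshot` and `changed` are hypothetical helpers, and the package list is only an example.

```python
from importlib import metadata


def snapshot(names):
    """Map each distribution name to its installed version (None if missing)."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def changed(before, after):
    """Return {name: (old, new)} for every package whose version differs."""
    return {k: (before.get(k), after.get(k))
            for k in sorted(set(before) | set(after))
            if before.get(k) != after.get(k)}
```

For example, call `before = snapshot(["pillow", "numpy"])` before running `conda install`. Call `after = snapshot(["pillow", "numpy"])` afterwards, and then print `changed(before, after)`.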

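7. Run inference with the model.engine file generated in the ONNX-to-TensorRT conversion step. One thing to stress: I adapted this code from the official sample, so it may only work with TensorRT 10.2. At first I used an inference script I found online, but many of its functions no longer worked. Once I realized this was a version problem, I took infer.py from the downloaded ZIP file instead, but a few of its functions still failed. In the end, wherever an error occurred, I replaced the failing call with a pycuda function that does the same thing.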

Here is the complete inference code.

import numpy as np
import pycuda.autoinit  # noqa: F401  creates and activates a CUDA context on GPU 0
import pycuda.driver as cudart  # pycuda driver API (the "cudart" alias is kept from the original sample)
import tensorrt as trt


class TensorRTInfer:
    """
    Implements inference for the Model TensorRT engine.
    """

    def __init__(self, engine_path):
        """
        :param engine_path: The path to the serialized engine to load from disk.
        """

        # Load TRT engine
        self.logger = trt.Logger(trt.Logger.ERROR)
        trt.init_libnvinfer_plugins(self.logger, namespace="")
        with open(engine_path, "rb") as f, trt.Runtime(self.logger) as runtime:
            assert runtime
            self.engine = runtime.deserialize_cuda_engine(f.read())
        assert self.engine
        self.context = self.engine.create_execution_context()
        assert self.context

        # Setup I/O bindings
        self.inputs = []
        self.outputs = []
        self.allocations = []
        for i in range(self.engine.num_io_tensors):
            name = self.engine.get_tensor_name(i)
            is_input = self.engine.get_tensor_mode(name) == trt.TensorIOMode.INPUT
            dtype = self.engine.get_tensor_dtype(name)
            shape = self.engine.get_tensor_shape(name)
            # Dynamic dimensions show up as -1; this class only supports static shapes
            assert all(s >= 0 for s in shape), f"{name} has dynamic shape {shape}"
            if is_input:
                self.batch_size = shape[0]
            # Buffer size in bytes = element size * product of all dimensions
            size = np.dtype(trt.nptype(dtype)).itemsize
            for s in shape:
                size *= s
            # Allocate device memory with pycuda (returns a DeviceAllocation)
            allocation = cudart.mem_alloc(size)
            binding = {
                "index": i,
                "name": name,
                "dtype": np.dtype(trt.nptype(dtype)),
                "shape": list(shape),
                "allocation": allocation,
                "size": size,
            }
            # execute_v2 takes raw device pointers as ints, ordered by I/O tensor index
            self.allocations.append(int(allocation))
            if is_input:
                self.inputs.append(binding)
            else:
                self.outputs.append(binding)

        assert len(self.inputs) > 0
        assert len(self.outputs) > 0
        assert len(self.allocations) > 0
        assert self.batch_size > 0

    def input_spec(self):
        """
        Get the specs for the input tensor of the network. Useful to prepare memory allocations.
        :return: Two items, the shape of the input tensor and its (numpy) datatype.
        """
        return self.inputs[0]["shape"], self.inputs[0]["dtype"]

    def output_spec(self):
        """
        Get the specs for the output tensors of the network. Useful to prepare memory allocations.
        :return: A list with two items per element, the shape and (numpy) datatype of each output tensor.
        """
        specs = []
        for o in self.outputs:
            specs.append((o["shape"], o["dtype"]))
        return specs

    def infer(self, batch):
        """
        Execute inference on a batch that is already preprocessed to the engine's input shape.
        Memory copying to and from the GPU device is performed here.
        :param batch: A numpy array holding the input batch.
        :return: The first output tensor as a numpy array.
        """

        # Prepare host buffers for the outputs.
        outputs = []
        for shape, dtype in self.output_spec():
            outputs.append(np.zeros(shape, dtype))

        # Copy the input from host to device memory; cast so its byte size matches the buffer
        batch = np.ascontiguousarray(batch, dtype=self.inputs[0]["dtype"])
        cudart.memcpy_htod(self.inputs[0]["allocation"], batch)

        # Run the network synchronously
        self.context.execute_v2(self.allocations)
        for o in range(len(outputs)):
            # Copy each output from device memory back to host memory
            cudart.memcpy_dtoh(outputs[o], self.outputs[o]["allocation"])

        # Process the results.
        prediction = outputs[0]
        return prediction
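
The `size` loop in `__init__` computes each buffer as the element size times the product of the dimensions. The standalone sketch below does the same with NumPy and adds a guard for dynamic dimensions. TensorRT reports dynamic dimensions as -1, and static allocation like the one above cannot handle them. `tensor_nbytes` is a hypothetical helper.

```python
import numpy as np


def tensor_nbytes(shape, dtype):
    """Bytes needed for a tensor: itemsize * product of dims (static shapes only)."""
    dims = [int(d) for d in shape]
    if any(d < 0 for d in dims):
        # TensorRT reports dynamic dimensions as -1; pick a concrete shape first
        raise ValueError(f"dynamic shape {dims}: set a concrete shape before allocating")
    return int(np.prod(dims, dtype=np.int64)) * np.dtype(dtype).itemsize
```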

The class is used like this:

    # Initialize the inference session (model_folder holds the path to model.engine)
    session = TensorRTInfer(model_folder)
    prediction = session.infer(input_data)
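`memcpy_htod` copies raw bytes. An input with the wrong dtype (for example float64 instead of float32) or the wrong shape therefore does not match the device buffer. The sketch below is a minimal check to run before calling `infer`, and it assumes the model takes a single input. `prepare_batch` is a hypothetical helper that works with the class's own `input_spec()`.

```python
import numpy as np


def prepare_batch(array, expected_shape, expected_dtype):
    """Cast and check `array` so it can be copied straight into the input buffer."""
    batch = np.ascontiguousarray(array, dtype=expected_dtype)
    if list(batch.shape) != list(expected_shape):
        raise ValueError(f"input shape {batch.shape} != engine input {tuple(expected_shape)}")
    return batch
```

For example, call `shape, dtype = session.input_spec()`, and then `prediction = session.infer(prepare_batch(input_data, shape, dtype))`.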

8. Summary.

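The speedup from TensorRT is clearly noticeable. Compared with running inference on the ONNX model, overall inference in my case finished about half a minute faster.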
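To put a number on a speedup like this, time both backends the same way. The sketch below is minimal, and `benchmark` is a hypothetical helper. Warm-up calls are excluded because the first calls include one-time setup, and the median is reported to reduce the effect of outliers. The helper suits synchronous calls such as `session.infer` above, which waits for the device-to-host copies before returning.

```python
import statistics
import time


def benchmark(fn, *args, warmup=3, iters=20):
    """Median wall-clock seconds per call of fn(*args), after warm-up calls."""
    for _ in range(warmup):
        fn(*args)  # early calls may include lazy initialization
    times = []
    for _ in range(iters):
        start = time.perf_counter()
        fn(*args)
        times.append(time.perf_counter() - start)
    return statistics.median(times)
```

For example, compare `benchmark(session.infer, batch)` with `benchmark(lambda b: ort_session.run(None, {input_name: b}), batch)`. Here `ort_session` and `input_name` are hypothetical names for an ONNX Runtime `InferenceSession` and its input name.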

Author: phyllis_110

Shared and compiled by 物联沃 (IOTWORD)