Running YOLOv8 Inference on an Intel GPU with OpenVINO in Python

As artificial intelligence continues to advance, running deep-learning model inference on GPUs has become a common way to accelerate AI applications. Intel's Arc GPUs perform well at AI inference, and paired with the OpenVINO toolkit their hardware acceleration can be fully exploited. In this article, we show how to use OpenVINO from Python to run YOLOv8 model inference on an Intel GPU.

1. Environment Setup

To call an Intel GPU for inference from Python through OpenVINO, we first need to configure the environment: Python itself, the OpenVINO toolkit, and the related dependency libraries.

1.1 Installing Python

First, make sure Python is installed on your system; version 3.8 or later is recommended. You can check the installed version with:

python --version

If Python is not installed, download and install it from the official Python website.

1.2 Installing OpenVINO

OpenVINO is a powerful AI inference toolkit that supports a range of hardware platforms, including Intel's Arc GPUs. To install the archive distribution of OpenVINO:

  1. Visit the OpenVINO website and download the package for your operating system.
  2. Extract and install the OpenVINO toolkit.
  3. Configure the environment variables so the OpenVINO tools can be invoked from the command line.

After installation, initialize the OpenVINO environment with the following command (Linux):

source /opt/intel/openvino/setupvars.sh

Or on Windows:

"C:\Program Files (x86)\Intel\openvino_2022\bin\setupvars.bat"

1.3 Installing Python Dependencies

For the inference workflow we need the following Python libraries:

OpenVINO's Python API (openvino)
An image-processing library (opencv-python, or opencv-python-headless on machines without a GUI)
Supporting libraries such as numpy, plus torch and tqdm, which the example script below relies on
Install these dependencies with pip:

pip install openvino opencv-python-headless numpy torch tqdm

Note that the openvino package must be version 2024.0.0 or newer; earlier releases cannot target the iGPU. For example:

pip install openvino==2024.0.0
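
You can confirm the installed version from the command line:

python -c "from openvino.runtime import get_version; print(get_version())"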

2. YOLOv8 Model Inference

With the environment configured, we can use OpenVINO to run YOLOv8 model inference on the Intel GPU. Here is a simple example script (the model and image paths reflect the author's machine and should be adjusted to yours):

import cv2
import numpy as np
from openvino.runtime import Core
import time
from tqdm import tqdm
import torch
import os

class SimpleLetterBox:
    def __init__(self, new_shape=(640, 640), center=True, resize_only=False, use_gpu=False):
        """Initialize SimpleLetterBox with target shape and padding mode."""
        self.new_shape = new_shape
        self.center = center
        self.resize_only = resize_only
        self.use_gpu = use_gpu

    def __call__(self, image):
        if self.resize_only:
            if self.use_gpu:
                # Run the resize on the GPU via OpenCV's OpenCL path (UMat)
                # (cv2.resize takes (width, height); fine here since new_shape is square)
                image_gpu = cv2.UMat(image)
                resized_image = cv2.resize(image_gpu, self.new_shape, interpolation=cv2.INTER_LINEAR).get()
            else:
                resized_image = cv2.resize(image, self.new_shape, interpolation=cv2.INTER_LINEAR)
            return resized_image

        # Resize image and pad to the target shape
        shape = image.shape[:2]  # current shape [height, width]
        new_shape = self.new_shape

        # Scale ratio (new / old)
        r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
        new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))

        # Compute padding
        dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
        if self.center:
            dw /= 2  # divide padding into 2 sides
            dh /= 2

        if shape[::-1] != new_unpad:  # resize
            if self.use_gpu:
                image_gpu = cv2.UMat(image)
                image = cv2.resize(image_gpu, new_unpad, interpolation=cv2.INTER_LINEAR).get()
            else:
                image = cv2.resize(image, new_unpad, interpolation=cv2.INTER_LINEAR)

        # Add padding
        top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
        left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
        padded_image = cv2.copyMakeBorder(
            image, top, bottom, left, right, cv2.BORDER_CONSTANT, value=(114, 114, 114)
        )
        return padded_image
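
# Note: SimpleLetterBox above mirrors the resize-and-pad logic of ultralytics'
# LetterBox transform (a simplified re-implementation, not the library class itself).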

batch_size = 1
use_gpu = True
input_shape = (2048, 2048)
letterbox = SimpleLetterBox(input_shape, resize_only=False, use_gpu=use_gpu)
imgs_dir = r"C:\Users\leaper\Downloads\dataset2000\images\part"
imgs_paths = os.listdir(imgs_dir)

# Initialize OpenVINO model
core = Core()
model_path = f"C:/Users/leaper/Desktop/ultralytics/yinlie2048_openvino_model_bs{batch_size}/yinlie2048.xml"
model = core.read_model(model_path)
compiled_model = core.compile_model(model, "GPU.0")
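# "GPU.0" addresses the first GPU device; core.available_devices lists the valid names.
# Optionally (an assumption, not part of the original setup) a performance hint can be
# passed at compile time, e.g.:
# compiled_model = core.compile_model(model, "GPU.0", {"PERFORMANCE_HINT": "LATENCY"})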

total_preprocess_time = 0
total_inference_time = 0
num_batches = 0

# Process images in batches
for i in tqdm(range(0, len(imgs_paths), batch_size)):
    imgs_batch = []
    # Load and preprocess images
    for j in range(batch_size):
        if i + j < len(imgs_paths):
            img_path = os.path.join(imgs_dir, imgs_paths[i + j])
            image = cv2.imread(img_path)
            if image is not None:
                imgs_batch.append(letterbox(image))
    if not imgs_batch:  # skip if every read failed, so np.stack never sees an empty list
        continue
    start_preprocess_time = time.time()
    im = np.stack(imgs_batch)
    im = im[..., ::-1].transpose((0, 3, 1, 2))  # BGR to RGB, BHWC to BCHW, (n, 3, h, w)
    im = np.ascontiguousarray(im)  # contiguous
    im = torch.from_numpy(im)
    im = im.half() / 255  # uint8 -> FP16, scaled to [0, 1] (matches a half-precision export)
    preprocess_time = (time.time() - start_preprocess_time) * 1000  # Convert to milliseconds

    # Model inference
    start_inference_time = time.time()
    output = compiled_model([im])  # array-like inputs (numpy arrays or CPU torch tensors) are accepted
    inference_time = (time.time() - start_inference_time) * 1000  # Convert to milliseconds
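
    # `output` maps each model output to an array: the raw prediction tensor is
    # output[compiled_model.output(0)] and still needs score filtering and NMS,
    # which this timing-only benchmark deliberately skips.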

    total_preprocess_time += preprocess_time
    total_inference_time += inference_time
    num_batches += 1

    # print(f"Batch {i // batch_size + 1} - Preprocessing time: {preprocess_time:.2f} ms, Inference time: {inference_time:.2f} ms")

# Output average times
average_preprocess_time = total_preprocess_time / num_batches
average_inference_time = total_inference_time / num_batches

print(f"Average preprocessing time: {average_preprocess_time:.2f} ms")
print(f"Average inference time: {average_inference_time:.2f} ms")

Author: ZhouDevin
