36. RK3399Pro Environment Setup and YOLOv5 C++/OpenCV RKNN Model Deployment

Overview: notes on configuring the RK3399Pro environment and developing against its NPU in C++. The goal is a C++ code base that can be combined with other image-processing algorithms for deployment. The Taobao link for the board is in the references.

Required aarch64 .whl packages for Python 3.7 (opencv-python and h5py):

Link: https://pan.baidu.com/s/1cvCAmHBa_4KgEjrcFIYnig (extraction code: 5ui4)

Link: https://pan.baidu.com/s/1hrcr8Fc2sboD1_uy8T1Z8Q (extraction code: e4wq)

Step 1: The system ships with Python 3.8, but Rockchip does not provide an RKNN package for Python 3.8, so install Archiconda and create a Python 3.7 environment.

ubuntu@ubuntu:~$ ssh firefly@192.168.85.45
firefly@10.10.85.75's password: 
 _____ _           __ _       
|  ___(_)_ __ ___ / _| |_   _ 
| |_  | | '__/ _ \ |_| | | | |
|  _| | | | |  __/  _| | |_| |
|_|   |_|_|  \___|_| |_|\__, |
                        |___/ 
Welcome to Ubuntu 20.04.3 LTS (GNU/Linux 4.4.194 aarch64)

 * Documentation:  http://wiki.t-firefly.com
 * Management:     http://www.t-firefly.com

System information as of Mon Aug 22 09:20:44 UTC 2022

System load:   0.40 0.13 0.05  	Up time:       0 min		Local users:   2            	
Memory usage:  7 % of 3777MB 	IP:            10.10.85.75
10.10.85.232
Usage of /:    14% of 12G    	

Last login: Mon Aug 22 09:03:35 2022 from 10.10.84.222
firefly@firefly:~$ python3
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()

firefly@firefly:~$ sudo apt-get -y install libprotobuf-dev protobuf-compiler gfortran \
libboost-dev cmake libleveldb-dev libsnappy-dev \
libboost-thread-dev libboost-system-dev \
libatlas-base-dev libhdf5-serial-dev libgflags-dev \
libgoogle-glog-dev liblmdb-dev \
libboost-all-dev libgtk2.0-dev pkg-config
sudo apt-get -y remove ffmpeg x264 libx264-dev
sudo apt-get -y install libopencv-dev
sudo apt-get -y install build-essential checkinstall cmake pkg-config yasm
sudo apt-get -y install libtiff4-dev libjpeg-dev libjasper-dev
sudo apt-get -y install libavcodec-dev libavformat-dev libswscale-dev libdc1394-22-dev libxine-dev libgstreamer0.10-dev libgstreamer-plugins-base0.10-dev libv4l-dev
sudo apt-get -y install python-dev python-numpy libgtk2.0-dev
sudo apt-get -y install libtbb-dev libeigen3-dev
sudo apt-get -y install libqt4-dev libgtk2.0-dev
sudo apt-get -y install libgtk2.0-dev pkg-config
sudo apt-get -y install libfaac-dev libmp3lame-dev libopencore-amrnb-dev libopencore-amrwb-dev libtheora-dev libvorbis-dev libxvidcore-dev
sudo apt-get -y install x264 v4l-utils ffmpeg
sudo apt-get -y install libgtk2.0-dev

Download the Archiconda installer, install it, then create the environment:

firefly@firefly:~$ wget https://github.com/Archiconda/build-tools/releases/download/0.2.2/Archiconda3-0.2.2-Linux-aarch64.sh
firefly@firefly:~$ sh Archiconda3-0.2.2-Linux-aarch64.sh
firefly@firefly:~$ python3
Python 3.8.10 (default, Jun 22 2022, 20:18:18) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> exit()
firefly@firefly:~$ source ~/.bashrc
firefly@firefly:~$ python3
Python 3.7.1 | packaged by conda-forge | (default, Jan  7 2019, 00:11:41) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 
firefly@firefly:~$ conda create -n rknnpy37 python=3.7
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate rknnpy37
#
# To deactivate an active environment, use
#
#     $ conda deactivate
firefly@firefly:~$ conda activate rknnpy37
(rknnpy37) firefly@firefly:~$ 

Step 2: Following the previous post (35, setting up the Rockchip NPU simulation environment on Ubuntu 20.04 and testing YOLOv5 + NPU detection and RKNN inference on the RV1126's Debian system, sxj731533730's blog on CSDN), install rknn-toolkit on the PC to handle model conversion; on the board itself only rknn-toolkit-lite is needed.

(rknnpy37) firefly@firefly:~$ git clone https://github.com/rockchip-linux/rknn-toolkit.git
(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/packages$ pip3 install rknn_toolkit_lite-1.7.1-cp37-cp37m-linux_aarch64.whl -i http://pypi.douban.com/simple/ --trusted-host pypi.douban.com
...
Successfully built numpy psutil ruamel.yaml
Installing collected packages: ruamel.yaml, psutil, numpy, rknn-toolkit-lite
Successfully installed numpy-1.16.3 psutil-5.6.2 rknn-toolkit-lite-1.7.1 ruamel.yaml-0.15.81
(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/packages$ python3
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:22:24) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rknnlite.api import RKNNLite
>>> 
Install the opencv-python wheel downloaded earlier:

(rknnpy37) firefly@firefly:~$ pip3 install opencv_python-4.5.2.54-cp37-cp37m-manylinux2014_aarch64.whl
Processing ./opencv_python-4.5.2.54-cp37-cp37m-manylinux2014_aarch64.whl
Requirement already satisfied: numpy>=1.19.3 in /home/firefly/archiconda3/envs/rknnpy37/lib/python3.7/site-packages (from opencv-python==4.5.2.54) (1.21.6)
Installing collected packages: opencv-python
Successfully installed opencv-python-4.5.2.54

Step 3: Running the demo fails because no NPU device is found; updating the NPU driver fixes it, after which the demo succeeds.

(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/examples/inference_with_lite$ python3 test.py 
--> Load RKNN model
done
--> Init runtime environment
E Only support ntb mode on Linux_x64 aarch64. But can not find device with ntb mode.
E Catch exception when init runtime!
E Traceback (most recent call last):
  File "/home/firefly/archiconda3/envs/rknnpy37/lib/python3.7/site-packages/rknnlite/api/rknn_lite.py", line 145, in init_runtime
    async_mode=async_mode, rknn2precompile=rknn2precompile)
  File "rknnlite/api/rknn_runtime.py", line 201, in rknnlite.api.rknn_runtime.RKNNRuntime.__init__
  File "rknnlite/api/rknn_runtime.py", line 637, in rknnlite.api.rknn_runtime.RKNNRuntime._connect
Exception: Init runtime environment failed!

Init runtime environment failed

(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/examples/inference_with_lite$ python3
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:22:24) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rknnlite.api import RKNNLite
>>> rknn=RKNNLite()
>>> rknn.list_devices()
/home/firefly/archiconda3/envs/rknnpy37/lib/python3.7/site-packages/rknnlite/3rdparty/platform-tools/ntp/linux-aarch64/npu_transfer_proxy devices
*************************
None devices connected.
*************************
([], [])
>>> exit()
(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/examples/inference_with_lite$ sudo apt install firefly-3399pronpu-driver

Reboot the device, then verify the NPU is visible:

firefly@firefly:~$ conda activate rknnpy37
(rknnpy37) firefly@firefly:~$ python3
Python 3.7.12 | packaged by conda-forge | (default, Oct 26 2021, 06:22:24) 
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from rknnlite.api import RKNNLite
>>> rknn=RKNNLite()
>>> rknn.list_devices()
/home/firefly/archiconda3/envs/rknnpy37/lib/python3.7/site-packages/rknnlite/3rdparty/platform-tools/ntp/linux-aarch64/npu_transfer_proxy devices
*************************
all device(s) with ntb mode:
0123456789ABCDEF
*************************
([], ['0123456789ABCDEF'])
>>> 
(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/examples/inference_with_lite$ python3 test.py 
--> Load RKNN model
done
--> Init runtime environment
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:51)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.7.1 (566a9b6 build: 2021-10-28 14:56:17)
D RKNNAPI:   DRV: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
D RKNNAPI: ==============================================
done
--> Running model
D RKNNAPI: __can_use_fixed_point: pix = 3, diff = {0, 1, 0} != 0, disable fixed pointer convert.

D RKNNAPI: __can_use_fixed_point: use_fixed_point = 0.

resnet18
-----TOP 5-----
[812]: 0.999442994594574
[404]: 0.0004096269840374589
[657]: 3.2845448004081845e-05
[833]: 2.6112385967280716e-05
[895]: 1.8509887013351545e-05

done
(rknnpy37) firefly@firefly:~/rknn-toolkit/rknn-toolkit-lite/examples/inference_with_lite$ 

Step 4: On the PC, download the YOLOv5 v5.0 source code and model (Releases · ultralytics/yolov5 · GitHub), move models/export.py to the repo root, and modify yolo.py as described in the previous post.
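The exact yolo.py edit isn't reproduced here, but the change commonly made for RKNN export (a sketch of that approach, not necessarily the author's exact diff) is to have Detect.forward return the three raw head outputs, leaving the sigmoid/grid/anchor decoding to the post-processing code shown later:

# models/yolo.py, class Detect -- hypothetical export-time variant:
# return the raw (bs, 255, h, w) outputs of the three detection heads
# instead of decoded predictions; sigmoid, grid offsets and anchors
# are applied later in the RKNN post-processing.
def forward(self, x):
    return [self.m[i](x[i]) for i in range(self.nl)]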

ubuntu@ubuntu:~/Wallpapers/yolov5-5.0$ python3 export.py --weights yolov5s.pt  --img 640 --batch 1
Namespace(batch_size=1, device='cpu', dynamic=False, grid=False, img_size=[640, 640], weights='yolov5s.pt')
YOLOv5 🚀 2021-4-12 torch 1.9.0+cu111 CPU

Rename the exported model to yolov5s_20220823.onnx and convert it in the PC simulation environment:

(rknnpy36) ubuntu@ubuntu:~/Downloads/rknn-toolkit-1.7.1/examples/onnx/yolov5$ python3 rknntest.py 

The model conversion script (rknntest.py):

from rknn.api import RKNN

ONNX_MODEL = 'yolov5s_20220823.onnx'
RKNN_MODEL = 'yolov5s_20220823.rknn'

if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN(verbose=True)

    # pre-process config
    print('--> config model')
    rknn.config(mean_values=[[0, 0, 0]], std_values=[[255, 255, 255]], reorder_channel='0 1 2',
                target_platform='rk3399pro',
                quantized_dtype='asymmetric_affine-u8', optimization_level=3, output_optimize=1)
    print('done')

    print('--> Loading model')
    ret = rknn.load_onnx(model=ONNX_MODEL)
    if ret != 0:
        print('Load model  failed!')
        exit(ret)
    print('done')

    # Build model
    print('--> Building model')
    ret = rknn.build(do_quantization=True, dataset='dataset.txt')  # ,pre_compile=True
    if ret != 0:
        print('Build yolov5s failed!')
        exit(ret)
    print('done')

    # Export rknn model
    print('--> Export RKNN model')
    ret = rknn.export_rknn(RKNN_MODEL)
    if ret != 0:
        print('Export yolov5s.rknn failed!')
        exit(ret)
    print('done')

    rknn.release()
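One note on the commented-out pre_compile hint above: building with pre_compile=True produces a pre-compiled .rknn that loads faster on the board, but such a model can no longer run in the PC simulator. A sketch of that optional variant:

    # optional: pre-compile for faster model loading on the device;
    # the resulting .rknn will not run in the PC simulator
    ret = rknn.build(do_quantization=True, dataset='dataset.txt', pre_compile=True)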

The conversion succeeds:

D input tensor id = 3, name = images_220
I Build config finished.
done
--> Export RKNN model
done

Step 5: Test the model on the PC; the test script is also from the previous post.

The test script (rknndetect.py):

import os
import urllib
import traceback
import time
import sys
import numpy as np
import cv2
from rknn.api import RKNN

RKNN_MODEL = 'yolov5s_20220823.rknn'
IMG_PATH = 'bus.jpg'

QUANTIZE_ON = True

BOX_THRESH = 0.5
NMS_THRESH = 0.6
IMG_SIZE = 640

CLASSES = ('person', 'bicycle', 'car', 'motorcycle', 'airplane', 'bus', 'train', 'truck', 'boat', 'traffic light',
         'fire hydrant', 'stop sign', 'parking meter', 'bench', 'bird', 'cat', 'dog', 'horse', 'sheep', 'cow',
         'elephant', 'bear', 'zebra', 'giraffe', 'backpack', 'umbrella', 'handbag', 'tie', 'suitcase', 'frisbee',
         'skis', 'snowboard', 'sports ball', 'kite', 'baseball bat', 'baseball glove', 'skateboard', 'surfboard',
         'tennis racket', 'bottle', 'wine glass', 'cup', 'fork', 'knife', 'spoon', 'bowl', 'banana', 'apple',
         'sandwich', 'orange', 'broccoli', 'carrot', 'hot dog', 'pizza', 'donut', 'cake', 'chair', 'couch',
         'potted plant', 'bed', 'dining table', 'toilet', 'tv', 'laptop', 'mouse', 'remote', 'keyboard', 'cell phone',
         'microwave', 'oven', 'toaster', 'sink', 'refrigerator', 'book', 'clock', 'vase', 'scissors', 'teddy bear',
         'hair drier', 'toothbrush' )


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def xywh2xyxy(x):
    # Convert [x, y, w, h] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] - x[:, 2] / 2  # top left x
    y[:, 1] = x[:, 1] - x[:, 3] / 2  # top left y
    y[:, 2] = x[:, 0] + x[:, 2] / 2  # bottom right x
    y[:, 3] = x[:, 1] + x[:, 3] / 2  # bottom right y
    return y


def resize_postprocess(x, offset_x, offset_y):
    # Convert [x1, y1, x2, y2] to [x1, y1, x2, y2]
    y = np.copy(x)
    y[:, 0] = x[:, 0] / offset_x  # top left x
    y[:, 1] = x[:, 1] / offset_y  # top left y
    y[:, 2] = x[:, 2] / offset_x  # bottom right x
    y[:, 3] = x[:, 3] / offset_y  # bottom right y
    return y


def process(input, mask, anchors):
    anchors = [anchors[i] for i in mask]
    grid_h, grid_w = map(int, input.shape[0:2])

    box_confidence = sigmoid(input[..., 4])
    box_confidence = np.expand_dims(box_confidence, axis=-1)

    box_class_probs = sigmoid(input[..., 5:])

    box_xy = sigmoid(input[..., :2]) * 2 - 0.5

    col = np.tile(np.arange(0, grid_w), grid_w).reshape(-1, grid_w)
    row = np.tile(np.arange(0, grid_h).reshape(-1, 1), grid_h)
    col = col.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    row = row.reshape(grid_h, grid_w, 1, 1).repeat(3, axis=-2)
    grid = np.concatenate((col, row), axis=-1)
    box_xy += grid
    box_xy *= int(IMG_SIZE / grid_h)

    box_wh = pow(sigmoid(input[..., 2:4]) * 2, 2)
    box_wh = box_wh * anchors

    box = np.concatenate((box_xy, box_wh), axis=-1)

    return box, box_confidence, box_class_probs


def filter_boxes(boxes, box_confidences, box_class_probs):
    """Filter boxes with box threshold. It's a bit different with origin yolov5 post process!
    # Arguments
        boxes: ndarray, boxes of objects.
        box_confidences: ndarray, confidences of objects.
        box_class_probs: ndarray, class_probs of objects.
    # Returns
        boxes: ndarray, filtered boxes.
        classes: ndarray, classes for boxes.
        scores: ndarray, scores for boxes.
    """
    box_classes = np.argmax(box_class_probs, axis=-1)
    box_class_scores = np.max(box_class_probs, axis=-1)
    pos = np.where(box_confidences[..., 0] >= BOX_THRESH)

    boxes = boxes[pos]
    classes = box_classes[pos]
    scores = box_class_scores[pos]

    return boxes, classes, scores


def nms_boxes(boxes, scores):
    """Suppress non-maximal boxes.
    # Arguments
        boxes: ndarray, boxes of objects.
        scores: ndarray, scores of objects.
    # Returns
        keep: ndarray, index of effective boxes.
    """
    x = boxes[:, 0]
    y = boxes[:, 1]
    w = boxes[:, 2] - boxes[:, 0]
    h = boxes[:, 3] - boxes[:, 1]

    areas = w * h
    order = scores.argsort()[::-1]

    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)

        xx1 = np.maximum(x[i], x[order[1:]])
        yy1 = np.maximum(y[i], y[order[1:]])
        xx2 = np.minimum(x[i] + w[i], x[order[1:]] + w[order[1:]])
        yy2 = np.minimum(y[i] + h[i], y[order[1:]] + h[order[1:]])

        w1 = np.maximum(0.0, xx2 - xx1 + 0.00001)
        h1 = np.maximum(0.0, yy2 - yy1 + 0.00001)
        inter = w1 * h1

        ovr = inter / (areas[i] + areas[order[1:]] - inter)
        inds = np.where(ovr <= NMS_THRESH)[0]
        order = order[inds + 1]
    keep = np.array(keep)
    return keep


def yolov5_post_process(input_data):
    masks = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
    anchors = [[10, 13], [16, 30], [33, 23], [30, 61], [62, 45],
               [59, 119], [116, 90], [156, 198], [373, 326]]

    boxes, classes, scores = [], [], []
    for input, mask in zip(input_data, masks):
        b, c, s = process(input, mask, anchors)
        b, c, s = filter_boxes(b, c, s)
        boxes.append(b)
        classes.append(c)
        scores.append(s)

    boxes = np.concatenate(boxes)
    boxes = xywh2xyxy(boxes)
    classes = np.concatenate(classes)
    scores = np.concatenate(scores)

    nboxes, nclasses, nscores = [], [], []
    for c in set(classes):
        inds = np.where(classes == c)
        b = boxes[inds]
        c = classes[inds]
        s = scores[inds]
        keep = nms_boxes(b, s)
        nboxes.append(b[keep])
        nclasses.append(c[keep])
        nscores.append(s[keep])

    if not nclasses and not nscores:
        return None, None, None

    boxes = np.concatenate(nboxes)
    classes = np.concatenate(nclasses)
    scores = np.concatenate(nscores)

    return boxes, classes, scores


def draw(image, boxes, scores, classes):
    """Draw the boxes on the image.
    # Argument:
        image: original image.
        boxes: ndarray, boxes of objects.
        classes: ndarray, classes of objects.
        scores: ndarray, scores of objects.
        all_classes: all classes name.
    """
    for box, score, cl in zip(boxes, scores, classes):
        top, left, right, bottom = box
        print('class: {}, score: {}'.format(CLASSES[cl], score))
        print('box coordinate left,top,right,down: [{}, {}, {}, {}]'.format(top, left, right, bottom))
        top = int(top)
        left = int(left)
        right = int(right)
        bottom = int(bottom)

        cv2.rectangle(image, (top, left), (right, bottom), (255, 0, 0), 2)
        cv2.putText(image, '{0} {1:.2f}'.format(CLASSES[cl], score),
                    (top, left - 6),
                    cv2.FONT_HERSHEY_SIMPLEX,
                    0.6, (0, 0, 255), 2)


def letterbox(im, new_shape=(640, 640), color=(0, 0, 0)):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)


def letter_box_postprocess(x, scalingfactor, xy_correction):
    y = np.copy(x)
    y[:, 0] = (x[:, 0] - xy_correction[0]) / scalingfactor  # top left x
    y[:, 1] = (x[:, 1] - xy_correction[1]) / scalingfactor  # top left y
    y[:, 2] = (x[:, 2] - xy_correction[0]) / scalingfactor  # bottom right x
    y[:, 3] = (x[:, 3] - xy_correction[1]) / scalingfactor  # bottom right y
    return y


def get_file(filepath):
    templist = []
    with open(filepath, "r") as f:
        for item in f.readlines():
            templist.append(item.strip())
    return templist


if __name__ == '__main__':

    # Create RKNN object
    rknn = RKNN()
    image_process_mode = "letter_box"
    print("image_process_mode = ", image_process_mode)

    if not os.path.exists(RKNN_MODEL):
        print('model not exist')
        exit(-1)

    # Load RKNN model
    print('--> Loading model')
    ret = rknn.load_rknn(RKNN_MODEL)
    if ret != 0:
        print('Load rknn model failed!')
        exit(ret)
    print('done')

    # init runtime environment
    print('--> Init runtime environment')
    ret = rknn.init_runtime()
    if ret != 0:
        print('Init runtime environment failed')
        exit(ret)
    print('done')

    image = cv2.imread(IMG_PATH)
    img_height = image.shape[0]
    img_width = image.shape[1]
    # img, ratio, (dw, dh) = letterbox(img, new_shape=(IMG_SIZE, IMG_SIZE))
    img = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    if image_process_mode == "resize":
        img = cv2.resize(img, (IMG_SIZE, IMG_SIZE))
    elif image_process_mode == "letter_box":
        img, scale_factor, correction = letterbox(img)
    # Inference
    print('--> Running model')
    start = time.clock()  # NB: time.clock() is deprecated; time.perf_counter() is the modern equivalent
    outputs = rknn.inference(inputs=[img])
    end = time.clock()
    runTime = end - start
    print("the inference time: ", runTime, "s")
    # post process
    input0_data = outputs[0]
    input1_data = outputs[1]
    input2_data = outputs[2]

    input0_data = input0_data.reshape([3, -1] + list(input0_data.shape[-2:]))
    input1_data = input1_data.reshape([3, -1] + list(input1_data.shape[-2:]))
    input2_data = input2_data.reshape([3, -1] + list(input2_data.shape[-2:]))

    input_data = list()
    input_data.append(np.transpose(input0_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input1_data, (2, 3, 0, 1)))
    input_data.append(np.transpose(input2_data, (2, 3, 0, 1)))

    boxes, classes, scores = yolov5_post_process(input_data)
    if image_process_mode == "resize":
        scale_h = IMG_SIZE / img_height
        scale_w = IMG_SIZE / img_width
        boxes = resize_postprocess(boxes, scale_w, scale_h)
    elif image_process_mode == "letter_box":
        boxes = letter_box_postprocess(boxes, scale_factor[0], correction)

    # img_1 = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    if boxes is not None:
        draw(image, boxes, scores, classes)
    cv2.imwrite("image.jpg", image)
    rknn.release()

Test results (PC simulation):

(rknnpy36) ubuntu@ubuntu:~/Downloads/rknn-toolkit-1.7.1/examples/onnx/yolov5$ python3 rknndetect.py 
image_process_mode =  letter_box
--> Loading model
done
--> Init runtime environment
librknn_runtime version 1.7.1 (bd41dbc build: 2021-10-28 16:15:23 base: 1131)
done
--> Running model
the inference time:  57.128653 s
class: person, score: 0.9981504082679749
box coordinate left,top,right,down: [479.5712426304817, 257.1673904657364, 560.4287573695183, 516.2189620733261]
class: person, score: 0.9963428378105164
box coordinate left,top,right,down: [109.2795844823122, 229.37324756383896, 219.56947557628155, 531.149701654911]
class: person, score: 0.9797273874282837
box coordinate left,top,right,down: [211.11694836616516, 237.28168976306915, 286.2610423564911, 507.9288204908371]
class: bus, score: 0.9902657866477966
box coordinate left,top,right,down: [90.29679995775223, 137.56904834508896, 552.5043475031853, 439.34550243616104]

Step 6: Port to the RK3399Pro board: copy the PC-side code to the board.

(rknnpy37) firefly@firefly:~/sxj731533730$ ls
bus.jpg   rknndetect.py  yolov5s_20220823.rknn

Modify rknndetect.py, changing the import to:

#from rknn.api import RKNN
from rknnlite.api import RKNNLite as RKNN
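
With that alias the rest of the script runs unchanged, since rknn-toolkit-lite exposes the same load_rknn/init_runtime/inference calls. Distilled, the minimal on-board flow looks like this (a sketch using the model and image from above):

import cv2
from rknnlite.api import RKNNLite

rknn = RKNNLite()
rknn.load_rknn('yolov5s_20220823.rknn')  # model converted on the PC
rknn.init_runtime()                      # attach to the RK3399Pro NPU
img = cv2.cvtColor(cv2.imread('bus.jpg'), cv2.COLOR_BGR2RGB)
img = cv2.resize(img, (640, 640))        # or letterbox, as in the full script
outputs = rknn.inference(inputs=[img])   # three heads: strides 8, 16, 32
rknn.release()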

Inference now takes about 0.126 s (≈126 ms), which is genuinely fast; for comparison, YOLOv5 on a Jetson Nano takes around 178 ms.

(rknnpy37) firefly@firefly:~/sxj731533730$ python3 rknndetect.py 
image_process_mode =  letter_box
--> Loading model
done
--> Init runtime environment
I NPUTransfer: Starting NPU Transfer Client, Transfer version 2.1.0 (b5861e7@2020-11-23T11:50:51)
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.7.1 (566a9b6 build: 2021-10-28 14:56:17)
D RKNNAPI:   DRV: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
D RKNNAPI: ==============================================
done
--> Running model
rknndetect.py:284: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  start = time.clock()
rknndetect.py:286: DeprecationWarning: time.clock has been deprecated in Python 3.3 and will be removed from Python 3.8: use time.perf_counter or time.process_time instead
  end = time.clock()
the inference time:  0.12555000000000005 s
class: person, score: 0.9981504082679749
box coordinate left,top,right,down: [479.5712496638298, 257.1674188375473, 560.4287503361702, 516.2189337015152]
class: person, score: 0.9963428378105164
box coordinate left,top,right,down: [109.27956780791283, 229.37324756383896, 219.56949225068092, 531.149701654911]
class: person, score: 0.9822962880134583
box coordinate left,top,right,down: [209.4318464398384, 245.27428996562958, 287.1548539996147, 507.3123799562454]
class: bus, score: 0.990969717502594
box coordinate left,top,right,down: [90.29679995775223, 138.5123466849327, 552.5043475031853, 440.2888007760048]

Test image:

Step 7: Write the C++ NPU detection code, based mainly on the official demo (GitHub – airockchip/rknn_model_zoo). The official demo reads images the darknet-YOLO way; I changed it to read them with OpenCV.

Original test image:

Project directory layout: the .so and header files come from the official SDK (.h and .so). The first result image tests the u8 (quantized) post-processing path, the second the fp (float) path.
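One implementation detail worth noting before the code: in main.cpp below, process_fp and process_u8 never apply sigmoid to every grid cell. Since sigmoid is monotonic, they apply the inverse sigmoid (unsigmoid, i.e. the logit) to the confidence threshold once and compare raw activations against it, which skips most cells cheaply. A quick check of that identity (standalone Python, names mine):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def unsigmoid(y):  # logit: the inverse of sigmoid
    return -np.log(1.0 / y - 1.0)

thr = 0.25
x = np.array([-2.0, -1.1, 0.0, 1.5])
# thresholding raw activations at logit(thr) is equivalent to
# thresholding sigmoid(activations) at thr
assert np.array_equal(sigmoid(x) >= thr, x >= unsigmoid(thr))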

CMakeLists.txt:

cmake_minimum_required(VERSION 3.16)
project(untitled10)
set(CMAKE_CXX_FLAGS "-std=c++11")
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -fopenmp ")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fopenmp")

include_directories(${CMAKE_SOURCE_DIR})
include_directories(${CMAKE_SOURCE_DIR}/include)
find_package(OpenCV REQUIRED)
#message(STATUS ${OpenCV_INCLUDE_DIRS})
# add OpenCV include directories
include_directories(${OpenCV_INCLUDE_DIRS})
# import the prebuilt rknn_api shared library
add_library(librknn_api SHARED IMPORTED)
set_target_properties(librknn_api PROPERTIES IMPORTED_LOCATION ${CMAKE_SOURCE_DIR}/lib64/librknn_api.so)


add_executable(untitled10 main.cpp)
target_link_libraries(untitled10 ${OpenCV_LIBS} librknn_api )

The test code (main.cpp):

#include <stdio.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>    // memset, strcmp, strncpy
#include <math.h>      // expf, logf, fmax, fmin, round
#include <algorithm>   // std::min
#include <vector>
#include <queue>
#include <chrono>
#include "rknn_api.h"
#include "opencv2/core/core.hpp"
#include "opencv2/imgproc/imgproc.hpp"
#include "opencv2/highgui/highgui.hpp"
#define OBJ_NAME_MAX_SIZE 16
#define OBJ_NUMB_MAX_SIZE 200
#define OBJ_CLASS_NUM     80
#define PROP_BOX_SIZE     (5+OBJ_CLASS_NUM)
using namespace std;

typedef struct _BOX_RECT {
    int left;
    int right;
    int top;
    int bottom;
} BOX_RECT;

typedef struct __detect_result_t {
    char name[OBJ_NAME_MAX_SIZE];
    int class_index;
    BOX_RECT box;
    float prop;
} detect_result_t;

typedef struct _detect_result_group_t {
    int id;
    int count;
    detect_result_t results[OBJ_NUMB_MAX_SIZE];
} detect_result_group_t;

const int anchor0[6] = {10, 13, 16, 30, 33, 23};
const int anchor1[6] = {30, 61, 62, 45, 59, 119};
const int anchor2[6] = {116, 90, 156, 198, 373, 326};

void printRKNNTensor(rknn_tensor_attr *attr) {
    printf("index=%d name=%s n_dims=%d dims=[%d %d %d %d] n_elems=%d size=%d "
           "fmt=%d type=%d qnt_type=%d fl=%d zp=%d scale=%f\n",
           attr->index, attr->name, attr->n_dims, attr->dims[3], attr->dims[2],
           attr->dims[1], attr->dims[0], attr->n_elems, attr->size, attr->fmt,
           attr->type, attr->qnt_type, attr->fl, attr->zp, attr->scale);
}

float sigmoid(float x) {
    return 1.0 / (1.0 + expf(-x));
}

float unsigmoid(float y) {
    return -1.0 * logf((1.0 / y) - 1.0);
}

int process_fp(float *input, int *anchor, int grid_h, int grid_w, int height, int width, int stride,
               std::vector<float> &boxes, std::vector<float> &boxScores, std::vector<int> &classId,
               float threshold) {

    int validCount = 0;
    int grid_len = grid_h * grid_w;
    float thres_sigmoid = unsigmoid(threshold);
    for (int a = 0; a < 3; a++) {
        for (int i = 0; i < grid_h; i++) {
            for (int j = 0; j < grid_w; j++) {
                float box_confidence = input[(PROP_BOX_SIZE * a + 4) * grid_len + i * grid_w + j];
                if (box_confidence >= thres_sigmoid) {
                    int offset = (PROP_BOX_SIZE * a) * grid_len + i * grid_w + j;
                    float *in_ptr = input + offset;
                    float box_x = sigmoid(*in_ptr) * 2.0 - 0.5;
                    float box_y = sigmoid(in_ptr[grid_len]) * 2.0 - 0.5;
                    float box_w = sigmoid(in_ptr[2 * grid_len]) * 2.0;
                    float box_h = sigmoid(in_ptr[3 * grid_len]) * 2.0;
                    box_x = (box_x + j) * (float) stride;
                    box_y = (box_y + i) * (float) stride;
                    box_w = box_w * box_w * (float) anchor[a * 2];
                    box_h = box_h * box_h * (float) anchor[a * 2 + 1];
                    box_x -= (box_w / 2.0);
                    box_y -= (box_h / 2.0);
                    boxes.push_back(box_x);
                    boxes.push_back(box_y);
                    boxes.push_back(box_w);
                    boxes.push_back(box_h);

                    float maxClassProbs = in_ptr[5 * grid_len];
                    int maxClassId = 0;
                    for (int k = 1; k < OBJ_CLASS_NUM; ++k) {
                        float prob = in_ptr[(5 + k) * grid_len];
                        if (prob > maxClassProbs) {
                            maxClassId = k;
                            maxClassProbs = prob;
                        }
                    }
                    float box_conf_f32 = sigmoid(box_confidence);
                    float class_prob_f32 = sigmoid(maxClassProbs);
                    boxScores.push_back(box_conf_f32 * class_prob_f32);
                    classId.push_back(maxClassId);
                    validCount++;
                }
            }
        }
    }
    return validCount;
}

float CalculateOverlap(float xmin0, float ymin0, float xmax0, float ymax0, float xmin1, float ymin1, float xmax1,
                       float ymax1) {
    float w = fmax(0.f, fmin(xmax0, xmax1) - fmax(xmin0, xmin1) + 1.0);
    float h = fmax(0.f, fmin(ymax0, ymax1) - fmax(ymin0, ymin1) + 1.0);
    float i = w * h;
    float u = (xmax0 - xmin0 + 1.0) * (ymax0 - ymin0 + 1.0) + (xmax1 - xmin1 + 1.0) * (ymax1 - ymin1 + 1.0) - i;
    return u <= 0.f ? 0.f : (i / u);
}

int nms(int validCount, std::vector<float> &outputLocations, std::vector<int> &order, float threshold) {
    for (int i = 0; i < validCount; ++i) {
        if (order[i] == -1) {
            continue;
        }
        int n = order[i];
        for (int j = i + 1; j < validCount; ++j) {
            int m = order[j];
            if (m == -1) {
                continue;
            }
            float xmin0 = outputLocations[n * 4 + 0];
            float ymin0 = outputLocations[n * 4 + 1];
            float xmax0 = outputLocations[n * 4 + 0] + outputLocations[n * 4 + 2];
            float ymax0 = outputLocations[n * 4 + 1] + outputLocations[n * 4 + 3];

            float xmin1 = outputLocations[m * 4 + 0];
            float ymin1 = outputLocations[m * 4 + 1];
            float xmax1 = outputLocations[m * 4 + 0] + outputLocations[m * 4 + 2];
            float ymax1 = outputLocations[m * 4 + 1] + outputLocations[m * 4 + 3];

            float iou = CalculateOverlap(xmin0, ymin0, xmax0, ymax0, xmin1, ymin1, xmax1, ymax1);

            if (iou > threshold) {
                order[j] = -1;
            }
        }
    }
    return 0;
}

int quick_sort_indice_inverse(
        std::vector<float> &input,
        int left,
        int right,
        std::vector<int> &indices) {
    float key;
    int key_index;
    int low = left;
    int high = right;
    if (left < right) {
        key_index = indices[left];
        key = input[left];
        while (low < high) {
            while (low < high && input[high] <= key) {
                high--;
            }
            input[low] = input[high];
            indices[low] = indices[high];
            while (low < high && input[low] >= key) {
                low++;
            }
            input[high] = input[low];
            indices[high] = indices[low];
        }
        input[low] = key;
        indices[low] = key_index;
        quick_sort_indice_inverse(input, left, low - 1, indices);
        quick_sort_indice_inverse(input, low + 1, right, indices);
    }
    return low;
}

int clamp(float val, int min, int max) {
    return val > min ? (val < max ? val : max) : min;
}

int post_process_fp(float *input0, float *input1, float *input2, int model_in_h, int model_in_w,
                    int h_offset, int w_offset, float resize_scale, float conf_threshold, float nms_threshold,
                    detect_result_group_t *group, const char *labels[]) {
    memset(group, 0, sizeof(detect_result_group_t));
    std::vector<float> filterBoxes;
    std::vector<float> boxesScore;
    std::vector<int> classId;
    int stride0 = 8;
    int grid_h0 = model_in_h / stride0;
    int grid_w0 = model_in_w / stride0;
    int validCount0 = 0;
    validCount0 = process_fp(input0, (int *) anchor0, grid_h0, grid_w0, model_in_h, model_in_w,
                             stride0, filterBoxes, boxesScore, classId, conf_threshold);

    int stride1 = 16;
    int grid_h1 = model_in_h / stride1;
    int grid_w1 = model_in_w / stride1;
    int validCount1 = 0;
    validCount1 = process_fp(input1, (int *) anchor1, grid_h1, grid_w1, model_in_h, model_in_w,
                             stride1, filterBoxes, boxesScore, classId, conf_threshold);

    int stride2 = 32;
    int grid_h2 = model_in_h / stride2;
    int grid_w2 = model_in_w / stride2;
    int validCount2 = 0;
    validCount2 = process_fp(input2, (int *) anchor2, grid_h2, grid_w2, model_in_h, model_in_w,
                             stride2, filterBoxes, boxesScore, classId, conf_threshold);

    int validCount = validCount0 + validCount1 + validCount2;
    // no object detect
    if (validCount <= 0) {
        return 0;
    }

    std::vector<int> indexArray;
    for (int i = 0; i < validCount; ++i) {
        indexArray.push_back(i);
    }

    quick_sort_indice_inverse(boxesScore, 0, validCount - 1, indexArray);

    nms(validCount, filterBoxes, indexArray, nms_threshold);

    int last_count = 0;
    /* box valid detect target */
    for (int i = 0; i < validCount; ++i) {

        if (indexArray[i] == -1 || boxesScore[i] < conf_threshold || last_count >= OBJ_NUMB_MAX_SIZE) {
            continue;
        }
        int n = indexArray[i];

        float x1 = filterBoxes[n * 4 + 0];
        float y1 = filterBoxes[n * 4 + 1];
        float x2 = x1 + filterBoxes[n * 4 + 2];
        float y2 = y1 + filterBoxes[n * 4 + 3];
        int id = classId[n];

        group->results[last_count].box.left = (int) ((clamp(x1, 0, model_in_w) - w_offset) / resize_scale);
        group->results[last_count].box.top = (int) ((clamp(y1, 0, model_in_h) - h_offset) / resize_scale);
        group->results[last_count].box.right = (int) ((clamp(x2, 0, model_in_w) - w_offset) / resize_scale);
        group->results[last_count].box.bottom = (int) ((clamp(y2, 0, model_in_h) - h_offset) / resize_scale);
        group->results[last_count].prop = boxesScore[i];
        group->results[last_count].class_index = id;
        const char *label = labels[id];
        strncpy(group->results[last_count].name, label, OBJ_NAME_MAX_SIZE);

        // printf("result %2d: (%4d, %4d, %4d, %4d), %s\n", i, group->results[last_count].box.left, group->results[last_count].box.top,
        //        group->results[last_count].box.right, group->results[last_count].box.bottom, label);
        last_count++;
    }
    group->count = last_count;

    return 0;
}

float deqnt_affine_to_f32(uint8_t qnt, uint8_t zp, float scale) {
    return ((float) qnt - (float) zp) * scale;
}

int32_t __clip(float val, float min, float max) {
    float f = val <= min ? min : (val >= max ? max : val);
    return f;
}

uint8_t qnt_f32_to_affine(float f32, uint8_t zp, float scale) {
    float dst_val = (f32 / scale) + zp;
    uint8_t res = (uint8_t) __clip(dst_val, 0, 255);
    return res;
}

int process_u8(uint8_t *input, int *anchor, int grid_h, int grid_w, int height, int width, int stride,
               std::vector<float> &boxes, std::vector<float> &boxScores, std::vector<int> &classId,
               float threshold, uint8_t zp, float scale) {

    int validCount = 0;
    int grid_len = grid_h * grid_w;
    float thres = unsigmoid(threshold);
    uint8_t thres_u8 = qnt_f32_to_affine(thres, zp, scale);
    for (int a = 0; a < 3; a++) {
        for (int i = 0; i < grid_h; i++) {
            for (int j = 0; j < grid_w; j++) {
                uint8_t box_confidence = input[(PROP_BOX_SIZE * a + 4) * grid_len + i * grid_w + j];
                if (box_confidence >= thres_u8) {
                    int offset = (PROP_BOX_SIZE * a) * grid_len + i * grid_w + j;
                    uint8_t *in_ptr = input + offset;
                    float box_x = sigmoid(deqnt_affine_to_f32(*in_ptr, zp, scale)) * 2.0 - 0.5;
                    float box_y = sigmoid(deqnt_affine_to_f32(in_ptr[grid_len], zp, scale)) * 2.0 - 0.5;
                    float box_w = sigmoid(deqnt_affine_to_f32(in_ptr[2 * grid_len], zp, scale)) * 2.0;
                    float box_h = sigmoid(deqnt_affine_to_f32(in_ptr[3 * grid_len], zp, scale)) * 2.0;
                    box_x = (box_x + j) * (float) stride;
                    box_y = (box_y + i) * (float) stride;
                    box_w = box_w * box_w * (float) anchor[a * 2];
                    box_h = box_h * box_h * (float) anchor[a * 2 + 1];
                    box_x -= (box_w / 2.0);
                    box_y -= (box_h / 2.0);
                    boxes.push_back(box_x);
                    boxes.push_back(box_y);
                    boxes.push_back(box_w);
                    boxes.push_back(box_h);

                    uint8_t maxClassProbs = in_ptr[5 * grid_len];
                    int maxClassId = 0;
                    for (int k = 1; k < OBJ_CLASS_NUM; ++k) {
                        uint8_t prob = in_ptr[(5 + k) * grid_len];
                        if (prob > maxClassProbs) {
                            maxClassId = k;
                            maxClassProbs = prob;
                        }
                    }
                    float box_conf_f32 = sigmoid(deqnt_affine_to_f32(box_confidence, zp, scale));
                    float class_prob_f32 = sigmoid(deqnt_affine_to_f32(maxClassProbs, zp, scale));
                    boxScores.push_back(box_conf_f32 * class_prob_f32);
                    classId.push_back(maxClassId);
                    validCount++;
                }
            }
        }
    }
    return validCount;
}

int post_process_u8(uint8_t *input0, uint8_t *input1, uint8_t *input2, int model_in_h, int model_in_w,
                    int h_offset, int w_offset, float resize_scale, float conf_threshold, float nms_threshold,
                    std::vector<uint8_t> &qnt_zps, std::vector<float> &qnt_scales,
                    detect_result_group_t *group, const char *labels[]) {

    memset(group, 0, sizeof(detect_result_group_t));

    std::vector<float> filterBoxes;
    std::vector<float> boxesScore;
    std::vector<int> classId;
    int stride0 = 8;
    int grid_h0 = model_in_h / stride0;
    int grid_w0 = model_in_w / stride0;
    int validCount0 = 0;
    validCount0 = process_u8(input0, (int *) anchor0, grid_h0, grid_w0, model_in_h, model_in_w,
                             stride0, filterBoxes, boxesScore, classId, conf_threshold, qnt_zps[0], qnt_scales[0]);

    int stride1 = 16;
    int grid_h1 = model_in_h / stride1;
    int grid_w1 = model_in_w / stride1;
    int validCount1 = 0;
    validCount1 = process_u8(input1, (int *) anchor1, grid_h1, grid_w1, model_in_h, model_in_w,
                             stride1, filterBoxes, boxesScore, classId, conf_threshold, qnt_zps[1], qnt_scales[1]);

    int stride2 = 32;
    int grid_h2 = model_in_h / stride2;
    int grid_w2 = model_in_w / stride2;
    int validCount2 = 0;
    validCount2 = process_u8(input2, (int *) anchor2, grid_h2, grid_w2, model_in_h, model_in_w,
                             stride2, filterBoxes, boxesScore, classId, conf_threshold, qnt_zps[2], qnt_scales[2]);

    int validCount = validCount0 + validCount1 + validCount2;
    // no object detect
    if (validCount <= 0) {
        return 0;
    }

    std::vector<int> indexArray;
    for (int i = 0; i < validCount; ++i) {
        indexArray.push_back(i);
    }

    quick_sort_indice_inverse(boxesScore, 0, validCount - 1, indexArray);

    nms(validCount, filterBoxes, indexArray, nms_threshold);

    int last_count = 0;
    group->count = 0;
    /* box valid detect target */
    for (int i = 0; i < validCount; ++i) {

        if (indexArray[i] == -1 || boxesScore[i] < conf_threshold || last_count >= OBJ_NUMB_MAX_SIZE) {
            continue;
        }
        int n = indexArray[i];

        float x1 = filterBoxes[n * 4 + 0];
        float y1 = filterBoxes[n * 4 + 1];
        float x2 = x1 + filterBoxes[n * 4 + 2];
        float y2 = y1 + filterBoxes[n * 4 + 3];
        int id = classId[n];

        group->results[last_count].box.left = (int) ((clamp(x1, 0, model_in_w) - w_offset) / resize_scale);
        group->results[last_count].box.top = (int) ((clamp(y1, 0, model_in_h) - h_offset) / resize_scale);
        group->results[last_count].box.right = (int) ((clamp(x2, 0, model_in_w) - w_offset) / resize_scale);
        group->results[last_count].box.bottom = (int) ((clamp(y2, 0, model_in_h) - h_offset) / resize_scale);
        group->results[last_count].prop = boxesScore[i];
        group->results[last_count].class_index = id;
        const char *label = labels[id];
        strncpy(group->results[last_count].name, label, OBJ_NAME_MAX_SIZE);

        // printf("result %2d: (%4d, %4d, %4d, %4d), %s\n", i, group->results[last_count].box.left, group->results[last_count].box.top,
        //        group->results[last_count].box.right, group->results[last_count].box.bottom, label);
        last_count++;
    }
    group->count = last_count;

    return 0;
}
void letterbox(cv::Mat rgb,cv::Mat &img_resize,int target_width,int target_height){

    float shape_0=rgb.rows;
    float shape_1=rgb.cols;
    float new_shape_0=target_height;
    float new_shape_1=target_width;
    float r=std::min(new_shape_0/shape_0,new_shape_1/shape_1);
    float new_unpad_0=int(round(shape_1*r));
    float new_unpad_1=int(round(shape_0*r));
    float dw=new_shape_1-new_unpad_0;
    float dh=new_shape_0-new_unpad_1;
    dw=dw/2;
    dh=dh/2;
    cv::Mat copy_rgb=rgb.clone();
    // resize only if the unpadded size differs from the current size
    if (int(shape_1) != int(new_unpad_0) || int(shape_0) != int(new_unpad_1)) {
        cv::resize(copy_rgb, img_resize, cv::Size(new_unpad_0, new_unpad_1));
        copy_rgb = img_resize;
    }
    int top=int(round(dh-0.1));
    int bottom=int(round(dh+0.1));
    int left=int(round(dw-0.1));
    int right=int(round(dw+0.1));
    cv::copyMakeBorder(copy_rgb, img_resize,top, bottom, left, right, cv::BORDER_CONSTANT, cv::Scalar(0,0,0));

}
int main(int argc, char **argv) {
    const char *img_path = "../0.jpeg";
    const char *model_path = "../yolov5s_20220823.rknn";
    const char *post_process_type = "fp"; // "fp" or "u8"
    const int target_width = 640;
    const int target_height = 640;
    const char *image_process_mode = "letter_box";
    float resize_scale = 0;
    int h_pad=0;
    int w_pad=0;
    const float nms_threshold = 0.6;
    const float conf_threshold = 0.25;
    const char *labels[] = {"person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
                            "traffic light",
                            "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat", "dog", "horse",
                            "sheep", "cow",
                            "elephant", "bear", "zebra", "giraffe", "backpack", "umbrella", "handbag", "tie",
                            "suitcase", "frisbee",
                            "skis", "snowboard", "sports ball", "kite", "baseball bat", "baseball glove",
                            "skateboard", "surfboard",
                            "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl",
                            "banana", "apple",
                            "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
                            "chair", "couch",
                            "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse", "remote",
                            "keyboard", "cell phone",
                            "microwave", "oven", "toaster", "sink", "refrigerator", "book", "clock", "vase",
                            "scissors", "teddy bear",
                            "hair drier", "toothbrush"};
    // Load image
    cv::Mat bgr = cv::imread(img_path);
    if (!bgr.data) {
        printf("cv::imread %s fail!\n", img_path);
        return -1;
    }
    cv::Mat rgb;
    //BGR->RGB
    cv::cvtColor(bgr, rgb, cv::COLOR_BGR2RGB);

    cv::Mat img_resize;
    float correction[2] = {0, 0};
    float scale_factor[] = {0, 0};
    int width=rgb.cols;
    int height=rgb.rows;
    // Letter box resize
    float img_wh_ratio = (float) width / (float) height;
    float input_wh_ratio = (float) target_width / (float) target_height;
    int resize_width;
    int resize_height;
    if (img_wh_ratio >= input_wh_ratio) {
        //pad height dim
        resize_scale = (float) target_width / (float) width;
        resize_width = target_width;
        resize_height = (int) ((float) height * resize_scale);
        w_pad = 0;
        h_pad = (target_height - resize_height) / 2;
    } else {
        //pad width dim
        resize_scale = (float) target_height / (float) height;
        resize_width = (int) ((float) width * resize_scale);
        resize_height = target_height;
        w_pad = (target_width - resize_width) / 2;
        h_pad = 0;
    }
    if (strcmp(image_process_mode, "letter_box") == 0) {
        letterbox(rgb, img_resize, target_width, target_height);
    } else {
        cv::resize(rgb, img_resize, cv::Size(target_width, target_height));
    }
    // Load model
    FILE *fp = fopen(model_path, "rb");
    if (fp == NULL) {
        printf("fopen %s fail!\n", model_path);
        return -1;
    }
    fseek(fp, 0, SEEK_END);
    int model_len = ftell(fp);
    void *model = malloc(model_len);
    fseek(fp, 0, SEEK_SET);
    if (model_len != fread(model, 1, model_len, fp)) {
        printf("fread %s fail!\n", model_path);
        free(model);
        return -1;
    }


    rknn_context ctx = 0;

    int ret = rknn_init(&ctx, model, model_len, 0);
    if (ret < 0) {
        printf("rknn_init fail! ret=%d\n", ret);
        return -1;
    }

    /* Query sdk version */
    rknn_sdk_version version;
    ret = rknn_query(ctx, RKNN_QUERY_SDK_VERSION, &version,
                     sizeof(rknn_sdk_version));
    if (ret < 0) {
        printf("rknn_init error ret=%d\n", ret);
        return -1;
    }
    printf("sdk version: %s driver version: %s\n", version.api_version,
           version.drv_version);


    /* Get input,output attr */
    rknn_input_output_num io_num;
    ret = rknn_query(ctx, RKNN_QUERY_IN_OUT_NUM, &io_num, sizeof(io_num));
    if (ret < 0) {
        printf("rknn_init error ret=%d\n", ret);
        return -1;
    }
    printf("model input num: %d, output num: %d\n", io_num.n_input,
           io_num.n_output);

    rknn_tensor_attr input_attrs[io_num.n_input];
    memset(input_attrs, 0, sizeof(input_attrs));
    for (int i = 0; i < io_num.n_input; i++) {
        input_attrs[i].index = i;
        ret = rknn_query(ctx, RKNN_QUERY_INPUT_ATTR, &(input_attrs[i]),
                         sizeof(rknn_tensor_attr));
        if (ret < 0) {
            printf("rknn_init error ret=%d\n", ret);
            return -1;
        }
        printRKNNTensor(&(input_attrs[i]));
    }

    rknn_tensor_attr output_attrs[io_num.n_output];
    memset(output_attrs, 0, sizeof(output_attrs));
    for (int i = 0; i < io_num.n_output; i++) {
        output_attrs[i].index = i;
        ret = rknn_query(ctx, RKNN_QUERY_OUTPUT_ATTR, &(output_attrs[i]),
                         sizeof(rknn_tensor_attr));
        printRKNNTensor(&(output_attrs[i]));
    }

    int input_channel = 3;
    int input_width = 0;
    int input_height = 0;
    if (input_attrs[0].fmt == RKNN_TENSOR_NCHW) {
        printf("model is NCHW input fmt\n");
        input_width = input_attrs[0].dims[0];
        input_height = input_attrs[0].dims[1];
        printf("input_width=%d input_height=%d\n", input_width, input_height);
    } else {
        printf("model is NHWC input fmt\n");
        input_width = input_attrs[0].dims[1];
        input_height = input_attrs[0].dims[2];
        printf("input_width=%d input_height=%d\n", input_width, input_height);
    }

    printf("model input height=%d, width=%d, channel=%d\n", input_height, input_width,
           input_channel);


/* Init input tensor */
    rknn_input inputs[1];
    memset(inputs, 0, sizeof(inputs));
    inputs[0].index = 0;
    inputs[0].buf = img_resize.data;
    inputs[0].type = RKNN_TENSOR_UINT8;
    inputs[0].size = input_width * input_height * input_channel;
    inputs[0].fmt = RKNN_TENSOR_NHWC;
    inputs[0].pass_through = 0;

    /* Init output tensor */
    rknn_output outputs[io_num.n_output];
    memset(outputs, 0, sizeof(outputs));
    for (int i = 0; i < io_num.n_output; i++) {
        if (strcmp(post_process_type, "fp") == 0) {
            outputs[i].want_float = 1;
        } else if (strcmp(post_process_type, "u8") == 0) {
            outputs[i].want_float = 0;
        }
    }
    printf("img.cols: %d, img.rows: %d\n", img_resize.cols, img_resize.rows);
    auto t1=std::chrono::steady_clock::now();
    rknn_inputs_set(ctx, io_num.n_input, inputs);
    ret = rknn_run(ctx, NULL);
    if (ret < 0) {
        printf("ctx error ret=%d\n", ret);
        return -1;
    }
    ret = rknn_outputs_get(ctx, io_num.n_output, outputs, NULL);
    if (ret < 0) {
        printf("outputs error ret=%d\n", ret);
        return -1;
    }
    /* Post process */
    std::vector<float> out_scales;
    std::vector<uint8_t> out_zps;
    for (int i = 0; i < io_num.n_output; ++i) {
        out_scales.push_back(output_attrs[i].scale);
        out_zps.push_back(output_attrs[i].zp);
    }

    detect_result_group_t detect_result_group;
    if (strcmp(post_process_type, "u8") == 0) {
        post_process_u8((uint8_t *) outputs[0].buf, (uint8_t *) outputs[1].buf, (uint8_t *) outputs[2].buf,
                        input_height, input_width,
                        h_pad, w_pad, resize_scale, conf_threshold, nms_threshold, out_zps, out_scales,
                        &detect_result_group, labels);
    } else if (strcmp(post_process_type, "fp") == 0) {
        post_process_fp((float *) outputs[0].buf, (float *) outputs[1].buf, (float *) outputs[2].buf, input_height,
                        input_width,
                        h_pad, w_pad, resize_scale, conf_threshold, nms_threshold, &detect_result_group, labels);
    }
    // elapsed time in milliseconds
    auto t2=std::chrono::steady_clock::now();
    double dr_ms=std::chrono::duration<double,std::milli>(t2-t1).count();
    printf("%lf ms\n",dr_ms);


    for (int i = 0; i < detect_result_group.count; i++) {
        detect_result_t *det_result = &(detect_result_group.results[i]);
        printf("%s @ (%d %d %d %d) %f\n",
               det_result->name,
               det_result->box.left, det_result->box.top, det_result->box.right, det_result->box.bottom,
               det_result->prop);
        int bx1 = det_result->box.left;
        int by1 = det_result->box.top;
        int bx2 = det_result->box.right;
        int by2 = det_result->box.bottom;
        cv::rectangle(bgr, cv::Point(bx1, by1), cv::Point(bx2, by2), cv::Scalar(231, 232, 143));  // box from two corner points
        char text[256];
        sprintf(text, "%s %.1f%% ", det_result->name, det_result->prop * 100);

        int baseLine = 0;
        cv::Size label_size = cv::getTextSize(text, cv::FONT_HERSHEY_SIMPLEX, 0.5, 1, &baseLine);

        int x = bx1;
        int y = by1 - label_size.height - baseLine;
        if (y < 0)
            y = 0;
        if (x + label_size.width > bgr.cols)
            x = bgr.cols - label_size.width;


        cv::rectangle(bgr, cv::Rect(cv::Point(x, y), cv::Size(label_size.width, label_size.height + baseLine)),
                      cv::Scalar(0, 0, 255), -1);

        cv::putText(bgr, text, cv::Point(x, y + label_size.height),
                    cv::FONT_HERSHEY_DUPLEX, 0.4, cv::Scalar(255, 255, 255), 1, cv::LINE_AA);

    }
    cv::imwrite("bgr9.jpg", bgr);  // write once, after all boxes are drawn


    ret = rknn_outputs_release(ctx, io_num.n_output, outputs);

    if (ret < 0) {
        printf("rknn_query fail! ret=%d\n", ret);
        goto Error;
    }


    Error:
    if (ctx > 0)
        rknn_destroy(ctx);
    if (model)
        free(model);
    if (fp)
        fclose(fp);
    return 0;
}

RK3399Pro NPU test results; the end-to-end time is roughly 120-150 ms:

/home/firefly/sxj731533730/cmake-build-debug/untitled10
D RKNNAPI: ==============================================
D RKNNAPI: RKNN VERSION:
D RKNNAPI:   API: 1.6.1 (00c4d8b build: 2021-03-15 16:31:37)
D RKNNAPI:   DRV: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
D RKNNAPI: ==============================================
sdk version: 1.6.1 (00c4d8b build: 2021-03-15 16:31:37) driver version: 1.7.1 (0cfd4a1 build: 2021-12-10 09:43:11)
model input num: 1, output num: 3
index=0 name=images_220 n_dims=4 dims=[1 3 640 640] n_elems=1228800 size=2457600 fmt=0 type=1 qnt_type=0 fl=0 zp=0 scale=1.000000
index=0 name=Reshape_Reshape_259/out0_0 n_dims=5 dims=[3 85 80 80] n_elems=1632000 size=1632000 fmt=0 type=3 qnt_type=2 fl=0 zp=210 scale=0.092896
index=1 name=Reshape_Reshape_274/out0_1 n_dims=5 dims=[3 85 40 40] n_elems=408000 size=408000 fmt=0 type=3 qnt_type=2 fl=0 zp=182 scale=0.086178
index=2 name=Reshape_Reshape_289/out0_2 n_dims=5 dims=[3 85 20 20] n_elems=102000 size=102000 fmt=0 type=3 qnt_type=2 fl=0 zp=179 scale=0.075776
model is NCHW input fmt
input_width=640 input_height=640
model input height=640, width=640, channel=3
img.cols: 640, img.rows: 640
120.217066 ms
dog @ (278 23 459 221) 0.530085
cat @ (177 159 608 395) 0.498268
chair @ (0 5 93 103) 0.270487
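
As a sanity check on the u8 path: each quantized output value is mapped back to float with f32 = (q - zp) * scale, using the per-tensor zp/scale printed above (zp=210, scale=0.092896 for the stride-8 head). A quick illustration in Python:

import numpy as np

zp, scale = 210, 0.092896  # stride-8 output tensor, from the log above
q = np.array([210, 220, 180], dtype=np.uint8)
f32 = (q.astype(np.float32) - zp) * scale
print(f32)  # ~ [0.0, 0.929, -2.787]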

Result image:

From here, it can be combined with other hardware for further development.

References

GitHub – airockchip/rknn_model_zoo

 https://item.taobao.com/item.htm?spm=a230r.1.14.38.320c48d1XE6R7a&id=633890236024&ns=1&abbucket=12#detail
