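# Create the virtual environment. Avoid creating it inside the project directory, and keep datasets outside the project too; otherwise PyCharm may index those files, which can take a long time
# At the time, PyTorch supported Python 3.7-3.9 and Conda supported up to 3.9; the base environment's Python was 3.9.12 and the automatically downloaded one was 3.9.13
# In my experience, specify a Python version when creating the environment, and make it differ from the base environment's version; otherwise no separate environment was really created for me and the base environment got polluted
conda create -n yolo python=3.9 # -n and -p cannot be set at the same time
# Activate the environment. My guess is that -n is a special case of -p that uses the conda\envs path prefix, so the two are equivalent
conda activate yolo
# cd to the project directory before installing the dependencies
cd C:\mrathena\develop\workspace\pycharm\python.yolov5.starter
# Install the dependencies. Turn off any proxy tool first, otherwise the install may fail
pip install -r requirements.txt
# Installed successfully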

Run detect.py directly. The output is below (I added one image containing people to check the detection results)

C:\mrathena\develop\miniconda\envs\yolov5\python.exe C:/mrathena/develop/workspace/pycharm/yolov5-6.2/detect.py
detect: weights=yolov5s.pt, source=data\images, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False
YOLOv5  2022-8-17 Python-3.9.13 torch-1.12.1+cpu CPU

Fusing layers... 
YOLOv5s summary: 213 layers, 7225885 parameters, 0 gradients
image 1/3 C:\mrathena\develop\workspace\pycharm\yolov5-6.2\data\images\bus.jpg: 640x480 4 persons, 1 bus, Done. (0.173s)
image 2/3 C:\mrathena\develop\workspace\pycharm\yolov5-6.2\data\images\people.jpeg: 608x640 4 persons, 1 handbag, Done. (0.202s)
image 3/3 C:\mrathena\develop\workspace\pycharm\yolov5-6.2\data\images\zidane.jpg: 384x640 2 persons, 2 ties, Done. (0.130s)
Speed: 0.3ms pre-process, 168.2ms inference, 3.7ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp

PyTorch

YOLOv5  2022-8-17 Python-3.9.13 torch-1.12.1+cpu CPU

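By default, Yolo runs on the CPU. We want to switch it to the GPU so that training and detection are faster

The prerequisite seems to be an Nvidia graphics card with CUDA cores; mine is an Nvidia RTX 2080

Online tutorials on this vary wildly. I recommend the article below; based on my experience, I find it the most credible

一文搞懂PyTorch与CUDA那些事 (an article on how PyTorch and CUDA relate)

In short, installing PyTorch does not require a CUDA runtime on the machine, because one is downloaded automatically during installation. You only need to make sure the CUDA version matches the graphics driver version

Table 3 there shows the relationship between CUDA and the driver. Newer drivers are backward compatible with older CUDA versions: a driver version of at least 516.94 can run CUDA 11.7.1, at least 516.01 can run 11.7, at least 511.23 can run 11.6, at least 465.89 can run 11.3.0, and so on

My driver version is 516.94, so it supports every CUDA version up to 11.7.1. On the PyTorch website the newest CUDA build offered was 11.6, so that is the one I installed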
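The driver check described above can be written as a small lookup. This is only a sketch: the table holds just the four Windows minimums quoted in the text, not NVIDIA's full compatibility table, and the function names are made up for illustration.

```python
# Minimum Windows driver versions quoted in the text (newest CUDA first).
# This is NOT the complete NVIDIA table, only the entries mentioned above.
MIN_DRIVER_FOR_CUDA = [
    ("11.7.1", (516, 94)),
    ("11.7", (516, 1)),
    ("11.6", (511, 23)),
    ("11.3.0", (465, 89)),
]


def parse_version(text):
    """Turn a driver string such as '516.94' into a comparable tuple (516, 94)."""
    return tuple(int(part) for part in text.split("."))


def newest_supported_cuda(driver_version):
    """Return the newest CUDA version in the table that the driver can run, or None."""
    driver = parse_version(driver_version)
    for cuda, minimum in MIN_DRIVER_FOR_CUDA:
        if driver >= minimum:
            return cuda
    return None


print(newest_supported_cuda("516.94"))  # 11.7.1
print(newest_supported_cuda("512.15"))  # 11.6
```

Any CUDA build at or below the returned version should then be installable with that driver.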

PyTorch official website

conda install pytorch torchvision torchaudio cudatoolkit=11.6 -c pytorch -c conda-forge

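After installation, the virtual environment grew from 1.3 GB to 5.8 GB. Really ... why is there no project management tool like Maven for this ...

Follow the official tutorial to check that the installation succeeded: run python and enter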

import torch
torch.cuda.is_available()

CUDA appears to be available now. Run detect.py again; it should now be running on the GPU

YOLOv5  2022-8-17 Python-3.9.13 torch-1.12.1 CUDA:0 (NVIDIA GeForce RTX 2080, 8192MiB)

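Stage 1: real-time object detection with the bundled model

Real-time screen capture

Windows桌面采集技术 (Windows desktop capture techniques)
D3DShot
D3DShot Issues#44 Bump pillow version for Python 3.9 support on Windows

Traditional GDI screen capture does not seem to perform well, especially on large, high-resolution screens, so I will try DXGI capture through the D3DShot module, which uses RGB mode by default

My setup is an AMD R7 2700X, an Nvidia RTX 2080 (8 GB), and a 3440*1440 display. In my tests, one full-screen capture took roughly D3DShot: 21ms, MSS: 41ms, Win32: 52ms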
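Numbers like these can be collected with a small timing helper. The sketch below assumes nothing about the capture library: `average_ms` and `fake_grab` are illustrative names, and `fake_grab` is only a stand-in that should be swapped for the real capture call being measured.

```python
import time


def average_ms(func, times=100):
    """Call func() `times` times and return (total_ms, average_ms_per_call)."""
    begin = time.perf_counter()
    for _ in range(times):
        func()
    total_ms = (time.perf_counter() - begin) * 1000
    return total_ms, total_ms / times


def fake_grab():
    """Stand-in workload; replace with a real screenshot call."""
    return bytes(1024)


total, average = average_ms(fake_grab, times=100)
print(f'total: {total:.1f}ms, average: {average:.3f}ms')
```

Measuring every backend with the same helper and the same number of iterations keeps the comparison fair.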

DXGI D3DShot

Installing D3DShot did not go smoothly, because on Python 3.9 it conflicts with the pillow version, so I use a fork in which someone has fixed this instead

pip install git+https://github.com/fauskanger/D3DShot#egg=D3DShot

import time

import cv2
import d3dshot
import win32con
import win32gui

dxgi = d3dshot.create(capture_output="numpy")


# In my tests, MSS was a few milliseconds faster than Win32
times = 100
begin = time.perf_counter()
for i in range(times):
    img = dxgi.screenshot()
elapsed = time.perf_counter() - begin  # read the clock once so total and average agree
print(f'total: {int(elapsed * 1000)}ms, average: {int(elapsed / times * 1000)}ms')


title = 'Realtime Screenshot'
while True:
    img = dxgi.screenshot()
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    cv2.namedWindow(title, cv2.WINDOW_NORMAL)
    cv2.resizeWindow(title, 1050, 450)
    cv2.imshow(title, img)
    # Find the window and keep it on top
    hwnd = win32gui.FindWindow(None, title)
    # win32gui.ShowWindow(hwnd, win32con.SW_SHOWNORMAL)
    win32gui.SetWindowPos(hwnd, win32con.HWND_TOPMOST, 0, 0, 0, 0, win32con.SWP_NOMOVE | win32con.SWP_NOACTIVATE | win32con.SWP_NOOWNERZORDER | win32con.SWP_SHOWWINDOW | win32con.SWP_NOSIZE)

    k = cv2.waitKey(1)  # 0: wait indefinitely for a key, 1: wait at most 1 ms
    if k % 256 == 27:
        cv2.destroyAllWindows()
        exit('ESC ...')

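If cv2 has no code completion and cannot be clicked through in the IDE, do the following

GDI MSS and Win32

Both mainly use functions from the gdi32.dll that ships with Windows to capture the screen. You can call them directly through ctypes, or use the pywin32 package (which wraps the functions and constants more conveniently)

pywin32 has to be installed first, and I suggest installing it with conda. I first installed it with pip and the win32ui part could never be imported correctly; after reinstalling with conda everything worked. So if a package misbehaves, try installing it with conda, since Anaconda tests and verifies its packages

Real-time screen detection

In YOLO, detect.py uses a trained model to run detection on images, so it must contain the code for loading a model and for detecting objects in an image

The core code needs to be extracted from detect.py so that an image goes in and an image with boxes drawn comes out. Later this may change to returning the detected rectangles instead (getAims)

One small pitfall: do not create a new folder for your own scripts. Put them alongside the other yolo files and tell them apart by a file name prefix (presumably so that imports such as models.common keep resolving)

Stage 1: read one image, detect objects, and output the result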

import cv2
import numpy as np
import torch

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.general import non_max_suppression, scale_coords
from utils.plots import Annotator, colors
from utils.torch_utils import select_device

# Take detect.py apart to get: image in, annotated image out

# Select the device, cpu/cuda
device = select_device('')
print('device: ' + device.type)  # cuda

# Load the model
model = DetectMultiBackend('yolov5s.pt', device=device, dnn=False, data=None, fp16=False)
names = model.module.names if hasattr(model, 'module') else model.names  # get class names
print('model: ' + str(model.weights))

# Read the image
img0 = cv2.imread('data/images/bus.jpg')
# cv2.imshow('', img0)
# cv2.waitKey(0)
# cv2.destroyAllWindows()
print('image loaded')

# Build the same im that the dataset loader would produce
im = letterbox(img0, [640, 640], stride=32, auto=True)[0]
im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
im = np.ascontiguousarray(im)
im = torch.from_numpy(im).to(device)
# im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
im = im.float()  # uint8 to fp32
im /= 255  # 0 - 255 to 0.0 - 1.0
if len(im.shape) == 3:
    im = im[None]  # expand for batch dim
print('image preprocessed')

# Inference
model.warmup(imgsz=(1, 3, *[640, 640]))  # warmup
pred = model(im, augment=False, visualize=False)
# pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)
print('inference done')

for i, det in enumerate(pred):
    annotator = Annotator(img0, line_width=3, example=str(names))
    if len(det):
        det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img0.shape).round()
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            hide_labels = False
            hide_conf = False
            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
            annotator.box_label(xyxy, label, color=colors(c, True))
    im0 = annotator.result()
    cv2.imshow('', im0)
    cv2.waitKey(0)  # wait until a key is pressed

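Stage 2: capture the screen in real time, detect, and output the result

Replacing the image read with a real-time full-screen capture (3440*1440) gives roughly the following timings

total: 62ms, capture: 18ms, convert: 27ms, inference: 16ms
total: 61ms, capture: 17ms, convert: 26ms, inference: 16ms
total: 60ms, capture: 17ms, convert: 26ms, inference: 17ms
total: 61ms, capture: 18ms, convert: 25ms, inference: 16ms
total: 61ms, capture: 18ms, convert: 26ms, inference: 16ms
total: 62ms, capture: 17ms, convert: 27ms, inference: 16ms
total: 64ms, capture: 18ms, convert: 28ms, inference: 17ms
total: 59ms, capture: 17ms, convert: 26ms, inference: 14ms
total: 60ms, capture: 17ms, convert: 26ms, inference: 16ms

One optimization is to shrink the capture region. Another would be to have the capture tool grab frames directly in BGR, but that does not appear to be configurable, so the RGB2BGR conversion is pure overhead

I thought simply skipping the color conversion would also work, but then yolo raises an error unless img = np.ascontiguousarray(img) is called first, and the total time ends up about the same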
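Shrinking the capture region could look like the sketch below, which centers a smaller box on the screen. The helper names are hypothetical; the (left, top, width, height) layout follows the `region` convention used by the grab helpers in this article, and the second helper produces the (left, top, right, bottom) form that those helpers pass to the screenshot call.

```python
def centered_region(screen_w, screen_h, width, height):
    """Return (left, top, width, height) for a box centered on the screen."""
    if width > screen_w or height > screen_h:
        raise ValueError('region is larger than the screen')
    left = (screen_w - width) // 2
    top = (screen_h - height) // 2
    return left, top, width, height


def to_ltrb(region):
    """Convert (left, top, width, height) to (left, top, right, bottom)."""
    left, top, width, height = region
    return left, top, left + width, top + height


region = centered_region(3440, 1440, 640, 640)
print(region)           # (1400, 400, 640, 640)
print(to_ltrb(region))  # (1400, 400, 2040, 1040)
```

A 640*640 box has far fewer pixels than the full 3440*1440 frame, so both the capture and the color conversion have less data to move.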

import time

import cv2
import d3dshot
import numpy as np
import torch
import mss
import win32con
import win32gui

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.general import non_max_suppression, scale_coords
from utils.plots import Annotator, colors
from utils.torch_utils import select_device


def loadModel(device, path):
    model = DetectMultiBackend(path, device=device, dnn=False, data=None, fp16=False)
    model.warmup(imgsz=(1, 3, *[640, 640]))  # warmup
    return model


def foo(device, img):
    # Build the same im that the dataset loader would produce
    im = letterbox(img, [640, 640], stride=32, auto=True)[0]
    im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
    im = np.ascontiguousarray(im)
    im = torch.from_numpy(im).to(device)
    # im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
    im = im.float()  # uint8 to fp16/32
    im /= 255  # 0 - 255 to 0.0 - 1.0
    if len(im.shape) == 3:
        im = im[None]  # expand for batch dim
    return im


def detect(device, model, img):
    im = foo(device, img)
    pred = model(im, augment=False, visualize=False)
    # pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
    pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

    names = model.module.names if hasattr(model, 'module') else model.names  # get class names
    det = pred[0]
    annotator = Annotator(img, line_width=3, example=str(names))
    if len(det):
        det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img.shape).round()
        for *xyxy, conf, cls in reversed(det):
            c = int(cls)  # integer class
            hide_labels = False
            hide_conf = False
            label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
            annotator.box_label(xyxy, label, color=colors(c, True))
    return annotator.result()


dxgi = d3dshot.create(capture_output="numpy")


def grab(region=None):
    """
    region: tuple, (left, top, width, height)
    pip install git+https://github.com/fauskanger/D3DShot#egg=D3DShot
    """
    if region:
        left, top, width, height = region
        return dxgi.screenshot((left, top, left + width, top + height))
    return dxgi.screenshot()


title = 'Realtime Screenshot'

# Select the device, cpu/cuda
device = select_device('')
# Load the model
# model = loadModel(device, 'D:\\resource\\develop\\python\\dataset.yolov5.6.2\\test\\runs\\train\\exp\\weights\\best.pt')
model = loadModel(device, 'yolov5s.pt')

# A model downloaded from the internet
# https://www.youtube.com/watch?v=_QKDEI8uhQQ
# https://github.com/davidhoung2/APEX-yolov5-aim-assist
# model = loadModel(device, 'model.apex.1.pt')

while True:

    # Capture
    t1 = time.perf_counter_ns()
    img = grab()
    t2 = time.perf_counter_ns()
    img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
    t3 = time.perf_counter_ns()
    # Inference
    img = detect(device, model, img)
    t4 = time.perf_counter_ns()
    print(f'total: {(t4 - t1) // 1_000_000}ms, capture: {(t2 - t1) // 1_000_000}ms, convert: {(t3 - t2) // 1_000_000}ms, inference: {(t4 - t3) // 1_000_000}ms')

    cv2.namedWindow(title, cv2.WINDOW_NORMAL)
    cv2.resizeWindow(title, 2100, 900)
    cv2.imshow(title, img)

    # Find the window and keep it on top
    hwnd = win32gui.FindWindow(None, title)
    # win32gui.ShowWindow(hwnd, win32con.SW_SHOWNORMAL)
    win32gui.SetWindowPos(hwnd, win32con.HWND_TOPMOST, 0, 0, 0, 0, win32con.SWP_NOMOVE | win32con.SWP_NOACTIVATE | win32con.SWP_NOOWNERZORDER | win32con.SWP_SHOWWINDOW | win32con.SWP_NOSIZE)

    k = cv2.waitKey(1)
    if k % 256 == 27:
        cv2.destroyAllWindows()
        exit('ESC ...')

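Drawing on a transparent window

Use tkinter to create a transparent, click-through window that covers the desktop as the topmost layer, pass the detection results in, and draw the boxes directly. Use pynput keyboard and mouse listeners for the hotkeys

The detect method of the AimBot class in toolkit.py converts each detected rectangle from the capture coordinate system into a left/top/width/height rectangle in the screen coordinate system. The principle is as follows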
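The conversion can be sketched in plain Python. This mirrors the arithmetic in AimBot.detect but is not the class itself; `region_to_screen` is an illustrative name. The input is a normalized (x_center, y_center, width, height) box inside the captured image, and the capture region is (left, top, width, height) on the screen.

```python
def region_to_screen(xywh, region):
    """
    xywh:   normalized (x_center, y_center, width, height) inside the captured image
    region: capture area on the screen as (left, top, width, height)
    Returns (left, top, width, height) in screen pixels.
    """
    x, y, w, h = xywh
    gl, gt, gw, gh = region
    width = w * gw                    # scale the normalized size back to pixels
    height = h * gh
    left = gl + x * gw - width / 2    # center minus half the size, shifted by the region origin
    top = gt + y * gh - height / 2
    return left, top, width, height


# A box in the middle of a 640*640 capture whose top-left corner sits at (1400, 400)
print(region_to_screen((0.5, 0.5, 0.25, 0.5), (1400, 400, 640, 640)))
# (1640.0, 560.0, 160.0, 320.0)
```

When the capture region is the whole screen, the offset is (0, 0) and only the scaling remains.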

toolkit.py

import ctypes
import time

import cv2
import d3dshot
import numpy as np
import torch

from models.common import DetectMultiBackend
from utils.augmentations import letterbox
from utils.general import non_max_suppression, scale_coords, xyxy2xywh
from utils.plots import Annotator, colors
from utils.torch_utils import select_device


user32 = ctypes.windll.user32
gdi32 = ctypes.windll.gdi32

dxgi = d3dshot.create(capture_output="numpy")


class Monitor:

    @staticmethod
    def grab(region=None):
        """
        region: tuple, (left, top, width, height)
        pip install git+https://github.com/fauskanger/D3DShot#egg=D3DShot
        """
        if region:
            left, top, width, height = region
            return dxgi.screenshot((left, top, left + width, top + height))
        return dxgi.screenshot()

    @staticmethod
    def resolution():
        """
        Screen resolution (width, height) of the primary display
        """
        w = user32.GetSystemMetrics(0)
        h = user32.GetSystemMetrics(1)
        return w, h


class Yolo:

    @staticmethod
    def loadModel(device, path):
        model = DetectMultiBackend(path, device=device, dnn=False, data=None, fp16=False)
        model.warmup(imgsz=(1, 3, *[640, 640]))  # warmup
        return model

    @staticmethod
    def foo(device, img):
        # Build the same im that the dataset loader would produce
        im = letterbox(img, [640, 640], stride=32, auto=True)[0]
        im = im.transpose((2, 0, 1))[::-1]  # HWC to CHW, BGR to RGB
        im = np.ascontiguousarray(im)
        im = torch.from_numpy(im).to(device)
        # im = im.half() if model.fp16 else im.float()  # uint8 to fp16/32
        im = im.float()  # uint8 to fp16/32
        im /= 255  # 0 - 255 to 0.0 - 1.0
        if len(im.shape) == 3:
            im = im[None]  # expand for batch dim
        return im

    @staticmethod
    def inference(device, model, img):
        im = Yolo.foo(device, img)
        pred = model(im, augment=False, visualize=False)
        # pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
        pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

        names = model.module.names if hasattr(model, 'module') else model.names  # get class names
        det = pred[0]
        annotator = Annotator(img, line_width=3, example=str(names))
        if len(det):
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img.shape).round()
            for *xyxy, conf, cls in reversed(det):
                c = int(cls)  # integer class
                hide_labels = False
                hide_conf = False
                label = None if hide_labels else (names[c] if hide_conf else f'{names[c]} {conf:.2f}')
                annotator.box_label(xyxy, label, color=colors(c, True))
        return annotator.result()


class AimBot:

    def __init__(self, region, model):
        # Screen width and height
        resolution = Monitor.resolution()
        self.sw = resolution[0]
        self.sh = resolution[1]
        # Capture region = (left, top, width, height)
        self.region = region
        self.gl = region[0]
        self.gt = region[1]
        self.gw = region[2]
        self.gh = region[3]
        # yolo
        self.device = select_device('')
        self.model = Yolo.loadModel(self.device, model)  # reuse the device selected above

    def detect(self):

        # I am not sure how to use these; I tried the first one and it made little difference
        # with torch.no_grad():
        # @torch.no_grad

        # Capture
        t1 = time.perf_counter_ns()
        img = Monitor.grab(self.region)
        t2 = time.perf_counter_ns()
        img = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
        # Preprocess
        im = Yolo.foo(self.device, img)

        # Detect
        pred = self.model(im, augment=False, visualize=False)
        # pred = non_max_suppression(pred, conf_thres, iou_thres, classes, agnostic_nms, max_det=max_det)
        pred = non_max_suppression(pred, 0.25, 0.45, None, False, max_det=1000)

        det = pred[0]
        aims = []
        if len(det):
            names = self.model.module.names if hasattr(self.model, 'module') else self.model.names  # get class names
            gn = torch.tensor(img.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            det[:, :4] = scale_coords(im.shape[2:], det[:, :4], img.shape).round()

            for *xyxy, conf, cls in reversed(det):
                xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                c = int(cls)  # integer class

                label = f'{names[c]} {conf:.2f}'
                # Convert to a rectangle in the screen coordinate system
                left = self.gl + ((xywh[0] * self.gw) - (xywh[2] * self.gw) / 2)
                top = self.gt + ((xywh[1] * self.gh) - (xywh[3] * self.gh) / 2)
                width = xywh[2] * self.gw
                height = xywh[3] * self.gh

                aims.append([label, left, top, width, height])
        t3 = time.perf_counter_ns()
        # print(f'capture: {(t2 - t1) // 1_000_000}ms, detection: {(t3 - t2) // 1_000_000}ms, targets: {len(aims)}, total: {(t3 - t1) // 1_000_000}ms')
        return aims

test.detect.3.py

import multiprocessing
import time
import tkinter
from multiprocessing import Process
from threading import Thread

import pynput

end = 'end'
box = 'box'
aim = 'aim'
init = {
    end: False,  # Exit flag; set to True by the End key, and other processes/threads stop once they see the change
    box: False,  # Box display switch
    aim: False,  # Auto-aim switch
}


def mouse(data):

    def down(x, y, button, pressed):
        if pressed:
            if button == pynput.mouse.Button.x2:
                # Upper side button: toggle the box display flag in the shared dict
                data[box] = not data[box]
                print(f'Switch ShowBox: {"enable" if data[box] else "disable"}')
            elif button == pynput.mouse.Button.x1:
                # Lower side button: toggle the auto-aim flag
                data[aim] = not data[aim]
                print(f'Switch AutoAim: {"enable" if data[aim] else "disable"}')

    with pynput.mouse.Listener(on_click=down) as m:
        m.join()


def keyboard(data):

    def release(key):
        if key == pynput.keyboard.Key.end:
            # Exit the program
            data[end] = True
            return False

    with pynput.keyboard.Listener(on_release=release) as k:
        k.join()


def draw(canvas, x1, y1, x2, y2, width=2, color='red', text=None):
    canvas.create_rectangle(x1, y1, x2, y2, width=width, outline=color)
    if text is not None:
        canvas.create_rectangle(x1, y1 - 20, x2, y1, fill='black')
        canvas.create_text(x1, y1, anchor='sw', text=text, fill='yellow', font=('', 16))


def detect(data):

    print('Loading model')
    from toolkit import AimBot
    aimbot = AimBot((0, 0, 3440, 1440), 'yolov5s.pt')
    print('Loading window')
    # Main program
    TRANSCOLOUR = 'gray'
    root = tkinter.Tk()  # Create the window
    root.attributes('-fullscreen', 1)  # Full screen
    root.attributes('-topmost', -1)  # Always on top
    root.wm_attributes('-transparentcolor', TRANSCOLOUR)  # Color that is rendered transparent and click-through
    root['bg'] = TRANSCOLOUR  # Make the window background transparent and click-through
    # Add the canvas
    canvas = tkinter.Canvas(root, background=TRANSCOLOUR, borderwidth=0, highlightthickness=0)
    canvas.pack(fill=tkinter.BOTH, expand=tkinter.YES)
    print('Ready')

    def foo():
        while True:
            if data.get(end):
                break
            if not data.get(box) and not data.get(aim):
                continue
            t1 = time.perf_counter()
            canvas.delete(tkinter.ALL)
            t2 = time.perf_counter()
            aims = aimbot.detect()
            t3 = time.perf_counter()
            for item in aims:
                if data.get(box):
                    draw(canvas, item[1], item[2], item[1] + item[3], item[2] + item[4], 5, text=item[0])
                    # draw(canvas, item[1], item[2], item[1] + item[3], item[2] + item[4], 5)
            t4 = time.perf_counter()
            # Aiming, reserved for later
            t5 = time.perf_counter()
            print(f'clear canvas: {int((t2 - t1) * 1000)}ms, detection: {int((t3 - t2) * 1000)}ms, targets: {len(aims)}, draw boxes: {int((t4 - t3) * 1000)}ms, aim: {int((t5 - t4) * 1000)}ms, total: {int((t5 - t1) * 1000)}ms, box switch: {data.get(box)}, aim switch: {data.get(aim)}')

    t = Thread(target=foo)
    t.start()
    # Main loop
    root.mainloop()


if __name__ == '__main__':
    multiprocessing.freeze_support()  # On Windows, a frozen executable that uses multiprocessing needs this as the first line of main
    manager = multiprocessing.Manager()
    data = manager.dict()  # Create a process-safe shared dict
    data.update(init)  # Load the initial values into the shared dict
    # Run the mouse listener, keyboard listener, and detection in separate processes
    pm = Process(target=mouse, args=(data,))
    pk = Process(target=keyboard, args=(data,))
    pd = Process(target=detect, args=(data,))
    pm.start()
    pk.start()
    pd.start()
    pk.join()  # Without join, code that uses the dict fails with conn = self._tls.connection, AttributeError: 'ForkAwareLocal' object has no attribute 'connection'
    pm.terminate()  # The mouse process cannot notice the exit flag by itself, so it is terminated forcibly
    pd.terminate()

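Training a model

Note: you do not have to label everything by hand. There are several ways to label images, such as fully manual, semi-manual (run an existing model first, then adjust the result), and synthetic (compose images to generate data). What follows is fully manual

labelimg

labelimg is the tool used to label targets when training a model

Install labelimg in the virtual environment; once installed, running labelimg opens its GUI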

pip install labelimg

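Create the dataset folder. My dataset directory is D:\resource\develop\python\dataset.yolov5.6.2 and this training set is called test, so create a test directory under it

Under test, create data/images for the original images and data/labels for the label files

Then set the open directory and the save directory in labelimg and start labeling

After labeling an image, remember to save. A classes.txt file and a label file matching the image, such as bus.txt, are generated automatically in data/labels

Label the remaining images as well. Each image gets a label file with the same base name. It is best to avoid Chinese characters in image file names, just in case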
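A quick way to confirm that every image has a label file is sketched below. It assumes the layout described above (an images directory and a labels directory with same-named .txt files); the function name and the list of image suffixes are illustrative, and the demo runs on a temporary directory rather than the real dataset.

```python
import tempfile
from pathlib import Path

IMAGE_SUFFIXES = {'.jpg', '.jpeg', '.png', '.bmp'}


def images_without_labels(images_dir, labels_dir):
    """Return the names of images that have no same-named .txt file in labels_dir."""
    labels = {p.stem for p in Path(labels_dir).glob('*.txt') if p.name != 'classes.txt'}
    missing = []
    for image in sorted(Path(images_dir).iterdir()):
        if image.suffix.lower() in IMAGE_SUFFIXES and image.stem not in labels:
            missing.append(image.name)
    return missing


# Demo on a throwaway directory tree; point the two paths at data/images and data/labels instead.
with tempfile.TemporaryDirectory() as root:
    images, labels = Path(root, 'images'), Path(root, 'labels')
    images.mkdir()
    labels.mkdir()
    for name in ('bus.jpg', 'zidane.jpg'):
        (images / name).touch()
    (labels / 'bus.txt').touch()
    (labels / 'classes.txt').touch()
    missing = images_without_labels(images, labels)

print(missing)  # ['zidane.jpg']
```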

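classes.txt and label files explained

classes.txt lists the classes created during labeling, here head and body. Indices start at 0 and run from top to bottom

Each line of a label file describes one labeled box in the image, so the number of lines equals the number of boxes

Each line has 5 values: the class index, followed by the normalized xCenter, yCenter, width, height (each divided by the image width or height, so every value lies between 0 and 1)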
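To make the format concrete, the sketch below turns one label line back into pixel corner coordinates. The function name and the sample line are made up for illustration; only the five-value layout comes from the description above.

```python
def label_to_pixels(line, img_w, img_h):
    """Parse 'cls xc yc w h' (normalized) into (cls, x1, y1, x2, y2) in pixels."""
    cls, xc, yc, w, h = line.split()
    # Undo the normalization: x values scale with the image width, y values with the height
    xc, w = float(xc) * img_w, float(w) * img_w
    yc, h = float(yc) * img_h, float(h) * img_h
    return int(cls), xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2


# Class 1 (body), centered in a 640*480 image, a quarter as wide and half as tall as the image
print(label_to_pixels('1 0.5 0.5 0.25 0.5', 640, 480))
# (1, 240.0, 120.0, 400.0, 360.0)
```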

A quick trial

Write the dataset configuration file

Copy coco128.yaml from the project, rename it to dataset.for.me.test.yaml, and edit its contents

path: D:\resource\develop\python\dataset.yolov5.6.2\test
train: data/images  # train images (relative to 'path')
val: data/images  # val images (relative to 'path'), here the same 3 images as train

# Classes
nc: 2  # number of classes
names: ['head', 'body']  # class names

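path: dataset root directory

train: directory of the training images

val: directory of the validation images

nc: number of labeled classes

names: the labeled classes, copied from classes.txt in order from top to bottom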
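Copying the names by hand is easy to get wrong once there are more classes. A minimal sketch, assuming classes.txt holds one class name per line as described; the function name is illustrative and the file content is inlined so the example runs on its own.

```python
def names_from_classes(text):
    """Return the class names listed one per line, skipping blank lines."""
    return [line.strip() for line in text.splitlines() if line.strip()]


# In practice the text would come from reading data/labels/classes.txt
names = names_from_classes('head\nbody\n')
print(f'nc: {len(names)}')   # nc: 2
print(f'names: {names}')     # names: ['head', 'body']
```

The two printed lines have the same shape as the nc and names entries in the yaml file, so the order in classes.txt is preserved.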

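Set the training script arguments

Copy train.py from the project, rename it to train.for.me.test.py, and edit parse_opt

--weights: ROOT / 'yolov5s.pt'. Chooses the model to start training from; to train from scratch use default=''

--data: data/dataset.for.me.test.yaml

--batch-size: the number of images processed per batch; if training fails (for example because GPU memory runs out), lower it

--project: default='D:\resource\develop\python\dataset.yolov5.6.2\test\runs/train', where the training results are saved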
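As an alternative to editing the defaults in parse_opt, the same values can be passed on the command line. The sketch below only builds and prints the command; `build_train_command` is an illustrative helper, while the flags themselves (--weights, --data, --batch-size, --project, --workers) are the ones train.py reports in its log.

```python
import sys


def build_train_command(script, weights, data, batch_size, project, workers=None):
    """Build the argument list for a YOLOv5 training run without starting it."""
    cmd = [
        sys.executable, script,
        '--weights', weights,
        '--data', data,
        '--batch-size', str(batch_size),
        '--project', project,
    ]
    if workers is not None:
        cmd += ['--workers', str(workers)]
    return cmd


cmd = build_train_command(
    'train.py',
    'yolov5s.pt',
    'data/dataset.for.me.test.yaml',
    16,
    r'D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train',
)
print(' '.join(cmd))
# To start training, run this from the project directory, for example with subprocess.run(cmd, check=True)
```

Keeping the original train.py untouched this way makes it easier to update the project later.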

Run the training script

It fails with

OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

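Searching the disk, I found two copies of libiomp5md.dll under miniconda and three elsewhere. The others should not matter, but I do not know why miniconda has two or how to deal with them properly. Rather than tinker with something I do not understand, I tried the unsupported workaround that the message itself suggests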

import os
os.environ['KMP_DUPLICATE_LIB_OK']='True'
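To see where the duplicate runtimes live before relying on the workaround, the search can be scripted. This is a sketch under assumptions: `find_copies` is an illustrative helper, and the demo builds a throwaway directory tree instead of scanning a real conda install.

```python
import tempfile
from pathlib import Path


def find_copies(root, filename='libiomp5md.dll'):
    """Recursively list every file under root whose name matches filename (case-insensitive)."""
    target = filename.lower()
    return sorted(p for p in Path(root).rglob('*') if p.is_file() and p.name.lower() == target)


# Demo on a temporary tree; point root at the miniconda directory to inspect a real install.
with tempfile.TemporaryDirectory() as root:
    for sub in ('Library/bin', 'envs/yolov5/Library/bin'):
        folder = Path(root, sub)
        folder.mkdir(parents=True)
        (folder / 'libiomp5md.dll').touch()
    copies = find_copies(root)
    count = len(copies)

print(count)  # 2
```

Knowing which packages ship the extra copies makes it possible to decide later whether one of them can be removed or reinstalled.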

If it fails with

RuntimeError: DataLoader worker (pid(s) 20496) exited unexpectedly

Try changing --workers in the launch arguments to 0. I do not know the cause and was not able to diagnose it

Output

C:\mrathena\develop\miniconda\envs\yolov5\python.exe C:/mrathena/develop/workspace/pycharm/yolov5-6.2/train.for.me.test.py
train.for.me.test: weights=yolov5s.pt, cfg=, data=data\dataset.for.me.test.yaml, hyp=data\hyps\hyp.scratch-low.yaml, epochs=300, batch_size=16, imgsz=640, rect=False, resume=False, nosave=False, noval=False, noautoanchor=False, noplots=False, evolve=None, bucket=, cache=None, image_weights=False, device=, multi_scale=False, single_cls=False, optimizer=SGD, sync_bn=False, workers=8, project=D:\resource\develop\python\dataset.yolov5.6.2\test\runs/train, name=exp, exist_ok=False, quad=False, cos_lr=False, label_smoothing=0.0, patience=100, freeze=[0], save_period=-1, seed=0, local_rank=-1, entity=None, upload_dataset=False, bbox_interval=-1, artifact_alias=latest
github:  YOLOv5 is out of date by 2326 commits. Use `git pull ultralytics master` or `git clone https://github.com/ultralytics/yolov5` to update.
YOLOv5  b899afe Python-3.9.13 torch-1.12.1 CUDA:0 (NVIDIA GeForce RTX 2080, 8192MiB)

hyperparameters: lr0=0.01, lrf=0.01, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.2, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=0.0, translate=0.1, scale=0.5, shear=0.0, perspective=0.0, flipud=0.0, fliplr=0.5, mosaic=1.0, mixup=0.0, copy_paste=0.0
Weights & Biases: run 'pip install wandb' to automatically track and visualize YOLOv5  runs in Weights & Biases
ClearML: run 'pip install clearml' to automatically track, visualize and remotely train YOLOv5  in ClearML
TensorBoard: Start with 'tensorboard --logdir D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train', view at http://localhost:6006/
Overriding model.yaml nc=80 with nc=2

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Conv                      [3, 32, 6, 2, 2]              
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  2    115712  models.common.C3                        [128, 128, 2]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  3    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1   1182720  models.common.C3                        [512, 512, 1]                 
  9                -1  1    656896  models.common.SPPF                      [512, 512, 5]                 
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     18879  models.yolo.Detect                      [2, [[10, 13, 16, 30, 33, 23], [30, 61, 62, 45, 59, 119], [116, 90, 156, 198, 373, 326]], [128, 256, 512]]
Model summary: 270 layers, 7025023 parameters, 7025023 gradients, 16.0 GFLOPs

Transferred 343/349 items from yolov5s.pt
AMP: checks passed 
optimizer: SGD(lr=0.01) with parameter groups 57 weight(decay=0.0), 60 weight(decay=0.0005), 60 bias
train: Scanning 'D:\resource\develop\python\dataset.yolov5.6.2\test\data\labels.cache' images and labels... 3 found, 0 missing, 0 empty, 0 corrupt: 100%|██████████| 3/3 [00:00<?, ?it/s]
val: Scanning 'D:\resource\develop\python\dataset.yolov5.6.2\test\data\labels.cache' images and labels... 3 found, 0 missing, 0 empty, 0 corrupt: 100%|██████████| 3/3 [00:00<?, ?it/s]
Plotting labels to D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp\labels.jpg... 

AutoAnchor: 4.94 anchors/target, 1.000 Best Possible Recall (BPR). Current anchors are a good fit to dataset 
Image sizes 640 train, 640 val
Using 3 dataloader workers
Logging results to D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp
Starting training for 300 epochs...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
     0/299    0.621G    0.1239   0.05093    0.0282        24       640: 100%|██████████| 1/1 [00:02<00:00,  2.47s/it]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 1/1 [00:00<00:00,  6.26it/s]
                 all          3         16    0.00131     0.0625    0.00074    0.00037

     Epoch   gpu_mem       box       obj       cls    labels  img_size
     1/299    0.774G    0.1243   0.04956   0.02861        22       640: 100%|██████████| 1/1 [00:00<00:00,  5.85it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 1/1 [00:00<00:00,  6.07it/s]
                 all          3         16    0.00342      0.188    0.00261    0.00032

...
...
...

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   298/299     0.83G   0.04905   0.04686  0.006326        26       640: 100%|██████████| 1/1 [00:00<00:00,  8.85it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 1/1 [00:00<00:00, 12.50it/s]
                 all          3         16      0.942      0.956      0.981      0.603

     Epoch   gpu_mem       box       obj       cls    labels  img_size
   299/299     0.83G   0.05362   0.05724   0.01014        39       640: 100%|██████████| 1/1 [00:00<00:00,  4.93it/s]
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 1/1 [00:00<00:00, 12.50it/s]
                 all          3         16      0.942      0.951      0.981      0.631

300 epochs completed in 0.054 hours.
Optimizer stripped from D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp\weights\last.pt, 14.5MB
Optimizer stripped from D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp\weights\best.pt, 14.5MB

Validating D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp\weights\best.pt...
Fusing layers... 
Model summary: 213 layers, 7015519 parameters, 0 gradients, 15.8 GFLOPs
               Class     Images     Labels          P          R     mAP@.5 mAP@.5:.95: 100%|██████████| 1/1 [00:00<00:00,  9.34it/s]
                 all          3         16      0.942      0.952      0.981      0.624
                head          3          8      0.884       0.96      0.967      0.504
                body          3          8          1      0.945      0.995      0.743
Results saved to D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp

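Testing the training result

best.pt in the weights directory is the model produced by this training run. Test it

Copy detect.py from the project as detect.for.me.test.py and change some arguments

--weights: 'D:\resource\develop\python\dataset.yolov5.6.2\test\runs\train\exp\weights\best.pt'

Or change the model in inference.step.2.py to see the result

It is passable, given that there are only 3 sample images. Head detection is a bit off, so I tried setting --iou-thres to 0 (during NMS, a box whose IoU with a higher-scoring box exceeds this value is suppressed). Ha, the model thinks a head looks more like a body. Never mind, it is only a test, and it is good enough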
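What the IoU threshold does can be illustrated with a small pure-Python version of greedy non-maximum suppression. This is a teaching sketch, not YOLOv5's implementation (which runs on tensors and, by default, handles each class separately); the boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0


def nms(boxes, scores, iou_thres):
    """Greedy NMS: visit boxes by descending score, drop a box if it overlaps a kept box by more than iou_thres."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_thres for k in keep):
            keep.append(i)
    return keep


boxes = [(0, 0, 100, 100), (10, 10, 110, 110), (200, 200, 300, 300)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.45))  # [0, 2]    the second box overlaps the first too much
print(nms(boxes, scores, 0.9))   # [0, 1, 2] a high threshold keeps overlapping boxes
```

With a threshold of 0, any box that overlaps a higher-scoring box at all is dropped, so only non-overlapping boxes survive.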

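Training the Apex model

Yolov5 5.0 environment (failed, but kept for now)

yolov5-5.0 download

Create a virtual environment with conda, download the yolov5 source (5.0 here), extract it into the PyCharm workspace, open it in PyCharm, and select the environment just created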

# Create the virtual environment (in this attempt it was created inside the project directory)
# Python 3.8 or later with all requirements.txt dependencies installed, including torch>=1.7. To install run:
conda create -p C:\mrathena\develop\workspace\pycharm\yolov5-5.0\venv python=3.8 # -n and -p cannot be set at the same time
# Activate the environment. My guess is that -n is a special case of -p that uses the conda\envs path prefix, so the two are equivalent
conda activate C:\mrathena\develop\workspace\pycharm\yolov5-5.0\venv
# cd to the project directory before installing the dependencies
cd C:\mrathena\develop\workspace\pycharm\yolov5-5.0
# Install the dependencies
pip install -r requirements.txt

Building wheel for pycocotools (pyproject.toml) ... error
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

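The installation reported an error. Apparently the wheel has to be compiled with the VC++ 14 build tools, which are not installed on this machine

The usual fix is to install those tools, since other packages may need them too. But I am not familiar with C/C++, and from what I found online it means installing a pile of components, roughly 6-7 GB, essentially a full C++ desktop development environment ... just to compile this one package ... which feels excessive

Fortunately someone has shared a copy of an already compiled and installed pycocotools. Download it, extract it, and copy it into Lib/site-packages of the virtual environment, and it counts as installed

pycocotools 2.0.2 installed copy.rar

The following may also solve the problem

If python<=3.8, try downloading the prebuilt binary package pycocotools-windows.whl linked below and install it with pip install xxx.whl

If python>3.8, you probably have to install the C++ development environment (or that 1.1 GB offline installer) after all

pycocotools-windows on the Tsinghua mirror; unfortunately there is no build for versions above 3.8
Unofficial binary archive of Python extension packages

# After pycocotools fails to install, use the prebuilt copy instead: download, extract, and copy it into Lib/site-packages of the virtual environment, then reinstall the dependencies
pip install -r requirements.txt
# Installed successfully

In PyCharm, choose the Python interpreter in the bottom-right corner and select the virtual environment under this project

Run detect.py to test it. It fails with the following error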

AttributeError: Can't get attribute 'SPPF' on <module 'models.common' from 'C:\\mrathena\\develop\\workspace\\pycharm\\yolov5-5.0\\models\\common.py'>

Fix: in /models/common.py of version 6.0, find class SPPF, copy it into the same file of the current version, and add import warnings

What a mess, one problem after another, truly annoying