Code Collector Tech Tutorials, 2022-07-31

YOLOv5 Input Stage (Part 1): Mosaic Data Augmentation | CSDN Writing Check-in

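I'm a beginner. I'm writing this partly as notes on what I've learned, and I hope it also helps others who are just getting started. Corrections from more experienced readers are very welcome. If anything here infringes on your rights, it will be removed immediately.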

I. Principle

II. Code Walkthrough

1. Main part: load_mosaic

2. The load_image function

3. The random_perspective() function (details in the code comments)

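I. Principle

YOLOv5 uses the same Mosaic data augmentation as YOLOv4.

Main idea: take one selected image plus 3 randomly chosen images, crop each of them randomly, and stitch them into a single image that is used as one training sample.

This enriches the image backgrounds. In addition, stitching four images together effectively increases the batch size: each training sample contains content from four images, so the statistics computed by batch normalization cover four images' worth of data per sample.

As a result, YOLOv5 depends less on having a large batch_size.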

II. Code Walkthrough

1. Main part: load_mosaic

    labels4, segments4 = [], []
    s = self.img_size  # target input size (e.g. 640)
    yc, xc = (int(random.uniform(-x, 2 * s + x)) for x in self.mosaic_border)  # mosaic center x, y
    # random.uniform draws a real number in the given range (half the image size to 1.5x the image size)
    # this picks a random mosaic center point

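First the label lists are initialized as empty, and the target input size s (self.img_size) is read.

Then random.uniform() picks a random mosaic center (xc, yc). Because self.mosaic_border is -s // 2 in both directions, random.uniform(-x, 2 * s + x) draws each coordinate from [0.5s, 1.5s], i.e. between half the image size and 1.5 times the image size.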
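To check the range claim, here is a minimal stand-alone sketch of the center sampling. It assumes mosaic_border = (-s // 2, -s // 2); the function name mosaic_center is hypothetical and only used for illustration.

```python
import random

def mosaic_center(s, mosaic_border=None, rng=random):
    """Sample the mosaic center (xc, yc) the way load_mosaic does (sketch)."""
    # Assumption: mosaic_border = (-s // 2, -s // 2), as in YOLOv5's dataset class
    if mosaic_border is None:
        mosaic_border = (-s // 2, -s // 2)
    # uniform(-x, 2s + x) with x = -s/2 is uniform(s/2, 3s/2)
    yc, xc = (int(rng.uniform(-x, 2 * s + x)) for x in mosaic_border)
    return xc, yc

rng = random.Random(0)
print([mosaic_center(640, rng=rng) for _ in range(3)])  # every value lies in [320, 960]
```

Because the center never gets closer than s/2 to the canvas edge, every quadrant is at least s/2 wide and tall, so each of the four images always contributes a visible tile.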

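    indices = [index] + random.choices(self.indices, k=3)  # 3 additional image indices
    # random.choices draws 3 more indices (with replacement) from all dataset indices;
    # they are packed into indices together with the originally selected image
    random.shuffle(indices)
    # shuffle the indices so the selected image lands in a random quadrant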

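random.choices() picks the indices of 3 more images (sampling with replacement, so repeats are possible), these 4 indices are put into the indices list, and random.shuffle() puts them in random order.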
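The index selection can be reproduced on its own. In the sketch below, pick_mosaic_indices is a hypothetical helper (not a YOLOv5 function) that mirrors the two lines above, with a seeded random.Random so the result is repeatable.

```python
import random

def pick_mosaic_indices(index, all_indices, rng=random):
    """Pick the 4 image indices for one mosaic (sketch of load_mosaic's first step)."""
    # random.choices samples WITH replacement: duplicates (even `index` itself) can occur
    indices = [index] + rng.choices(all_indices, k=3)
    rng.shuffle(indices)  # the selected image ends up in a random quadrant
    return indices

rng = random.Random(0)
print(pick_mosaic_indices(7, list(range(10)), rng))
```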

    for i, index in enumerate(indices):  # loop over the 4 images
        # Load image
        img, _, (h, w) = load_image(self, index)  # load the image and its (resized) height and width

Loop over the 4 images, calling load_image() to load each one along with its height and width after resizing.

Next comes how the 4 images are placed~

        # place img in img4
        if i == 0:  # top left
            img4 = np.full((s * 2, s * 2, img.shape[2]), 114, dtype=np.uint8)  # base image with 4 tiles
            # first create the gray (114) background canvas
            x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc  # xmin, ymin, xmax, ymax (large image)
            # region on the large canvas: ends at the center (xc, yc), clipped at the canvas edge
            x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h  # xmin, ymin, xmax, ymax (small image)
            # matching region on the small (original) image: its bottom-right part, same size as above

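The first image goes in the top-left corner.

img4 is initialized with np.full() as the large canvas, 2s × 2s (room for four tiles), filled with the gray value 114.

Then the image's region on the large canvas is set, together with the matching crop coordinates on the original (small) image.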

        elif i == 1:  # top right
            x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
            x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
        elif i == 2:  # bottom left
            x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
        elif i == 3:  # bottom right
            x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
            x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)

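The remaining 3 images are placed the same way: each tile touches the mosaic center (xc, yc) with one of its corners, and any part that would fall outside the 2s × 2s canvas is cropped off the side farthest from the center.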
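Since the four branches differ only in which side gets clamped, it can help to pull them into one pure function and check that the canvas region and the image crop always have the same size. The name mosaic_tile_coords and the example sizes are hypothetical; the formulas are copied from the snippets above.

```python
def mosaic_tile_coords(i, xc, yc, w, h, s):
    """Canvas region (a) and image crop (b) for tile i, using the formulas from load_mosaic."""
    if i == 0:  # top left
        x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
        x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
    elif i == 1:  # top right
        x1a, y1a, x2a, y2a = xc, max(yc - h, 0), min(xc + w, s * 2), yc
        x1b, y1b, x2b, y2b = 0, h - (y2a - y1a), min(w, x2a - x1a), h
    elif i == 2:  # bottom left
        x1a, y1a, x2a, y2a = max(xc - w, 0), yc, xc, min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = w - (x2a - x1a), 0, w, min(y2a - y1a, h)
    else:  # i == 3, bottom right
        x1a, y1a, x2a, y2a = xc, yc, min(xc + w, s * 2), min(s * 2, yc + h)
        x1b, y1b, x2b, y2b = 0, 0, min(w, x2a - x1a), min(y2a - y1a, h)
    return (x1a, y1a, x2a, y2a), (x1b, y1b, x2b, y2b)

# Example: s = 640, center (500, 700), every image resized to 640 x 480 (w x h)
for i in range(4):
    a, b = mosaic_tile_coords(i, xc=500, yc=700, w=640, h=480, s=640)
    # the canvas region and the image crop always have the same width and height
    assert (a[2] - a[0], a[3] - a[1]) == (b[2] - b[0], b[3] - b[1])
    print(i, a, b)
```

For tile 0 in this example, the top-left tile would start at x = 500 - 640 < 0, so the canvas region is clamped to x1a = 0 and the crop starts at x1b = 140, dropping the left 140 columns of the image.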

        img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]  # img4[ymin:ymax, xmin:xmax]
        # paste the matching part of the small image onto the large canvas

The matching part of the small image is pasted onto the large canvas.

        padw = x1a - x1b
        padh = y1a - y1b
        # offset from small-image to large-canvas coordinates, used to move the labels after mosaic

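padw and padh are the offsets from small-image pixel coordinates to large-canvas coordinates: a point at (x, y) in the small image ends up at (x + padw, y + padh) on the canvas. They are used to move the labels to their positions in the mosaic, and they are negative when the tile was cropped on the left or top.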
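A tiny NumPy check of the offset rule, using made-up sizes (an 8 × 8 canvas and a 6 × 5 single-channel "image") so the numbers are easy to follow. Only the top-left tile formulas from load_mosaic are used; the canvas fill value -1 stands in for the gray 114.

```python
import numpy as np

s = 4                                   # tiny "input size": the canvas is 2s x 2s = 8 x 8
xc, yc = 3, 5                           # mosaic center
img = np.arange(6 * 5).reshape(6, 5)    # small image with h = 6, w = 5
h, w = img.shape
img4 = np.full((2 * s, 2 * s), -1)      # stand-in for the gray canvas

# top-left tile formulas from load_mosaic
x1a, y1a, x2a, y2a = max(xc - w, 0), max(yc - h, 0), xc, yc
x1b, y1b, x2b, y2b = w - (x2a - x1a), h - (y2a - y1a), w, h
img4[y1a:y2a, x1a:x2a] = img[y1b:y2b, x1b:x2b]

padw, padh = x1a - x1b, y1a - y1b
# a pixel at (x, y) in the small image lands at (x + padw, y + padh) on the canvas
x, y = 4, 5                             # bottom-right pixel of the small image
assert img4[y + padh, x + padw] == img[y, x]
print(padw, padh)                       # -2 -1: the tile was cropped on the left and top
```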

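        # Labels
        labels, segments = self.labels[index].copy(), self.segments[index].copy()
        # get this image's labels and segments
        if labels.size:
            labels[:, 1:] = xywhn2xyxy(labels[:, 1:], w, h, padw, padh)  # normalized xywh to pixel xyxy format
            # convert normalized xywh (fractions of the image size) to pixel xyxy on the large canvas
            segments = [xyn2xy(x, w, h, padw, padh) for x in segments]
            # convert normalized segment points to pixel coordinates on the large canvas
        labels4.append(labels)
        segments4.extend(segments)
        # add them to the lists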

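Then the labels are processed:

First the labels of the corresponding image are read, then converted from normalized xywh format (center x, center y, width, height as fractions of the image size) to pixel xyxy format (corner coordinates) on the mosaic canvas, using w, h and the offsets padw, padh.

Segments (polygon points) are likewise converted to pixel coordinates.

Then everything is appended to the label lists prepared earlier.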
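xywhn2xyxy is a YOLOv5 helper; the NumPy-only function below is a sketch of the conversion it performs, written for illustration rather than copied from the library (the name xywhn_to_xyxy is hypothetical).

```python
import numpy as np

def xywhn_to_xyxy(x, w, h, padw=0, padh=0):
    """Normalized (cx, cy, bw, bh) -> pixel (x1, y1, x2, y2), shifted by the paste offset (sketch)."""
    y = np.copy(x)
    y[:, 0] = w * (x[:, 0] - x[:, 2] / 2) + padw  # top-left x
    y[:, 1] = h * (x[:, 1] - x[:, 3] / 2) + padh  # top-left y
    y[:, 2] = w * (x[:, 0] + x[:, 2] / 2) + padw  # bottom-right x
    y[:, 3] = h * (x[:, 1] + x[:, 3] / 2) + padh  # bottom-right y
    return y

# a box centered in a 640 x 480 image, half its width and height, pasted with offset (100, -20)
box = np.array([[0.5, 0.5, 0.5, 0.5]])
print(xywhn_to_xyxy(box, w=640, h=480, padw=100, padh=-20).tolist())  # [[260.0, 100.0, 580.0, 340.0]]
```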

    # Concat/clip labels
    labels4 = np.concatenate(labels4, 0)  # concatenate the per-image label arrays into one array
    for x in (labels4[:, 1:], *segments4):
        np.clip(x, 0, 2 * s, out=x)  # clip when using random_perspective()
        # np.clip clamps the coordinates to [0, 2s] in place
    # img4, labels4 = replicate(img4, labels4)  # replicate

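First the list of label arrays is concatenated into a single array, which makes the processing below easier, and then the coordinates are clamped to the range from 0 to 2 × the image size (the size of the mosaic canvas).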
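A small sketch of the concatenate-and-clip step with made-up label values ([cls, x1, y1, x2, y2] in canvas pixels). Slicing with [:, 1:] returns a view, so np.clip(..., out=x) writes straight back into labels4, exactly as in the loop above.

```python
import numpy as np

s = 640
# per-tile label arrays: [cls, x1, y1, x2, y2] in canvas pixels (made-up example values)
labels4 = [np.array([[0, -15.0, 30.0, 200.0, 400.0]]),
           np.zeros((0, 5)),                                # a tile with no objects
           np.array([[2, 1200.0, 1100.0, 1300.0, 1290.0]])]
labels4 = np.concatenate(labels4, 0)                        # shape (2, 5)
x = labels4[:, 1:]                                          # a view: clipping it modifies labels4
np.clip(x, 0, 2 * s, out=x)                                 # clamp to the 2s x 2s canvas in place
print(labels4.tolist())  # [[0.0, 0.0, 30.0, 200.0, 400.0], [2.0, 1200.0, 1100.0, 1280.0, 1280.0]]
```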

    # Augment
    # after mosaic, the four images are combined into one image of shape [2*img_size, 2*img_size]
    # it is then randomly rotated, translated, scaled and sheared, and warped into an img_size x img_size output
    img4, labels4, segments4 = copy_paste(img4, labels4, segments4, p=self.hyp['copy_paste'])
    img4, labels4 = random_perspective(img4, labels4, segments4,
                                       degrees=self.hyp['degrees'],
                                       translate=self.hyp['translate'],
                                       scale=self.hyp['scale'],
                                       shear=self.hyp['shear'],
                                       perspective=self.hyp['perspective'],
                                       border=self.mosaic_border)  # border to remove

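After mosaic, the four images are combined into one image of shape [2*img_size, 2*img_size].

copy_paste() is applied first (controlled by hyp['copy_paste']), then random_perspective() randomly rotates, translates, scales and shears the mosaic. Because border=self.mosaic_border is negative (-img_size // 2 on each side), the warped output is img_size × img_size: the outer margin of the canvas is cropped away rather than the whole canvas being resized.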

    return img4, labels4

Finally, the processed image and its labels are returned.

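2. The load_image function

load_image: loads an image and resizes it by the ratio r between the configured input size and the image's original size.

First, get the image stored at this index.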

def load_image(self, i):
    # load_image loads an image and resizes it by the ratio between the input size and the original size
    # loads 1 image from dataset index 'i', returns im, original hw, resized hw
    im = self.imgs[i]  # image at this index (None unless cached in RAM)

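Next, check whether the image is cached in RAM. self.imgs[i] is None unless images were cached in RAM; images cached that way have already been resized, so they can be returned directly.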

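🎈 If it is not cached:

First look for a cached .npy copy of the image on disk.

🌳 If one exists: load the image from it with np.load().

🌳 If not: read the original image from its path with cv2.imread(); if reading fails, an assertion error reports that the image at that path was not found.

Read the original height and width of the image and compute the resize ratio r = img_size / max(h0, w0), which scales the longest side to img_size.

If r is not 1, resize the image by r.

Finally, return the image, its original height and width, and its resized height and width.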

    if im is None:  # not cached in ram
        npy = self.img_npy[i]  # path of a cached .npy copy of this image on disk, if any
        if npy and npy.exists():  # load npy
            im = np.load(npy)  # found: load the cached array
        else:  # read image
            path = self.img_files[i]  # otherwise read the original image from its path
            im = cv2.imread(path)  # BGR
            assert im is not None, f'Image Not Found {path}'  # fail if the image could not be read
        h0, w0 = im.shape[:2]  # orig hw
        r = self.img_size / max(h0, w0)  # ratio: scale the longest side to img_size
        if r != 1:  # if sizes are not equal
            # shrink with INTER_AREA when not augmenting, otherwise use INTER_LINEAR
            im = cv2.resize(im, (int(w0 * r), int(h0 * r)),
                            interpolation=cv2.INTER_AREA if r < 1 and not self.augment else cv2.INTER_LINEAR)
        return im, (h0, w0), im.shape[:2]  # im, hw_original, hw_resized

🎈 If it is cached:

Return the cached image directly, along with its original and resized height and width~

    else:
        return self.imgs[i], self.img_hw0[i], self.img_hw[i]  # im, hw_original, hw_resized
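The resize rule can be checked without OpenCV. In this sketch, resized_hw is a hypothetical helper and the interpolation flags are returned as plain strings standing in for cv2.INTER_AREA / cv2.INTER_LINEAR. Note that it returns (height, width), while cv2.resize itself takes dsize as (width, height), which is why load_image passes (int(w0 * r), int(h0 * r)).

```python
def resized_hw(h0, w0, img_size, augment=False):
    """Sketch of load_image's resize rule: scale the longest side to img_size."""
    r = img_size / max(h0, w0)  # resize ratio
    if r == 1:
        return (h0, w0), None   # already the right size: no resize
    # same choice as load_image: INTER_AREA when shrinking without augmentation, else INTER_LINEAR
    interp = "INTER_AREA" if r < 1 and not augment else "INTER_LINEAR"
    return (int(h0 * r), int(w0 * r)), interp

print(resized_hw(720, 1280, 640))               # ((360, 640), 'INTER_AREA')
print(resized_hw(300, 400, 640, augment=True))  # upscaled, using INTER_LINEAR
```

The aspect ratio is preserved, so a non-square image comes out smaller than img_size along its short side; the mosaic canvas absorbs that difference.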

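3. The random_perspective() function (details in the code comments)

Random geometric transformation

How it is computed: each point, written in homogeneous coordinates (x, y, 1), is multiplied by a 3×3 transformation matrix.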
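In formula form (standard homogeneous-coordinate notation, consistent with how the label code further down multiplies [x, y, 1] by M.T):

```latex
\begin{bmatrix} x' \\ y' \\ w' \end{bmatrix}
= M \begin{bmatrix} x \\ y \\ 1 \end{bmatrix},
\qquad M = T\,S\,R\,P\,C,
\qquad
(x_{\text{new}},\, y_{\text{new}}) =
\begin{cases}
(x'/w',\; y'/w') & \text{perspective} \\
(x',\; y') & \text{affine } (w' = 1)
\end{cases}
```

When perspective = 0, P is the identity and every factor has last row (0, 0, 1), so w' = 1 and the division is unnecessary.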

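First, get the output height and width including the border. When called from load_mosaic, border is self.mosaic_border, which is negative, so the output is smaller than the 2s × 2s mosaic canvas.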

def random_perspective(im, targets=(), segments=(), degrees=10, translate=.1, scale=.1, shear=10, perspective=0.0,
                       border=(0, 0)):
    # torchvision.transforms.RandomAffine(degrees=(-10, 10), translate=(0.1, 0.1), scale=(0.9, 1.1), shear=(-10, 10))
    # targets = [cls, xyxy]

    # output height and width: border added on both sides (a negative border crops)
    height = im.shape[0] + border[0] * 2  # shape(h,w,c)
    width = im.shape[1] + border[1] * 2
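The size arithmetic for the mosaic case, as a tiny sketch. warp_output_size is a hypothetical helper, and mosaic_border = (-s // 2, -s // 2) is assumed as above.

```python
def warp_output_size(canvas_hw, border):
    """Output (height, width) of random_perspective: canvas size plus the border on both sides."""
    height = canvas_hw[0] + border[0] * 2
    width = canvas_hw[1] + border[1] * 2
    return height, width

s = 640
mosaic_border = (-s // 2, -s // 2)   # assumption: the mosaic_border used with mosaic
print(warp_output_size((2 * s, 2 * s), mosaic_border))  # (640, 640): back to img_size
```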

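Next, build the centering matrix C, which moves the image center to the origin so that rotation, scaling and shear happen about the center.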

    # Center
    C = np.eye(3)  # 3x3 identity matrix
    # shift x so the image center moves to the origin
    C[0, 2] = -im.shape[1] / 2  # x translation (pixels)
    # shift y so the image center moves to the origin
    C[1, 2] = -im.shape[0] / 2  # y translation (pixels)

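Next, the matrices for the individual transformations are prepared: perspective P, rotation and scale R, shear S and translation T.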

    # Perspective
    P = np.eye(3)  # 3x3 identity matrix
    # random perspective coefficients for x and y
    P[2, 0] = random.uniform(-perspective, perspective)  # x perspective (about y)
    P[2, 1] = random.uniform(-perspective, perspective)  # y perspective (about x)

    # Rotation and Scale
    R = np.eye(3)  # 3x3 identity matrix
    a = random.uniform(-degrees, degrees)  # random rotation angle within the range
    # a += random.choice([-180, -90, 0, 90])  # add 90deg rotations to small rotations
    s = random.uniform(1 - scale, 1 + scale)  # random scale factor
    # s = 2 ** random.uniform(-scale, scale)
    R[:2] = cv2.getRotationMatrix2D(angle=a, center=(0, 0), scale=s)  # 2x3 rotation+scale matrix into R's first two rows

    # Shear
    S = np.eye(3)  # 3x3 identity matrix
    S[0, 1] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # x shear (deg)
    S[1, 0] = math.tan(random.uniform(-shear, shear) * math.pi / 180)  # y shear (deg)

    # Translation
    # move the origin back to (roughly) the output center, with a random offset
    T = np.eye(3)
    T[0, 2] = random.uniform(0.5 - translate, 0.5 + translate) * width  # x translation (pixels)
    T[1, 2] = random.uniform(0.5 - translate, 0.5 + translate) * height  # y translation (pixels)

Then the matrices are combined into a single transformation matrix M, and the image is warped with it.

    # Combined rotation matrix
    M = T @ S @ R @ P @ C  # order of operations (right to left) is IMPORTANT
    # combined by matrix multiplication: C is applied first, T last
    if (border[0] != 0) or (border[1] != 0) or (M != np.eye(3)).any():  # image changed
        # there is a border or some transform, so the image must be warped
        if perspective:  # perspective transform
            im = cv2.warpPerspective(im, M, dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpPerspective: straight lines stay straight, but parallel lines may no longer be parallel
        else:  # affine
            im = cv2.warpAffine(im, M[:2], dsize=(width, height), borderValue=(114, 114, 114))
            # cv2.warpAffine: rotation, translation, scaling and shear; parallel lines stay parallel
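A small NumPy check of the composition order: with a translate hyperparameter of 0, the canvas center should land exactly on the output center, whatever the rotation and scale. rotation_scale is a hypothetical helper that rebuilds the matrix OpenCV documents for getRotationMatrix2D with center (0, 0), so the example runs without OpenCV; S and P are left as identity, and the sizes assume s = 640.

```python
import math
import numpy as np

def rotation_scale(angle_deg, scale):
    """3x3 form of getRotationMatrix2D(center=(0, 0), angle, scale), per OpenCV's documented formula."""
    a = math.radians(angle_deg)
    alpha, beta = scale * math.cos(a), scale * math.sin(a)
    return np.array([[alpha, beta, 0.0],
                     [-beta, alpha, 0.0],
                     [0.0, 0.0, 1.0]])

h = w = 1280                        # mosaic canvas (s = 640)
out_h = out_w = 640                 # output size after the negative border
C = np.eye(3)
C[0, 2], C[1, 2] = -w / 2, -h / 2   # canvas center -> origin
R = rotation_scale(10, 1.1)         # rotate 10 degrees, scale 1.1
T = np.eye(3)
T[0, 2], T[1, 2] = 0.5 * out_w, 0.5 * out_h  # translate = 0: origin -> output center
M = T @ R @ C                       # C is applied first, T last

center = np.array([w / 2, h / 2, 1.0])
print((M @ center)[:2].tolist())    # [320.0, 320.0]: canvas center -> output center

M_wrong = C @ R @ T                 # same matrices, reversed order
print(np.allclose(M, M_wrong))      # False: the multiplication order matters
```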

Next, the label coordinates are transformed with the same matrix M.

    # Transform label coordinates
    n = len(targets)  # number of objects
    if n:  # if there are objects
        use_segments = any(x.any() for x in segments)  # use segments if any segment has non-zero points
        new = np.zeros((n, 4))  # new boxes: 4 values (x1, y1, x2, y2) per object
        if use_segments:  # warp segments
            segments = resample_segments(segments)  # upsample
            # resample each polygon to a fixed number of points
            for i, segment in enumerate(segments):
                xy = np.ones((len(segment), 3))
                xy[:, :2] = segment  # first two columns: the polygon's x, y points; the third stays 1
                xy = xy @ M.T  # transform
                xy = xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]  # perspective rescale or affine
                # the column of 1s (homogeneous coordinates) picks up the translation terms in M's last column;
                # with perspective, the third output column is M[2, 0]*x + M[2, 1]*y + M[2, 2], and dividing
                # by it projects the points back to 2D (for an affine M it is always 1)

                # clip: bounding box of the transformed points that lie inside the image
                new[i] = segment2box(xy, width, height)

        else:  # warp boxes
            xy = np.ones((n * 4, 3))
            xy[:, :2] = targets[:, [1, 2, 3, 4, 1, 4, 3, 2]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
            xy = xy @ M.T  # transform
            xy = (xy[:, :2] / xy[:, 2:3] if perspective else xy[:, :2]).reshape(n, 8)  # perspective rescale or affine

            # create new boxes
            x = xy[:, [0, 2, 4, 6]]
            y = xy[:, [1, 3, 5, 7]]
            new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T

            # clip
            # clamp the box coordinates to the image bounds (too-small boxes are removed later by box_candidates)
            new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)
            new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
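Why transform all four corners? After a rotation, the old top-left and bottom-right corners no longer bound the box, so the new box is the axis-aligned box around all four transformed corners. The sketch below repeats the box branch for the affine case only; warp_boxes is a hypothetical helper that takes plain xyxy boxes without the class column.

```python
import numpy as np

def warp_boxes(boxes, M, width, height):
    """boxes: (n, 4) xyxy. Transform all 4 corners with an affine M and take the enclosing box."""
    n = len(boxes)
    xy = np.ones((n * 4, 3))
    xy[:, :2] = boxes[:, [0, 1, 2, 3, 0, 3, 2, 1]].reshape(n * 4, 2)  # x1y1, x2y2, x1y2, x2y1
    xy = (xy @ M.T)[:, :2].reshape(n, 8)  # affine case: no division by the third column
    x, y = xy[:, [0, 2, 4, 6]], xy[:, [1, 3, 5, 7]]
    new = np.concatenate((x.min(1), y.min(1), x.max(1), y.max(1))).reshape(4, n).T
    new[:, [0, 2]] = new[:, [0, 2]].clip(0, width)   # clamp to the image bounds
    new[:, [1, 3]] = new[:, [1, 3]].clip(0, height)
    return new

# rotate 90 degrees and shift by (100, 0): (x, y) -> (100 - y, x)
M = np.array([[0.0, -1.0, 100.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])
boxes = np.array([[10.0, 20.0, 50.0, 40.0]])       # 40 wide, 20 tall
print(warp_boxes(boxes, M, width=200, height=200).tolist())  # [[60.0, 10.0, 80.0, 50.0]]: now 20 wide, 40 tall
```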

Finally, the candidate boxes are filtered and the result is returned.

        # filter candidates
        i = box_candidates(box1=targets[:, 1:5].T * s, box2=new.T, area_thr=0.01 if use_segments else 0.10)  # keep boxes still large enough after the transform
        targets = targets[i]
        targets[:, 1:5] = new[i]

    return im, targets
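A sketch of the kind of filter box_candidates applies: a box is kept only if it is still wider and taller than a few pixels, kept enough of its area, and did not become extremely thin. The formula and the defaults wh_thr=2, ar_thr=100 are my assumption of YOLOv5's version, so check them against the release you use. box1 is passed as the original boxes times the random scale s, so the area ratio measures how much of a box survived clipping rather than penalizing the scaling itself.

```python
import numpy as np

def box_candidates(box1, box2, wh_thr=2, ar_thr=100, area_thr=0.1, eps=1e-16):
    """box1: boxes before the warp (4, n), box2: after (4, n), both xyxy. Returns a keep mask."""
    w1, h1 = box1[2] - box1[0], box1[3] - box1[1]
    w2, h2 = box2[2] - box2[0], box2[3] - box2[1]
    ar = np.maximum(w2 / (h2 + eps), h2 / (w2 + eps))  # aspect ratio of the new box
    return (w2 > wh_thr) & (h2 > wh_thr) & (w2 * h2 / (w1 * h1 + eps) > area_thr) & (ar < ar_thr)

before = np.array([[0, 0, 100, 100],
                   [0, 0, 100, 100],
                   [0, 0, 100, 100]], dtype=float).T
after = np.array([[10, 10, 90, 90],    # area ratio 0.64 -> keep
                  [0, 0, 100, 5],      # area ratio 0.05 < 0.10 -> drop
                  [0, 0, 1.5, 80]],    # width 1.5 px <= 2 -> drop
                 dtype=float).T
print(box_candidates(before, after).tolist())  # [True, False, False]
```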

Comments and corrections are welcome. Thanks~

Source: tt丫