Code Collector Technical Tutorials 2024-10-04

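Python Tutorial: Steps and Tips for Converting the NWPU_VHR-10 Remote Sensing Object Detection Dataset to YOLO Format

Author: CSDN @ _养乐多_

This article presents a Python script that converts the NWPU_VHR-10 remote sensing object detection dataset to YOLO format.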

Table of Contents

1. Dataset Overview

1.1 Dataset Download

1.2 Dataset Description

1.3 Data Format

2. Format Conversion

3. Complete Code

4. Splitting the Dataset

1. Dataset Overview

1.1 Dataset Download

https://opendatalab.com/OpenDataLab/NWPU_VHR-10

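1.2 Dataset Description

NWPU VHR-10 is a challenging ten-class geospatial object detection dataset. It contains 800 very-high-resolution (VHR) optical remote sensing images in total: 715 color images acquired from Google Earth, with spatial resolutions ranging from 0.5 m to 2 m, and 85 pansharpened color infrared images acquired from the Vaihingen data, with a spatial resolution of 0.08 m.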

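The dataset is divided into two sets:

a) The positive image set contains 650 images, each with at least one target.

b) The negative image set contains 150 images, none of which contains any target.

From the positive image set, 757 airplanes, 302 ships, 655 storage tanks, 390 baseball diamonds, 524 tennis courts, 159 basketball courts, 163 ground track fields, 224 harbors, 124 bridges, and 477 vehicles were manually annotated with bounding boxes and instance masks as ground truth.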

If you use all or part of this dataset, cite the following papers:

Gong Cheng, Junwei Han, Peicheng Zhou, Lei Guo. Multi-class geospatial object detection and geographic image classification based on collection of part detectors. ISPRS Journal of Photogrammetry and Remote Sensing, 98: 119-132, 2014.

Gong Cheng, Junwei Han. A survey on object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 117: 11-28, 2016.

Gong Cheng, Peicheng Zhou, Junwei Han. Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images. IEEE Transactions on Geoscience and Remote Sensing, 54(12): 7405-7415, 2016.

1.3 Data Format

Each line of an annotation text file defines one object bounding box in the following format:

(x1,y1),(x2,y2),a

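Here (x1,y1) is the top-left corner of the bounding box, (x2,y2) is the bottom-right corner, and a is the object class (1: airplane, 2: ship, 3: storage tank, 4: baseball diamond, 5: tennis court, 6: basketball court, 7: ground track field, 8: harbor, 9: bridge, 10: vehicle).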
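To make the line format concrete, here is a minimal sketch of parsing one annotation line with a regular expression. The names `LINE_RE` and `parse_annotation_line` are hypothetical and are not part of the script later in this article. The pattern assumes that spaces may appear around the numbers, which should be checked against your copy of the dataset.

```python
import re

# Hypothetical pattern for "(x1,y1),(x2,y2),a"; optional spaces are tolerated (an assumption).
LINE_RE = re.compile(
    r"\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,\s*\(\s*(\d+)\s*,\s*(\d+)\s*\)\s*,\s*(\d+)"
)


def parse_annotation_line(line):
    """Return (x1, y1, x2, y2, class_id) as ints, or None if the line does not match."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_annotation_line("(563,478),(630,573),1"))  # (563, 478, 630, 573, 1)
```

Returning None for lines that do not match lets the caller skip blank or malformed lines without raising an exception.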

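2. Format Conversion

To use the NWPU_VHR-10 dataset for object detection with YOLO, we need to convert its annotations to YOLO format.

That is, each line of the form

(x1,y1),(x2,y2),a

is converted to a YOLO-format line with space-separated fields:

class x_center y_center width height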

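The YOLO annotation format is very compact: each object in an image is described by one line, and each line contains five values:

Class index: the zero-based index of the object's class in the class list.

Center x coordinate (x_center): the x position of the bounding-box center, normalized by the image width (range 0 to 1).

Center y coordinate (y_center): the y position of the bounding-box center, normalized by the image height (range 0 to 1).

Bounding-box width (width): the width of the bounding box, normalized by the image width (range 0 to 1).

Bounding-box height (height): the height of the bounding box, normalized by the image height (range 0 to 1).

For example, suppose an image contains one object of class "person" whose bounding-box center lies at (0.4, 0.5) in normalized coordinates, whose width is 0.1 times the image width and whose height is 0.2 times the image height, and that "person" has class index 0. The corresponding line in the YOLO label file would then be: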

0 0.4 0.5 0.1 0.2
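As a worked check of the arithmetic, the sketch below converts a pixel box to normalized YOLO values and back again. The function names `box_to_yolo` and `yolo_to_box`, the 1000 x 800 image size, and the box corners are assumptions chosen so that the result matches the example line above; they do not come from the dataset.

```python
def box_to_yolo(x1, y1, x2, y2, img_w, img_h):
    """Convert pixel corners to normalized (x_center, y_center, width, height)."""
    x_center = (x1 + x2) / 2.0 / img_w
    y_center = (y1 + y2) / 2.0 / img_h
    width = (x2 - x1) / img_w
    height = (y2 - y1) / img_h
    return x_center, y_center, width, height


def yolo_to_box(x_center, y_center, width, height, img_w, img_h):
    """Inverse conversion: normalized YOLO values back to integer pixel corners."""
    x1 = round((x_center - width / 2.0) * img_w)
    y1 = round((y_center - height / 2.0) * img_h)
    x2 = round((x_center + width / 2.0) * img_w)
    y2 = round((y_center + height / 2.0) * img_h)
    return x1, y1, x2, y2


# Assumed example: a 1000 x 800 image with a box from (350, 320) to (450, 480).
print(box_to_yolo(350, 320, 450, 480, 1000, 800))  # (0.4, 0.5, 0.1, 0.2)
print(yolo_to_box(0.4, 0.5, 0.1, 0.2, 1000, 800))  # (350, 320, 450, 480)
```

Running the inverse on a few converted labels is a quick way to confirm that width and height were normalized by the matching image dimension.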

3. Complete Code

Before running the script, adjust class_map and the file paths to match your setup.

import os
from PIL import Image


def convert_to_yolo_format(x1, y1, x2, y2, img_width, img_height):
    dw = 1.0 / img_width
    dh = 1.0 / img_height
    center_x = (x1 + x2) / 2.0
    center_y = (y1 + y2) / 2.0
    width = x2 - x1
    height = y2 - y1
    center_x *= dw
    center_y *= dh
    width *= dw
    height *= dh
    return (center_x, center_y, width, height)


# File paths
annotations_dir = r'E:\DataSet\NWPUVHR10dataset\annotations'
images_dir = r'E:\DataSet\NWPUVHR10dataset\positive'
output_dir = r'E:\DataSet\NWPUVHR10dataset\yolo_labels'

# Make sure the output directory exists
os.makedirs(output_dir, exist_ok=True)

# Class mapping: dataset class id (1-based) -> YOLO class index (0-based)
class_map = {
    '1': '0',  # airplane
    '2': '1',  # ship
    '3': '2',  # storage tank
    '4': '3',  # baseball diamond
    '5': '4',  # tennis court
    '6': '5',  # basketball court
    '7': '6',  # ground track field
    '8': '7',  # harbor
    '9': '8',  # bridge
    '10': '9'  # vehicle
}


def get_image_size(image_path):
    # Open the image file
    with Image.open(image_path) as img:
        # Read the image width and height in pixels
        width, height = img.size
        return width, height


def parse_bbox(line):
    try:
        # Remove the parentheses and split on commas
        line = line.strip().replace('(', '').replace(')', '')
        parts = line.split(',')
        if len(parts) != 5:
            raise ValueError("Invalid line format")

        x1, y1, x2, y2 = map(int, parts[:4])
        # Strip whitespace so the id can be looked up in class_map
        class_id = parts[4].strip()

        return x1, y1, x2, y2, class_id
    except ValueError as e:
        print(f"Error parsing line: {line}. Error: {e}")
        return None, None, None, None, None


# Iterate over the annotation files
for annotation_file in os.listdir(annotations_dir):
    if annotation_file.endswith('.txt'):
        # Locate the corresponding image file
        image_file = os.path.splitext(annotation_file)[0] + '.jpg'
        image_path = os.path.join(images_dir, image_file)

        # Print debug information
        print(f"Processing image: {image_path}")

        if not os.path.exists(image_path):
            print(f"Image file {image_path} does not exist, skipping.")
            continue

        # Get the image size
        img_width, img_height = get_image_size(image_path)

        # Read the annotation file
        annotation_path = os.path.join(annotations_dir, annotation_file)
        with open(annotation_path, 'r') as f:
            lines = f.readlines()

        # Output file path
        output_path = os.path.join(output_dir, annotation_file)

        with open(output_path, 'w') as out_f:
            for line in lines:
                # Skip blank lines instead of reporting them as parse errors
                if not line.strip():
                    continue

                # Parse the annotation
                x1, y1, x2, y2, raw_class_id = parse_bbox(line)
                if x1 is None:
                    continue

                # Keep the raw id so the message below reports it rather than None
                class_id = class_map.get(raw_class_id.strip())
                if class_id is None:
                    print(f"Unknown class id: {raw_class_id}")
                    continue

                # Convert to YOLO format
                center_x, center_y, width, height = convert_to_yolo_format(x1, y1, x2, y2, img_width, img_height)

                # Write to the output file
                out_f.write(f"{class_id} {center_x} {center_y} {width} {height}\n")

print("Conversion completed.")
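After the conversion finishes, it can be worth checking the generated label files before training. The sketch below is not part of the original script: `find_invalid_labels` and its `num_classes` parameter are hypothetical names introduced here. It assumes that every non-empty line should hold five space-separated fields, with a class index in the range 0 to 9 and normalized values in the range 0 to 1.

```python
from pathlib import Path


def find_invalid_labels(labels_dir, num_classes=10):
    """Return (file name, line number, reason) for every line that looks wrong."""
    problems = []
    for label_path in sorted(Path(labels_dir).glob("*.txt")):
        lines = label_path.read_text().splitlines()
        for line_no, line in enumerate(lines, start=1):
            fields = line.split()
            if not fields:
                continue  # ignore blank lines
            if len(fields) != 5:
                problems.append((label_path.name, line_no, "expected 5 fields"))
                continue
            try:
                class_index = int(fields[0])
                values = [float(v) for v in fields[1:]]
            except ValueError:
                problems.append((label_path.name, line_no, "non-numeric value"))
                continue
            if not 0 <= class_index < num_classes:
                problems.append((label_path.name, line_no, "class index out of range"))
            if any(not 0.0 <= v <= 1.0 for v in values):
                problems.append((label_path.name, line_no, "value outside [0, 1]"))
    return problems


# Example call, using the output_dir variable from the script above:
# for name, line_no, reason in find_invalid_labels(output_dir):
#     print(f"{name}:{line_no}: {reason}")
```

An empty result means every label line passed these basic checks. It does not confirm that the boxes line up with the objects, so viewing a few boxes drawn on their images is still useful.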