Techniques and Methods for Drawing Box Plots in Python

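Introduction

This post grew out of a request to draw box plots in Python. The requirements were numerous, so I kept producing new versions. Today it occurred to me that the work is worth summarizing. I also need to decide what to do next and have enough free time, so I am taking the chance to review some basic concepts properly.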

How Box Plots Work

For the underlying theory, I recommend two well-written posts on this site:

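Matplotlib: a detailed guide to every use of boxplot()

Drawing box plots in Python and extracting their characteristic values

I used both posts as references here. The box plot itself is described by the schematic drawn in the second post:

In Python, the relevant matplotlib source for drawing a box plot is: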

# Autogenerated by boilerplate.py.  Do not edit as changes will be lost.
@_copy_docstring_and_deprecators(Axes.boxplot)
def boxplot(
        x, notch=None, sym=None, vert=None, whis=None,
        positions=None, widths=None, patch_artist=None,
        bootstrap=None, usermedians=None, conf_intervals=None,
        meanline=None, showmeans=None, showcaps=None, showbox=None,
        showfliers=None, boxprops=None, labels=None, flierprops=None,
        medianprops=None, meanprops=None, capprops=None,
        whiskerprops=None, manage_ticks=True, autorange=False,
        zorder=None, capwidths=None, *, data=None):
    return gca().boxplot(
        x, notch=notch, sym=sym, vert=vert, whis=whis,
        positions=positions, widths=widths, patch_artist=patch_artist,
        bootstrap=bootstrap, usermedians=usermedians,
        conf_intervals=conf_intervals, meanline=meanline,
        showmeans=showmeans, showcaps=showcaps, showbox=showbox,
        showfliers=showfliers, boxprops=boxprops, labels=labels,
        flierprops=flierprops, medianprops=medianprops,
        meanprops=meanprops, capprops=capprops,
        whiskerprops=whiskerprops, manage_ticks=manage_ticks,
        autorange=autorange, zorder=zorder, capwidths=capwidths,
        **({"data": data} if data is not None else {}))

(Quoted from: https://github.com/matplotlib/matplotlib/blob/v3.7.1/lib/matplotlib/pyplot.py#L2473-L2494)

Based on the explanations in those two posts, I revised some of the parameter descriptions as follows:

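Parameter      Description
x              The data to plot: a single dataset or several datasets.
notch          Whether to draw notched boxes; by default the boxes are plain rectangles.
sym            Marker style for the outlier points, e.g. 'b+' for blue plus signs; an empty string hides them.
vert           Whether to draw the boxes vertically; vertical by default, False draws them horizontally.
whis           Distance of the whiskers from the lower and upper quartiles; the default is 1.5 times the interquartile range.
positions      Positions of the boxes; the default is range(1, N+1), where N is the number of boxes.
widths         Width of the boxes; the default is 0.5.
patch_artist   Whether the boxes can be filled with color; the default is False.
meanline       Whether to show the mean as a line; by default it is shown as a point (only has an effect when showmeans is True).
showmeans      Whether to show the mean; hidden by default.
manage_ticks   Whether to adjust the tick positions and labels to match the boxes; the default is True.
showcaps       Whether to show the caps at the ends of the whiskers; shown by default.
showbox        Whether to show the box itself; shown by default.
showfliers     Whether to show outliers; shown by default.
boxprops       Properties of the box, such as edge color and fill color.
labels         Tick labels for the boxes.
flierprops     Properties of the outliers, such as marker shape, size and fill color.
medianprops    Properties of the median line, such as line style and width.
meanprops      Properties of the mean, such as marker size and color.
capprops       Properties of the whisker caps, such as color and width.
whiskerprops   Properties of the whiskers, such as color, width and line style.
autorange      When True and the 25th and 75th percentiles coincide, the whiskers extend to the minimum and maximum of the data; the default is False.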
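To make the whis parameter concrete, here is a minimal NumPy sketch of the statistics a box plot displays under the 1.5 × IQR rule. The function name five_number_summary and the sample values are made up for illustration, and the logic is a simplification of what matplotlib does internally, not its actual implementation.

```python
import numpy as np

def five_number_summary(values, whis=1.5):
    """Hypothetical helper: box plot statistics under the whis x IQR rule."""
    values = np.asarray(values, dtype=float)
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    iqr = q3 - q1
    lo_limit = q1 - whis * iqr   # points below this count as outliers
    hi_limit = q3 + whis * iqr   # points above this count as outliers
    inside = values[(values >= lo_limit) & (values <= hi_limit)]
    return {
        "whislo": inside.min(),  # whiskers end at the most extreme non-outliers
        "q1": q1,
        "med": med,
        "q3": q3,
        "whishi": inside.max(),
        "fliers": values[(values < lo_limit) | (values > hi_limit)],
    }

sample = [1, 2, 3, 4, 5, 20]  # made-up data with one obvious outlier
stats = five_number_summary(sample)
print(stats)
```

With this sample the value 20 lies above q3 + 1.5 × IQR, so it is reported as a flier and the upper whisker stops at 5.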

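Now let's move straight on to practice.

Drawing the Box Plot

Here is a simplified version. My points were extracted from people in a drone video stream, so I skip the earlier details and start from the tracking data. The first step is to extract each pedestrian's movement through the region of interest: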

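def throw_time(array, start_x, end_x, y):
    # Assumed column layout of `array`: frame, person id, left x, top y, width, height.
    indexs = []
    index = 1
    person_throw_time = []
    # IDs start at 1. As in the original, range() stops before the largest ID;
    # use int(array[:, 1].max()) + 1 if the last person should be included.
    for i in range(1, int(array[:, 1].max())):
        each_person_data = array[array[:, 1] == i]

        # Keep only the detections inside the region of interest.
        each_person_data = each_person_data[each_person_data[:, 2] > start_x]
        each_person_data = each_person_data[each_person_data[:, 2] < end_x]
        each_person_data = each_person_data[each_person_data[:, 3] > y]
        if each_person_data.shape[0] < 4:
            continue  # too few detections for this person
        # Shift (x, y) from the box corner to the box centre (not used further below).
        each_person_data[:, 2] = each_person_data[:, 2] + (each_person_data[:, 4] / 2)
        each_person_data[:, 3] = each_person_data[:, 3] + (each_person_data[:, 5] / 2)
        # Elapsed frames times 0.04 s per frame (presumably 25 fps video).
        person_time = (each_person_data[-1, 0] - each_person_data[0, 0]) * 0.04
        print("person time = ", person_time)
        if person_time < 5:
            continue  # discard tracks shorter than 5 seconds
        person_throw_time.append(person_time)
        indexs.append(index)
        index = index + 1
    return indexs, person_throw_time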

indexs1,person_throw_time1 = throw_time(array1,500,1400,400)      
# print(person_throw_time1)
# [10.36, 9.76, 9.48, 9.56, 6.16, 8.36, 8.6, 8.76, 5.6000000000000005, 9.84, 8.0, 9.88, 8.36, 9.16, 8.0, 8.92, 8.32, 9.68, 7.6000000000000005, 8.24, 7.08, 8.8, 8.6, 9.88, 9.64, 9.36, 10.16, 9.56, 7.4, 9.32, 8.48, 9.88, 9.16, 9.48, 9.64, 8.76]
indexs2,person_throw_time2 = throw_time(array2,500,1400,400)
indexs3,person_throw_time3 = throw_time(array3,450,1300,400)
indexs4,person_throw_time4 = throw_time(array4,600,1400,400)
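Since the tracking arrays themselves are omitted, the following sketch illustrates the idea on made-up data. The column layout (frame, person id, left x, top y, width, height), the 25 fps frame rate and the helper name crossing_time are all assumptions for illustration, not the author's actual pipeline.

```python
import numpy as np

FRAME_SECONDS = 0.04  # assumption: 25 fps video, matching the 0.04 factor above

# Hypothetical rows: frame, person id, left x, top y, width, height
tracks = np.array([
    [0,   1, 600,  500, 40, 80],
    [100, 1, 900,  500, 40, 80],
    [200, 1, 1300, 500, 40, 80],
    [0,   2, 100,  500, 40, 80],  # outside the x window used below
])

def crossing_time(tracks, person_id, start_x, end_x, y):
    """Seconds between a person's first and last detection inside the region."""
    rows = tracks[tracks[:, 1] == person_id]
    inside = rows[(rows[:, 2] > start_x) & (rows[:, 2] < end_x) & (rows[:, 3] > y)]
    if len(inside) == 0:
        return None  # the person never entered the region
    return (inside[-1, 0] - inside[0, 0]) * FRAME_SECONDS

time_person_1 = crossing_time(tracks, 1, 500, 1400, 400)
time_person_2 = crossing_time(tracks, 2, 500, 1400, 400)
print(time_person_1, time_person_2)
```

Person 1 is seen inside the region from frame 0 to frame 200, which corresponds to about 8 seconds; person 2 never enters the region, so the helper returns None.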

This yields a series of values together with their indices, which can then be plotted:

import matplotlib
import matplotlib.pyplot as plt

matplotlib.rc("font", family='Times New Roman')

# Plot
fig, ax = plt.subplots()
ax.set_ylabel('time(s)', fontsize=18)
# meanline/meanprops only take effect together with showmeans=True
ax.boxplot([person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4],
           widths=0.4, patch_artist=True, showfliers=False, showmeans=True,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})
# Set the tick labels on the x axis
ax.set_xticklabels(['List 1', 'List 2', 'List 3', 'List 4'], fontsize=14)
plt.show()

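This code creates a box plot with four boxes, each holding the data from one of the lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4]. The boxes are filled with sky blue and outlined in black, and outliers are hidden. The meanline and meanprops arguments request a red line at each box's mean; note that matplotlib draws it only when showmeans=True is also set.

Most people would probably be satisfied at this point, and at first so was I: the code above is already a second version that fixed some color discrepancies and legend errors in the first. In the end, though, I got to version 6 and changed the plotting logic again.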
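The interaction between meanline and showmeans is easy to check by inspecting the dictionary that boxplot returns. The sketch below uses made-up samples and the non-interactive Agg backend (an assumption so that it runs without a display); it counts the mean artists created with and without showmeans=True.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view figures
import matplotlib.pyplot as plt

samples = [[1, 2, 3, 4, 10], [2, 4, 6, 8, 9]]  # made-up data
mean_style = {'color': 'red', 'linewidth': 3}

fig, (ax_left, ax_right) = plt.subplots(1, 2)
# meanline alone: no mean artists are created
without_means = ax_left.boxplot(samples, meanline=True, meanprops=mean_style)
# meanline together with showmeans: one mean line per box
with_means = ax_right.boxplot(samples, showmeans=True, meanline=True,
                              meanprops=mean_style)

n_without = len(without_means['means'])
n_with = len(with_means['means'])
print(n_without, n_with)  # prints "0 2"
```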

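Drawing a Box Plot from Its Summary Statistics

What if the requester hands over another dataset of unknown origin and wants a comparison chart, and that data consists only of the five points of each box plot? That is exactly what happened: with no warning at all, I was handed an Excel sheet. So I tidied up the data, converted my lists [person_throw_time1, person_throw_time2, person_throw_time3, person_throw_time4] into a DataFrame, and used describe to find the corresponding five-number summaries. Because the real data is sensitive, simple numbers stand in for it here: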

import pandas as pd

# Suppose these are the four lists
person_throw_time1 = [1, 2, 3, 4, 5]
person_throw_time2 = [6, 7, 8, 9, 10]
person_throw_time3 = [11, 12, 13, 14, 15]
person_throw_time4 = [16, 17, 18, 19, 20]

# Combine the four lists into one DataFrame
data = pd.DataFrame({'data1': person_throw_time1, 'data2': person_throw_time2,
                     'data3': person_throw_time3, 'data4': person_throw_time4})

# Compute the summary statistics with describe
statistics = data.describe()

print(statistics)

This gives the corresponding statistics:

           data1      data2      data3      data4
count   5.000000   5.000000   5.000000   5.000000
mean    3.000000   8.000000  13.000000  18.000000
std     1.581139   1.581139   1.581139   1.581139
min     1.000000   6.000000  11.000000  16.000000
25%     2.000000   7.000000  12.000000  17.000000
50%     3.000000   8.000000  13.000000  18.000000
75%     4.000000   9.000000  14.000000  19.000000
max     5.000000  10.000000  15.000000  20.000000
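To go from the describe output to rows of the form [label, min, Q1, median, Q3, max], one option is to select the five relevant index labels with .loc. This is a small sketch on two of the placeholder columns; the variable names five_numbers and rows are my own.

```python
import pandas as pd

data = pd.DataFrame({
    'data1': [1, 2, 3, 4, 5],
    'data2': [6, 7, 8, 9, 10],
})

# Keep only the five-number summary rows of the describe() table
five_numbers = data.describe().loc[['min', '25%', '50%', '75%', 'max']]

# One row per dataset: label followed by min, Q1, median, Q3, max
rows = [[name] + five_numbers[name].tolist() for name in five_numbers.columns]
print(rows)
```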

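I rearranged the data and put the three sets of experimental results together (PS: I modified the values a little, so they are not standard five-number summaries):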

[
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

That does not matter much. Replotting with the data above:

import matplotlib.pyplot as plt
import matplotlib

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]

# Extract the data and the labels
labels = [row[0] for row in data]
box_data = [row[1:] for row in data]

# Set the font
matplotlib.rc("font", family='Times New Roman')

# Draw the box plot
fig, ax = plt.subplots()
ax.boxplot(box_data, widths=0.4, patch_artist=True, showfliers=False,
           boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'},
           meanline=True, meanprops={'color': 'red', 'linewidth': 3})

# Set the axis labels
ax.set_ylabel('time(s)', fontsize=18)
ax.set_xticklabels(labels, rotation=45, fontsize=12)

plt.show()

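After plotting, one problem remained: some boxes had lost their upper or lower whiskers. At the time I did not know why. The likely cause is that boxplot treats the five summary values as raw samples and recomputes the quartiles and the 1.5 × IQR whisker limits from them, so a minimum or maximum beyond that limit becomes an outlier, which showfliers=False then hides. To get the intended whiskers back, the plot needs matplotlib's other box plot interface, which takes the data above converted into dictionaries: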
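One way to investigate the missing whiskers is to ask matplotlib which statistics it derives when the five summary values are passed in as if they were raw samples. The sketch below uses matplotlib.cbook.boxplot_stats, the helper that boxplot relies on, with the "List 1" row from the data above.

```python
from matplotlib import cbook

summary = [6.1, 9.15, 9.84, 10.44, 11.16]  # the "List 1" row from above

# boxplot_stats returns one dict of statistics per dataset
stats = cbook.boxplot_stats(summary)[0]

lower_whisker_collapsed = abs(stats['whislo'] - stats['q1']) < 1e-9
hidden_points = list(stats['fliers'])
print(stats['q1'], stats['whislo'], stats['whishi'], hidden_points)
```

For this row the recomputed lower limit is Q1 − 1.5 × IQR ≈ 7.2, so the intended minimum 6.1 is classified as a flier and the lower whisker collapses onto the box edge at Q1.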

data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)
print(draw_data)
# [{'whislo': 6.1, 'q1': 9.15, 'med': 9.84, 'q3': 10.44, 'whishi': 11.16}, {'whislo': 7.0, 'q1': 9.47, 'med': 10.05, 'q3': 10.81, 'whishi': 12.02}, {'whislo': 14.16, 'q1': 18.41, 'med': 20.19, 'q3': 21.08, 'whishi': 25.42}, {'whislo': 6.54, 'q1': 8.65, 'med': 9.1, 'q3': 9.39, 'whishi': 10.08}, {'whislo': 7.31, 'q1': 9.1, 'med': 9.5, 'q3': 10.31, 'whishi': 10.86}, {'whislo': 10.32, 'q1': 14.18, 'med': 15.42, 'q3': 18.08, 'whishi': 20.72}, {'whislo': 6.14, 'q1': 8.1, 'med': 8.44, 'q3': 9.1, 'whishi': 9.82}, {'whislo': 6.22, 'q1': 8.3, 'med': 8.7, 'q3': 9.2, 'whishi': 10.12}, {'whislo': 8.72, 'q1': 10.61, 'med': 12.71, 'q3': 16.11, 'whishi': 17.91}, {'whislo': 7.1, 'q1': 8.75, 'med': 8.84, 'q3': 9.1, 'whishi': 10.96}, {'whislo': 7.3, 'q1': 8.85, 'med': 9.04, 'q3': 9.1, 'whishi': 11.19}, {'whislo': 7.6, 'q1': 8.3, 'med': 8.4, 'q3': 9.0, 'whishi': 12.55}]

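With the list converted into dictionaries, ax.boxplot() is replaced by ax.bxp(). boxplot computes the statistics from raw samples and then draws them, whereas bxp draws each box directly from precomputed statistics. Each box is described by five values (minimum, lower quartile, median, upper quartile and maximum), passed here as whislo, q1, med, q3 and whishi. The code becomes: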


import matplotlib.pyplot as plt
import matplotlib




data = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["2epochs List 1", 7.0, 9.47, 10.05, 10.81, 12.02],
    ["3epochs List 1", 14.16, 18.41, 20.19, 21.08, 25.42],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
    ["2epochs List 2", 7.31, 9.1, 9.5, 10.31, 10.86],
    ["3epochs List 2", 10.32, 14.18, 15.42, 18.08, 20.72],
    ["List 3", 6.14, 8.1, 8.44, 9.1, 9.82],
    ["2epochs List 3", 6.22, 8.3, 8.7, 9.2, 10.12],
    ["3epochs List 3", 8.72, 10.61, 12.71, 16.11, 17.91],
    ["List 4", 7.1, 8.75, 8.84, 9.1, 10.96],
    ["2epochs List 4", 7.3, 8.85, 9.04, 9.1, 11.19],
    ["3epochs List 4", 7.6, 8.3, 8.4, 9.0, 12.55]
]


def convert_to_dict(data):
    draw_data = []
    for row in data:
        draw_data.append({
            "whislo": row[1],
            "q1": row[2],
            "med": row[3],
            "q3": row[4],
            "whishi": row[5]
        })
    return draw_data

draw_data = convert_to_dict(data)

matplotlib.rc("font", family='Times New Roman')

fig, ax = plt.subplots()
ax.set_ylabel('time(s)', fontsize=18)

# bxp draws directly from the precomputed statistics, so the whiskers reach
# whislo and whishi. meanline/meanprops are omitted: they only take effect
# with showmeans=True, which would also need a 'mean' key in each dict.
ax.bxp(draw_data, widths=0.4, patch_artist=True, showfliers=False,
       boxprops={'facecolor': 'skyblue', 'linewidth': 0.8, 'edgecolor': 'black'})

ax.set_xticklabels([row[0] for row in data], fontsize=14)
plt.show()
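As a variation, the statistics dictionaries accepted by bxp may also carry an optional 'label' key, which matplotlib uses as the tick label, so the separate set_xticklabels call is not strictly needed. The sketch below shows this on two of the rows; it uses the Agg backend (an assumption so that it runs without a display), and the names keys, stats and tick_text are my own.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line to view figures
import matplotlib.pyplot as plt

rows = [
    ["List 1", 6.1, 9.15, 9.84, 10.44, 11.16],
    ["List 2", 6.54, 8.65, 9.1, 9.39, 10.08],
]
keys = ["whislo", "q1", "med", "q3", "whishi"]

# One statistics dict per box, with the row name stored under 'label'
stats = [dict(zip(keys, row[1:]), label=row[0]) for row in rows]

fig, ax = plt.subplots()
artists = ax.bxp(stats, patch_artist=True, showfliers=False)

fig.canvas.draw()  # make sure the tick labels are populated before reading them
tick_text = [tick.get_text() for tick in ax.get_xticklabels()]
print(tick_text)
```

Keeping the label next to its statistics avoids maintaining a second, hard-coded list of names that has to stay in the same order as the data.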
