代码收藏家 (Code Collector) Technical Tutorials, 2022-08-05

Cracking Slider CAPTCHAs with Python (Geetest / No Background Image)

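When using Python to get past human-verification checks, the CAPTCHA is the first major hurdle. This article analyzes how to solve slider CAPTCHAs. When the slider piece image and the background image can be fetched directly, the matchTemplate function of the cv2 module can locate the gap accurately. On some sites, however, the slider piece and background images are encrypted and hidden, so they cannot be fetched directly. This article focuses on that kind of slider CAPTCHA. For network-security reasons, the full code is not shown; only excerpts are included to share the approach.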

About the matchTemplate function: it searches the whole image for the small region that best matches a given sub-image (the template).

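Before starting, here is the result.

As the figure below shows, the detected gap is outlined with a red box; automatic detection accuracy reaches 99.5%.

So how does Python filter out the distracting elements and find the gap?

1. First, crop out the full slider CAPTCHA image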

def save_element_image(self):
    """
    Screenshot the page, crop it to the target element, and return the saved path
    self.page_image_file: path where the full-page screenshot is saved
    :return: self.element_image_file (path of the cropped element image)
    """
    # Take a screenshot of the whole page and save it
    self.browser.save_screenshot(self.page_image_file)
    # Element geometry (x / y / width / height)
    left = self.element.location['x']
    top = self.element.location['y']
    # Right and bottom edges of the crop box (not the element's width and height)
    right = left + self.element.size['width']
    bottom = top + self.element.size['height']
    picture = Image.open(self.page_image_file)
    # Crop the element out of the page screenshot
    picture = picture.crop((left, top, right, bottom))
    # Save the element image
    picture.save(self.element_image_file)
    # Return the path of the cropped element image
    return self.element_image_file
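
The box passed to Image.crop is (left, top, right, bottom), not (x, y, width, height). A minimal sketch of that arithmetic follows. The helper name element_crop_box and its scale parameter are assumptions added for illustration: on high-DPI displays a screenshot can contain more pixels than the CSS coordinates Selenium reports, in which case a scale factor such as the device pixel ratio may be needed.

```python
def element_crop_box(location, size, scale=1.0):
    """Convert element location/size dicts into a (left, top, right, bottom) box.

    `scale` is a hypothetical factor for screenshots whose pixel size differs
    from the reported CSS size; 1.0 means no scaling.
    """
    left = int(location['x'] * scale)
    top = int(location['y'] * scale)
    right = int((location['x'] + size['width']) * scale)
    bottom = int((location['y'] + size['height']) * scale)
    return left, top, right, bottom


# Example values only: an element at (10, 20) measuring 260x160
box = element_crop_box({'x': 10, 'y': 20}, {'width': 260, 'height': 160})
print(box)  # (10, 20, 270, 180)
```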

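2. The saved slider CAPTCHA image is shown below

3. Determine the width and height of the slider piece and the full image

The dimensions of the slider piece and the full image can be measured from the screenshot saved above.
No matter how the CAPTCHA refreshes, the slider's starting x coordinate, width, and height stay fixed.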

class SlideVerificationCode:
    def __init__(self, browser, element):
        # browser: the selenium browser (WebDriver)
        self.browser = browser
        # element = browser.find_element(By.XPATH,'/html/body/div[4]/div[2]')
        self.element = element
        # Measured, known: full image width
        self.big_image_width = 260
        # Measured, known: full image height
        self.big_image_height = 160
        # Measured, known: slider/gap width
        self.small_image_width = 50
        # Measured, known: slider/gap height
        self.small_image_height = 48
        # Measured, known: the slider's starting x coordinate (6 px here)
        self.small_image_x = 6
        # Slider/gap y coordinate: unknown at this point
        self.small_image_y = 0
        # Directory for saved images
        self.image_save_path = "image"
        self.failed_path = os.path.join(self.image_save_path, "detection_failed")
        # Whether the image display helpers are enabled (default False)
        self.switch_show_image = False
        os.makedirs(self.image_save_path, exist_ok=True)
        os.makedirs(self.failed_path, exist_ok=True)
        # Use the current time in file names
        time_stamp = datetime.datetime.now()
        time_for_filename = time_stamp.strftime('%y-%m-%d_%H%M%S')
        # File names for the cropped blue region (big) and pink region (small)
        self.big_image_file = os.path.join(self.image_save_path, '1_big_image_cut.png')
        self.small_image_file = os.path.join(self.image_save_path, '1_small_image_cut.png')
        # File names for the page screenshot and the element screenshot
        self.page_image_basename = 'page_image' + time_for_filename+'.png'
        self.page_image_file = os.path.join(self.image_save_path, self.page_image_basename)
        self.element_image_basename = 'element_image' + time_for_filename+'.png'
        self.element_image_file = os.path.join(self.image_save_path, self.element_image_basename)

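4. Getting the slider's x and y coordinates

If the slider's position can be located precisely, template matching (matchTemplate) can then compute the best-matching gap position.

So, with distracting elements present, how can the position of the slider piece be determined?

5. Approach

Of the slider's four values (x coordinate, y coordinate, width, and height), only the y coordinate is unknown. Once the y coordinate is computed, the slider's position is fixed (the pink region in the figure below).
Once the slider's y coordinate is known, the gap's y coordinate is known too, because the two sit on the same row. The gap's x coordinate is still unknown, so a horizontal band is drawn to mark where the gap may appear (the blue region in the figure below).
With these two regions in hand, template matching (matchTemplate) finds the best match position; the x coordinate of that match gives the distance the slider must travel to reach the gap.

6. So how is the y coordinate obtained?

Before getting to that, add the image display helpers (for demonstration only; they can be removed in production use).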

# Image display helper 1: show a single image
def cv_show_image1(self, img, show_title):
    if self.switch_show_image:
        cv2.imshow(show_title, img)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

# Image display helper 2: show two images side by side in separate windows
def cv_show_image2(self, img1, show_title1, img2, show_title2):
    if self.switch_show_image:
        cv2.imshow(show_title1, img1)
        cv2.imshow(show_title2, img2)
        cv2.waitKey(0)
        cv2.destroyAllWindows()

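7. Edge detection + contour detection function

This function draws every contour found in the image. Because of the distracting elements, their contours show up alongside the slider and gap contours.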

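def canny_rect(self, image, show_image, file_string, time_now, scope_num=10, count=0):
    """
    Edge detection + contour detection
    :param image: image to run detection on
    :param show_image: image on which detected rectangles are drawn for display
    :param file_string: string used in the output file name
    :param time_now: timestamp intended for the file name
    :param scope_num: tolerance (in pixels) around the known slider width/height
    :param count: number of recursive retries so far
    :return: y_list_tmp (y values of all rectangles that pass the size filter)
    """
    y_list_tmp = []
    # Edge detection (20 and 80 are the two hysteresis thresholds)
    edges = cv2.Canny(image, 20, 80)
    # Contour detection (OpenCV 4.x returns contours and hierarchy)
    contours, _ = cv2.findContours(edges, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        # x/y is the top-left corner of the bounding rectangle, w/h its width and height
        x, y, w, h = cv2.boundingRect(c)
        # Discard rectangles whose width or height is too far from the known slider size
        if w >= self.small_image_width + scope_num or w <= self.small_image_width - scope_num:
            continue
        if h >= self.small_image_height + scope_num or h <= self.small_image_height - scope_num:
            continue
        # Record the y coordinate of every matching rectangle
        y_list_tmp.append(y)
        # Display: outline the detected rectangle in black
        cv2.rectangle(show_image, (x, y), (x + w, y + h), (0, 0, 0), 1)
    # count > 0 means the previous pass found nothing; save this pass's result for later review
    if count != 0:
        f_basename = self.element_image_basename.split('.')[0] + '_' + file_string
        check_ele_image = os.path.join(self.failed_path, f_basename + '.png')
        self.cv_show_image1(show_image, f'canny_{file_string}_{count}')
        # Save this pass's contour detection result
        cv2.imwrite(check_ele_image, show_image)
    if y_list_tmp:
        return y_list_tmp
    # Stop once the lower size bound has reached 0, to prevent unbounded recursion.
    # Fall back to the vertical midpoint of the full image as the y value.
    if self.small_image_width - scope_num <= 0:
        print('    *No contour detected; returning the vertical midpoint of the full image')
        return [self.big_image_height // 2]
    # Nothing matched on this pass: widen the tolerance and try again
    scope_num += 2
    count += 1
    print(f"    *{count}. No contour detected! Widening tolerance to {scope_num}")
    return self.canny_rect(image, show_image, file_string, time_now, scope_num, count)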

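8. Contour drawing results

Figure P1 below is the region where the slider may appear (obtained by cropping). Detected contours in it are outlined with black boxes, and all their y values are recorded.
Figure P2 below is the full image, with detected contours outlined the same way (black boxes, all y values recorded). These include the slider box, the gap box, and boxes around distracting elements.
The y coordinates of all boxes are merged into the list y_list, and the median of that list is taken as the y value we need.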
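
The median is a natural pick here because outliers at either end of a sorted list do not move it, as long as the correct value occupies the middle of the list. A small sketch with invented y values (the numbers are illustrative, not measured from a real CAPTCHA):

```python
from statistics import median

# Hypothetical y values: the slider and gap contours agree on 62, while two
# distracting contours sit above (15) and below (140) that row.
y_list = [62, 15, 62, 140, 62]

median_y = int(median(y_list))
print(median_y)  # 62

# The mean, by contrast, is pulled toward the outliers
mean_y = sum(y_list) / len(y_list)
print(mean_y)  # 68.2
```

This only holds while the distractors do not outnumber the correct values on one side of the list; if they did, the median would shift too.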

9. Drawing the pink and blue regions

With the y coordinate known, the pink and blue regions can be cropped out.

def cropped_image(self, img_path):
    """
    Analyze the image: crop out the slider piece and the target region to match against
    """
    # Read the screenshot of the whole slider CAPTCHA
    img_imread = cv2.imread(img_path)
    # Full image: used for the final display
    img_draw_last = img_imread.copy()
    # Full image: used for cropping and saving the gap region
    img_copy_for_save1 = img_imread.copy()
    # Full image: used for cropping and saving the slider region
    img_copy_for_save2 = img_imread.copy()
    # Full image: convert to grayscale
    img_gray = cv2.cvtColor(img_imread, cv2.COLOR_BGR2GRAY)
    # Full image: Gaussian blur
    img_blur = cv2.GaussianBlur(img_gray, (5, 5), 0)
    # Crop the strip where the slider may appear (5 px trimmed at top and bottom;
    # columns cover the known width self.small_image_width plus a small margin)
    strip_top = 5
    img_small_blur = img_blur[strip_top:self.big_image_height - 5,
                     self.small_image_x - 5:self.small_image_width + self.small_image_x + 8]
    # Edge-detect the slider strip. The y values it returns are relative to the strip,
    # so strip_top is added to express them in full-image coordinates.
    y_list_slide = [y + strip_top for y in
                    self.canny_rect(img_small_blur, img_draw_last, 'small', time.time())]
    # Edge-detect the full image and collect all y values
    y_list_big = self.canny_rect(img_blur, img_draw_last, 'big', time.time())
    y_list = y_list_slide + y_list_big
    '''Even if y_list contains y values of distracting rectangles,
    it also contains at least one y value for the slider rectangle
    and at least one for the gap rectangle.
    The slider and the gap share the same y value,
    and distracting rectangles always appear above or below the gap,
    so the median of all y values is the y value of both the gap and the slider.'''
    # Sort
    y_list.sort()
    # Take the median
    median_y = int(np.median(y_list))
    self.small_image_y = median_y
    print(f'y values of distracting rectangles lie at the two ends: {y_list}, '
          f'so the median of y_list is taken as the correct value: {median_y}')
    # With median_y known, the small and large regions can be cropped
    '''Large crop (the blue box):
          x0 = self.small_image_x + self.small_image_width (slider start + known slider width)
          x1 = self.big_image_width - 15 (known full width minus a 15 px tolerance)
          y0 = median_y (median of y_list)
          y1 = median_y + self.small_image_height (median of y_list + known slider height)
    '''
    # Compute the four coordinates of the large crop
    big_x0, big_x1 = self.small_image_x + self.small_image_width, self.big_image_width - 15
    big_y0, big_y1 = median_y, median_y + self.small_image_height
    # Crop indices are [y0:y1, x0:x1]
    gap_possible_areas = img_copy_for_save1[big_y0:big_y1, big_x0:big_x1]
    # Save the large crop (the area where the gap may appear)
    cv2.imwrite(self.big_image_file, gap_possible_areas)
    '''Small crop (the pink box):
          x0 = self.small_image_x (known starting position, 6 in this example)
          x1 = self.small_image_x + self.small_image_width (known start + known width)
          y0 = median_y (median of y_list)
          y1 = median_y + self.small_image_height - 2 (known slider height, trimmed by 2 px)
    '''
    small_x0, small_x1 = self.small_image_x, self.small_image_x + self.small_image_width
    small_y0, small_y1 = median_y, median_y + self.small_image_height - 2
    # Crop indices are [y0:y1, x0:x1]
    slider_possible_areas = img_copy_for_save2[small_y0:small_y1, small_x0:small_x1]
    # Save the small crop (the slider piece)
    cv2.imwrite(self.small_image_file, slider_possible_areas)
    """
    Final display
    """
    # Draw the pink region (arguments: top-left corner, bottom-right corner, BGR color)
    cv2.rectangle(img_draw_last, (small_x0, small_y0), (small_x1, small_y1), (255, 155, 255), 2)
    # Draw the blue region (arguments: top-left corner, bottom-right corner, BGR color)
    cv2.rectangle(img_draw_last, (big_x0, big_y0), (big_x1, big_y1), (220, 20, 60), 2)
    self.cv_show_image1(img_draw_last, 'img_draw_last')

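10. Region drawing results

Pink region: the slider's exact position (see the figure below), cropped and saved as self.small_image_file.
Blue region: the area where the gap may appear (see the figure below), cropped and saved as self.big_image_file.
With both regions obtained, they are passed to template matching (matchTemplate).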
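
The two crop boxes follow directly from the known dimensions and the detected y value. The sketch below restates that arithmetic as a standalone function, using the example dimensions from this article (a 260-pixel-wide image and a 50x48 slider starting at x=6) and ignoring the 2-pixel trim applied to the slider crop. The function name crop_regions, its parameter names, and the sample y value of 62 are assumptions for illustration.

```python
def crop_regions(median_y, big_w=260, small_w=50, small_h=48, small_x=6, right_margin=15):
    """Return (pink_box, blue_box), each as (x0, y0, x1, y1)."""
    # Pink box: the slider itself, at its fixed x position
    pink = (small_x, median_y, small_x + small_w, median_y + small_h)
    # Blue box: everything to the right of the slider on the same row,
    # minus a tolerance margin at the right edge
    blue = (small_x + small_w, median_y, big_w - right_margin, median_y + small_h)
    return pink, blue


pink_box, blue_box = crop_regions(median_y=62)
print(pink_box)  # (6, 62, 56, 110)
print(blue_box)  # (56, 62, 245, 110)
```

Starting the blue box at the slider's right edge keeps the slider itself out of the search area, so the template cannot simply match its own position.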

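11. Getting the x coordinate with matchTemplate

Template matching is run on the saved pink and blue regions to find the best match position, which is the gap position. The gap's x coordinate is then read from that match.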

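def match_template(self):
    """
    Template matching (cv2.matchTemplate): search the whole image for the region that best matches a given sub-image
    self.big_image_file: the large image (search area)
    self.small_image_file: the small image, i.e. the template
    :return: distance the slider must move (matched x coordinate + slider width)
    """
    # Load the large image (cv2.imread returns channels in BGR order)
    big_image_bgr = cv2.imread(self.big_image_file)
    # Large image: convert to grayscale
    big_image_gray = cv2.cvtColor(big_image_bgr, cv2.COLOR_BGR2GRAY)
    # Load the small image in color (used for display only)
    small_image_bgr = cv2.imread(self.small_image_file)
    # Small image: load directly as grayscale
    small_image_gray = cv2.imread(self.small_image_file, cv2.IMREAD_GRAYSCALE)
    # Template matching: find the region of the large image that best matches the template
    res = cv2.matchTemplate(big_image_gray, small_image_gray, cv2.TM_CCOEFF_NORMED)
    # minMaxLoc returns (min_val, max_val, min_loc, max_loc);
    # for TM_CCOEFF_NORMED the best match is at the maximum
    _, _, _, max_loc = cv2.minMaxLoc(res)
    # Matched x coordinate (relative to the large crop)
    match_template_x = max_loc[0]
    print("Matched x coordinate:", match_template_x)
    # Draw the matched rectangle for display
    cv2.rectangle(big_image_bgr,
                  (match_template_x, 0),
                  (match_template_x + self.small_image_width, self.small_image_height),
                  (0, 0, 205), 3)
    self.cv_show_image2(small_image_bgr, "small_image", big_image_bgr, "big_image")
    # The large crop starts self.small_image_width px to the right of the slider's start,
    # so adding that width back gives the distance the slider must move
    return match_template_x + self.small_image_width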
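
Because the search area was cropped from the full image, the x value found by template matching is relative to the crop's left edge. A short sketch of the conversion, using the example dimensions from this article; the function names and the sample match position of 97 are invented for illustration.

```python
def slide_distance(match_x, small_w=50):
    """Distance the slider must travel, given match_x inside the cropped search area.

    The crop starts at small_x + small_w and the slider starts at small_x,
    so small_x cancels out and only the slider width is added back.
    """
    return match_x + small_w


def gap_x_in_full_image(match_x, small_w=50, small_x=6):
    """Left edge of the gap in full-image coordinates."""
    return slide_distance(match_x, small_w) + small_x


print(slide_distance(97))       # 147
print(gap_x_in_full_image(97))  # 153
```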

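12. Finally

The return value of match_template above is exactly the distance the slider needs to travel to reach the gap. With that value, Python can drive the slider and release it precisely at the gap (a human-like movement trajectory must be simulated), solving the slider CAPTCHA and taking the first step past human verification.

13. Supplement

The imported modules and the final drawing-based verification function are added below. This article ends here; readers can find the follow-up steps in other articles. Thank you.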

# Imported modules
import datetime
import os
import random
import time
from time import sleep
from PIL import Image
import cv2
from selenium.webdriver import ActionChains
import numpy as np

# Drawing-based verification function
def draw_the_gap(self, image_path):
    """
    For verification only: load the original full image, get the gap's x value, and draw a red box around the gap
    :param image_path: path of the full image
    """
    # x to draw at: slide distance from match_template + the slider's starting x coordinate
    draw_x = self.match_template() + self.small_image_x
    ele_image = cv2.imread(image_path)
    # Draw a red box to verify the result
    cv2.rectangle(ele_image,
                  (draw_x, self.small_image_y),
                  (draw_x + self.small_image_width, self.small_image_y + self.small_image_height),
                  (0, 0, 205), 3)
    # Display
    self.cv_show_image1(ele_image, 'check')
    # Build the file name and save
    image_path_base = os.path.basename(image_path)
    check_ele_image = os.path.join(self.image_save_path, 'check_' + image_path_base)
    cv2.imwrite(check_ele_image, ele_image)

Source: dahongmeng
