I recently tried putting together a simple CAPTCHA recognition script in Python.
It mainly relies on Tesseract, an open-source OCR engine.

0x00

First, we need to install Tesseract.

Linux

sudo apt-get install tesseract-ocr

Windows

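Download one of the 3rd party Windows exe's/installer builds and install it.
Then download the pre-trained tessdata.
Extract tessdata into the /tessdata directory that sits in the same folder as the exe file.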

0x01

Basic Tesseract usage

On the command line, simply run

./tesseract.exe xxx.png stdout

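This prints the recognized text to the command line.

Tesseract also has several built-in recognition modes; run it with --help to list them.
It recognizes English by default. To recognize another language, specify it with -l lang and download the corresponding language data beforehand.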

./tesseract.exe xxx.png stdout -psm 7 -l eng

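The command above uses mode 7, which treats the image as a single line of text, and sets the language to English. The -psm spelling is the one used by Tesseract 3.x; Tesseract 4 and later expect --psm.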
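
The same command can also be driven from Python. The following is only a minimal sketch and not part of the original script: the helper names `build_tesseract_command` and `recognize` are hypothetical, and it assumes an executable named `tesseract` can be found on the PATH (adjust `exe` otherwise).

```python
import subprocess


def build_tesseract_command(image_path, psm=7, lang="eng",
                            exe="tesseract", psm_flag="-psm"):
    """Build the argument list for the command line shown above.

    psm_flag defaults to "-psm" as used in this post; pass "--psm"
    if the installed Tesseract release expects that spelling.
    """
    return [exe, image_path, "stdout", psm_flag, str(psm), "-l", lang]


def recognize(image_path, **options):
    """Run Tesseract on image_path and return the recognized text.

    Assumes Tesseract is installed; raises CalledProcessError on failure.
    """
    command = build_tesseract_command(image_path, **options)
    result = subprocess.run(command, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE, check=True)
    return result.stdout.decode("utf-8", errors="replace").strip()
```

Calling `recognize("xxx.png")` would then return whatever Tesseract prints for that image. Building the command as a list rather than a single string avoids shell quoting problems with unusual file names.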

0x02

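Installing the required Python libraries

I'm using Python 3 here. Python 2 differs in places, so you'll have to track down any bugs yourself.
On Python 3, PIL is provided by the Pillow package, so that is the one to install.

pip install Pillow requests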

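Fetching the CAPTCHA

These days a CAPTCHA is usually tied to each user's cookies.
Capturing the traffic with burp shows that whenever the CAPTCHA refreshes, a GET request carrying the cookies is sent to a URL, and the response is the CAPTCHA image.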

import requests
r = requests.get("http://xxxxx.cn/CheckCode.aspx", cookies={"xxxxxxx": "xxxxxx"})

This gives us the CAPTCHA image as bytes.
Next we need to open it with Python's PIL and process it.

from PIL import Image
from io import BytesIO
i=Image.open(BytesIO(r.content))

This turns the response body into an image object. Next comes the image processing.
First, binarize the image and save the result.

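im = i.convert('L')  # convert to 8-bit grayscale
threshold = 140
# Lookup table: gray levels below the threshold become 0 (black), the rest 1 (white)
table = []
for level in range(256):
    if level < threshold:
        table.append(0)
    else:
        table.append(1)
out = im.point(table, '1')
out.save("code.png")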
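
To see what the lookup table does, here is a small sketch in plain Python (no PIL needed). The function name `build_threshold_table` and the sample gray levels are illustrative and not part of the original script. Every gray level below the threshold maps to 0 and everything else to 1, which is the mapping that `im.point(table, '1')` applies to each pixel.

```python
def build_threshold_table(threshold=140):
    """Return a 256-entry lookup table: 0 below the threshold, 1 otherwise."""
    return [0 if level < threshold else 1 for level in range(256)]


table = build_threshold_table(140)

# Hypothetical gray levels: dark glyph pixels, values around the cut-off,
# and light background pixels.
samples = [0, 90, 139, 140, 200, 255]
print([table[level] for level in samples])  # [0, 0, 0, 1, 1, 1]
```

The threshold of 140 comes from the script above. A different CAPTCHA would probably need a different value, found by trial and error.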

Removing noise

from PIL import Image, ImageDraw

# Offsets of the 8 pixels surrounding (x, y)
NEIGHBOURS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def getPixel(image, x, y, G, N):
    # Classify the pixel as light (True) or dark (False) against threshold G
    L = image.getpixel((x, y)) > G

    # Count the neighbours that fall on the same side of the threshold
    nearDots = 0
    for dx, dy in NEIGHBOURS:
        if L == (image.getpixel((x + dx, y + dy)) > G):
            nearDots += 1

    # Fewer than N matching neighbours: treat the pixel as noise and return
    # the colour of the pixel directly above it as the replacement
    if nearDots < N:
        return image.getpixel((x, y - 1))
    else:
        return None

def clearNoise(image, G, N, Z):
    draw = ImageDraw.Draw(image)

    # Repeat the pass Z times, skipping the 1-pixel border
    for i in range(0, Z):
        for x in range(1, image.size[0] - 1):
            for y in range(1, image.size[1] - 1):
                color = getPixel(image, x, y, G, N)
                if color is not None:
                    draw.point((x, y), color)

def Clean():
    image = Image.open('code.png')
    clearNoise(image, 200, 2, 1)
    image.save('code.png')

# Denoising
# Compare the RGB value of a point A with the RGB values of its 8 surrounding points, given a chosen value N (0
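
The neighbour-counting rule used by getPixel can be tried out on a tiny grid without PIL. This is a simplified sketch: the names `count_matching_neighbours` and `is_noise` and the 5x5 sample grid are made up for illustration, and the grid holds already-binarized values, so no threshold G is involved.

```python
def count_matching_neighbours(grid, x, y):
    """Count how many of the 8 neighbours of (x, y) have the same value."""
    matches = 0
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if (dx, dy) != (0, 0) and grid[y + dy][x + dx] == grid[y][x]:
                matches += 1
    return matches


def is_noise(grid, x, y, n):
    """A pixel counts as noise when fewer than n neighbours match it."""
    return count_matching_neighbours(grid, x, y) < n


# Hypothetical 5x5 binary image: 1 = white background, 0 = black ink.
grid = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],  # lone black speck at (1, 1)
    [1, 1, 1, 0, 0],
    [1, 1, 1, 0, 0],  # (3, 3) is part of a 2x2 black block
    [1, 1, 1, 1, 1],
]

print(is_noise(grid, 1, 1, 2))  # True: no neighbour is black
print(is_noise(grid, 3, 3, 2))  # False: three neighbours are black
```

With N = 2, as in Clean() above, only pixels with at most one matching neighbour are removed. A larger N removes more specks but may also start eating into thin strokes.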

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.