A Simple Python Implementation of Invoice OCR: Digit Recognition

Contents
Overall approach
Preparation
Getting started
Environment setup
Dataset preparation
Preprocessing
Finding the digits
Preset data
Comparison
Running the recognition
Handwritten digits
Environment setup
Dataset preparation
Preprocessing
Running the recognition
Let's see what I wrote
Recognition statistics
Dataset
Algorithmic improvements
From digits to text
Source code structure
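Invoice OCR: A Simple Implementation of Digit Recognition
This tutorial walks through a few simple steps to build a basic recognizer for the digits printed on an invoice.
We are not chasing recognition accuracy or speed; the only goal is to give you a first taste of artificial intelligence and computer vision (CV). This article is reposted from http://www.biyezuopin.vip/onews.asp?id=16730.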
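Overall approach
Parse the image
Convert it to grayscale
Binarize it
Crop out the data we need
Store the image as a matrix
Prepare some preset reference data by hand
Compare the matrices with each other
Output the prediction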
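The steps above can be sketched end to end on toy data. This is only a minimal sketch, not the article's final code: the helper names (`to_gray`, `binarize`, `similarity`, `predict`), the 3x3 glyph templates, and the pixel-agreement similarity are all assumptions made for illustration.

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB array to grayscale with the common 0.299/0.587/0.114 weights."""
    weights = np.array([0.299, 0.587, 0.114])
    return rgb.astype(float) @ weights

def binarize(gray):
    """Binarize: 1 where a pixel is brighter than the image mean, else 0."""
    return (gray > gray.mean()).astype(np.uint8)

def similarity(a, b):
    """Fraction of positions where two equally shaped binary matrices agree."""
    return float((a == b).mean())

def predict(binary, templates):
    """Return the label of the preset template most similar to `binary`."""
    return max(templates, key=lambda label: similarity(binary, templates[label]))

# Hypothetical preset data: tiny 3x3 binary "glyphs"
templates = {
    "I": np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]], dtype=np.uint8),
    "L": np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]], dtype=np.uint8),
}

# A fake 5x5 RGB "photo": dark background with a bright "L" in its top-left 3x3 corner
photo = np.full((5, 5, 3), 20, dtype=np.uint8)
for row, col in [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]:
    photo[row, col] = (230, 230, 230)

gray = to_gray(photo)        # parse the image and convert it to grayscale
binary = binarize(gray)      # binarize
region = binary[0:3, 0:3]    # crop out the part we care about (stored as a matrix)
prediction = predict(region, templates)  # compare with the preset data
print(prediction)
```

A real invoice needs much larger matrices and a more forgiving similarity measure, but the order of operations is the same.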
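Preparation
Image size: 1218*788
The region to recognize is fixed: we only recognize the digits in the upper-right corner.
We use opencv for operations such as finding contours, and pillow for general image handling.
A few other libraries make the processing more convenient; see the tutorial below for installation.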
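Because the region of interest is fixed, cropping it can be a single Pillow call. A minimal sketch under stated assumptions: the blank in-memory image stands in for a real 1218*788 invoice scan, and the crop box coordinates in `NUMBER_BOX` are hypothetical placeholders that would have to be measured on a real invoice.

```python
from PIL import Image

# Stand-in for a scanned invoice; a real script would load the scan with Image.open
invoice = Image.new("RGB", (1218, 788), color="white")

# Hypothetical box (left, upper, right, lower) inside the upper-right corner
NUMBER_BOX = (900, 30, 1200, 90)

# Crop the fixed region, then convert it to single-channel grayscale
region = invoice.crop(NUMBER_BOX).convert("L")
print(region.size, region.mode)
```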
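Getting started
Judging from what we know so far, the plan looks perfect. Once you actually start working on it, though, there are difficulties at every turn... Either way, let's get going first!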

# %%
import numpy as np
import matplotlib.pyplot as plt
import Levenshtein

# %%
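def read_idx3(filename):
    """Read an IDX3 image file and return one flattened image per row."""
    with open(filename, 'rb') as fo:
        buf = fo.read()

    # Header: four big-endian int32 values (magic number, image count, rows, columns)
    index = 0
    header = np.frombuffer(buf, '>i', 4, index)

    # The pixel data follows the header as unsigned bytes
    index += header.size * header.itemsize
    data = np.frombuffer(buf, '>B', header[1] * header[2] * header[3], index).reshape(header[1], -1)

    return data


def read_idx1(filename):
    """Read an IDX1 label file and return a 1-D array of labels."""
    with open(filename, 'rb') as fo:
        buf = fo.read()

    # Header: two big-endian int32 values (magic number, label count)
    index = 0
    header = np.frombuffer(buf, '>i', 2, index)

    # One unsigned byte per label follows the header
    index += header.size * header.itemsize
    data = np.frombuffer(buf, '>B', header[1], index)

    return data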

# %%
train_labels = read_idx1("mnist/train-labels.idx1-ubyte")

train_images = read_idx3("mnist/train-images.idx3-ubyte")

print(train_labels.shape, train_images.shape)

# %%
print(train_images[0])

print(train_labels[0])

# %%
plt.subplot(121)
plt.imshow(train_images[0, :].reshape(28, -1), cmap='gray')
plt.title('train 0')

print(train_labels[0])

# %%
# Load the test set

test_labels = read_idx1("mnist/t10k-labels.idx1-ubyte")

test_images = read_idx3("mnist/t10k-images.idx3-ubyte")


# %%
print(test_labels[0])

plt.subplot(122)
plt.imshow(test_images[0, :].reshape(28, -1), cmap='gray')
plt.title('test 0')

# %%
print(test_images.shape)

# Use the test set as the preset reference data

from collections import defaultdict

# Maps each digit label to the list of hashes of its reference images
data = defaultdict(list)

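def sHash(img):
    """Perceptual hash (mean-threshold variant).

    Args:
        img (np.ndarray): 1-D array of 784 grayscale values

    Returns:
        str: the hash, a string of '0'/'1' characters with one character per pixel
    """
    # A pixel becomes '1' if it is brighter than the image's mean gray value, else '0'
    avg = img.mean()
    return ''.join('1' if pixel > avg else '0' for pixel in img)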

for i in range(len(test_images)):
    img = test_images[i, :]
    # Store the perceptual hash under the image's label (as a plain int key)
    data[int(test_labels[i])].append(sHash(img))

# %%
# Use the first image of the training set as a test

to_test_image = train_images[0, :]

test_hash = sHash(to_test_image)

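def recognize_number(to_test_image_sHash: str):
    # result[d] holds the best similarity between the input and any reference hash of digit d
    result = [0] * 10

    for k, v in data.items():
        # k: the digit, v: all perceptual hashes stored for that digit
        # Walk through every hash and keep the highest similarity ratio
        for hash_val in v:
            leven_val = Levenshtein.ratio(to_test_image_sHash, hash_val)
            if leven_val > result[k]:
                result[k] = leven_val

    return result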



# %%

result = recognize_number(test_hash)
print(max(result))

print(result.index(max(result)))

print(result)


# %%
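# Try an image we wrote ourselves

from PIL import Image

# Convert to single-channel grayscale so the array has one value per pixel
# (the image is expected to be 28x28, like the MNIST samples)
diy_image = Image.open('MNIST-4.jpg').convert('L')

diy_arr = np.array(diy_image).flatten()

plt.subplot(122)
plt.imshow(diy_arr.reshape(28, -1), cmap='gray')
plt.title('diy 0')
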
# print(sHash(diy_arr))
r = recognize_number(sHash(diy_arr))
print(max(r))

print(r.index(max(r)))

print(r)


# %%
# Measure the recognition accuracy

statis = {}

for i in range(0, 10):
    statis[i] = {}
    
    statis[i]["correct"] = 0
    statis[i]["all"] = 0

for i in range(100):
    shash_val = sHash(train_images[i, :])
    
    r = recognize_number(shash_val)
    
    real_val = train_labels[i]
    if r.index(max(r)) == real_val:
        statis[real_val]["correct"] += 1
    
    statis[real_val]["all"] += 1



# %%
# Print the accuracy for each digit (skip digits that never appeared in the sample)
for i in range(10):
    if statis[i]["all"]:
        print(i, statis[i]["correct"] / statis[i]["all"])
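
The `read_idx3` function in the script relies on the IDX layout used by the MNIST files: a header of big-endian 32-bit integers followed by raw unsigned bytes. The sketch below writes a tiny two-image file in that layout and parses it the same way, which may be a convenient way to check a reader without downloading the dataset. The file contents, the 2x2 image size, and the name `read_idx3_demo` are made up for illustration; 2051 is the magic number of MNIST image files.

```python
import os
import struct
import tempfile

import numpy as np

def read_idx3_demo(filename):
    """Parse an IDX3 file: four big-endian int32 header fields, then uint8 pixels."""
    with open(filename, "rb") as fo:
        buf = fo.read()
    magic, count, rows, cols = struct.unpack_from(">iiii", buf, 0)
    # The 16-byte header is followed by count * rows * cols unsigned bytes
    pixels = np.frombuffer(buf, dtype=">B", count=count * rows * cols, offset=16)
    return magic, pixels.reshape(count, -1)

# Build a made-up file holding two 2x2 images
header = struct.pack(">iiii", 2051, 2, 2, 2)
body = bytes([0, 255, 255, 0, 10, 20, 30, 40])

with tempfile.NamedTemporaryFile(suffix=".idx3-ubyte", delete=False) as tmp:
    tmp.write(header + body)
    path = tmp.name

try:
    magic, images = read_idx3_demo(path)
finally:
    os.remove(path)

print(magic, images.shape)
print(images[1].reshape(2, 2))
```

Each row of `images` is one flattened image, which is the same shape convention the script uses when it calls `reshape(28, -1)` before plotting.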



Original article: https://blog.csdn.net/newlw/article/details/126847071