Recognizing Chinese and English Text with OCR in Python (CFANZ编程社区)


Environment

Python 3.6.6
macOS 10.14.6
pip 19.0.1
tesseract 4.1.0 (macOS build)
pytesseract 0.3.0 (installed with pip)

Installation

1. Install pytesseract, the Python wrapper for tesseract:

pip install pytesseract

2. Install tesseract itself from the macOS terminal:

brew install tesseract

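3. Download the OCR language model you need. For Simplified Chinese this is the chi_sim.traineddata file. After downloading, copy it into tesseract's tessdata directory.

4. List all the languages the installed tesseract now supports: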

tesseract --list-langs

[Image: output of tesseract --list-langs]
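The same check can be done from Python before running any OCR. The following is a minimal sketch, not part of the original post: the helper names `parse_list_langs` and `installed_languages` are made up for illustration, and it assumes the `tesseract` executable is on the PATH and prints one header line ending in a colon followed by one language code per line.

```python
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.append(line)
    return langs


def installed_languages(tesseract_cmd="tesseract"):
    """Run `tesseract --list-langs` and return the language codes it reports."""
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # some builds print the list on stderr
        universal_newlines=True,   # works on Python 3.6, unlike text=True
        check=True,
    )
    return parse_list_langs(result.stdout)
```

Calling `installed_languages()` and checking that `"chi_sim"` is in the returned list is a quick way to confirm that the traineddata file from step 3 ended up where tesseract looks for it.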

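Usage

from PIL import Image
import pytesseract

# image_to_boxes returns a string with one line per recognized character:
# "<char> <left> <bottom> <right> <top> <page>"
boxes = pytesseract.image_to_boxes(Image.open('images/example3.png'), lang='chi_sim')
print(boxes)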

The recognition result:

[Image: recognition result]

The original image:

[Image: original image]
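Since `image_to_boxes` returns plain text, it can help to turn it into structured records before working with it. The sketch below assumes the usual box format of six space-separated fields per line (character, left, bottom, right, top, page); `CharBox` and `parse_boxes` are illustrative names, not part of pytesseract.

```python
from collections import namedtuple

# One recognized character and its bounding box. Tesseract's box coordinates
# are measured from the bottom-left corner of the image.
CharBox = namedtuple("CharBox", "char left bottom right top page")


def parse_boxes(box_text):
    """Turn the string returned by image_to_boxes into a list of CharBox."""
    boxes = []
    for line in box_text.splitlines():
        parts = line.split(" ")
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top, page = (int(p) for p in parts[1:])
        boxes.append(CharBox(char, left, bottom, right, top, page))
    return boxes
```

For example, `parse_boxes(boxes)` on the output above gives a list whose items expose `.char`, `.left`, and so on, which is easier to filter or sort than the raw string.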

Another test case

Original image:

[Image: original image]

With the same code and only the image file name changed, the result is:

[Image: recognition result]
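One thing worth trying when an image contains both Chinese and English is to enable several languages at once: tesseract accepts language codes joined with `+`, such as `chi_sim+eng`. The sketch below is only an illustration of that option and is not claimed to fix the result above. `build_lang` and `ocr_mixed` are hypothetical helper names, and both language models are assumed to be installed.

```python
def build_lang(*codes):
    """Join tesseract language codes with '+', e.g. 'chi_sim+eng'."""
    if not codes:
        raise ValueError("at least one language code is required")
    return "+".join(codes)


def ocr_mixed(image_path, codes=("chi_sim", "eng")):
    """Run OCR on one image with several languages enabled at once."""
    # Imported here so build_lang stays usable without the OCR stack installed.
    from PIL import Image
    import pytesseract

    return pytesseract.image_to_string(Image.open(image_path), lang=build_lang(*codes))
```

Note that this uses `image_to_string`, which returns the recognized text directly instead of per-character boxes.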

That is wildly wrong!! Here is one more example.

Original image:

[Image: original image]

Code:

from PIL import Image
import pytesseract

boxes = pytesseract.image_to_boxes(Image.open('images/example4.png'), lang="eng")

# Each line is "<char> <left> <bottom> <right> <top> <page>"; keep only the character.
sentence = "".join(line.split(" ")[0] for line in boxes.splitlines())
print(sentence)

The recognition result:

[Image: recognition result]
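Joining only the first field of each box line drops the spaces between words, because the box output lists characters and not the gaps between them. A rough way to recover word breaks is to look at the horizontal distance between neighbouring boxes. The sketch below is a heuristic under stated assumptions: it handles a single line of text, and the threshold `gap_ratio` (a fraction of the previous character's height) is an arbitrary choice that would need tuning per font. `boxes_to_text` is an illustrative name.

```python
def boxes_to_text(box_text, gap_ratio=0.3):
    """Rebuild one line of text from image_to_boxes output.

    A space is inserted when the horizontal gap to the previous character
    is larger than gap_ratio times that character's height.
    """
    pieces = []
    prev_right = None
    prev_height = None
    for line in box_text.splitlines():
        parts = line.split(" ")
        if len(parts) != 6:
            continue  # skip blank or malformed lines
        char = parts[0]
        left, bottom, right, top = (int(p) for p in parts[1:5])
        if prev_right is not None and left - prev_right > gap_ratio * prev_height:
            pieces.append(" ")
        pieces.append(char)
        prev_right = right
        prev_height = max(top - bottom, 1)
    return "".join(pieces)
```

When only the text is needed, `pytesseract.image_to_string(Image.open(path), lang="eng")` is the simpler route, since it returns the recognized text with spaces and line breaks already in place.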