
Text Recognition (OCR) for Any Screen Region in Python

Date: 2022-06-25 17:38:24

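The OCR in this article is of course not built from scratch. It relies on the API provided by Baidu AI Cloud (which I consider one of Baidu's commendable contributions to artificial intelligence in China). The API is more than adequate for personal use, and it is comparatively simple and accurate.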

Installing the OCR Python SDK

OCR Python SDK directory structure

├── README.md
├── aip                   // SDK directory
│   ├── __init__.py       // exported classes
│   ├── base.py           // AIP base class
│   ├── http.py           // HTTP requests
│   └── ocr.py            // OCR
└── setup.py              // setuptools installation

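Supported Python versions: 2.7+ and 3.x

The Python SDK can be installed in either of the following ways:

If pip is installed, run pip install baidu-aip.

If setuptools is installed, download the SDK and run python setup.py install.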
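After installing, it can be useful to confirm that the SDK is importable before wiring up any keys. The sketch below is only an illustration: `is_installed` is a hypothetical helper name, and it relies on the fact, visible in the imports used later in this article, that the baidu-aip distribution is imported under the module name `aip`.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # The baidu-aip package is imported as "aip" (see "from aip import AipOcr").
    print("aip available:", is_installed("aip"))
```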

Code implementation

Let's look at the code.

The main modules used are:

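import os                       # operating-system utilities
import sys                      # interpreter/system utilities
import time                     # timestamps
import signal                   # OS signals
import winsound                 # notification sound (Windows only)
from aip import AipOcr          # Baidu OCR API
from PIL import ImageGrab       # grab the image held in the clipboard
import win32clipboard as wc     # Windows clipboard operations
import win32con                 # standard Windows clipboard format constants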

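Step 1: implement a basic OCR program that extracts text from an image. The APP_ID, API_KEY and SECRET_KEY used here are obtained by logging in to Baidu AI Cloud and applying for them in the OCR section.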

''' Your APP ID, API key (AK) and secret key (SK) '''
APP_ID = 'xxx'
API_KEY = 'xxx'
SECRET_KEY = 'xxx'
client = AipOcr(APP_ID, API_KEY, SECRET_KEY)

''' Read an image file as bytes '''
def get_file_content(filePath):
    with open(filePath, 'rb') as fp:
        return fp.read()

''' Extract the text from the dictionary returned by the API '''
def getOcrText(txt_dict):
    txt = ''
    if isinstance(txt_dict, dict):
        # Fall back to an empty list if the response has no 'words_result' key
        # (for example, an error response)
        for i in txt_dict.get('words_result', []):
            txt = txt + i['words']
            if len(i['words']) < 25:
                # The line length decides whether a line break is inserted;
                # adjust the threshold to control how the text is laid out
                txt = txt + '\n\n'
    return txt

''' Call general or high-accuracy text recognition on a local image '''
def BaiduOcr(imageName, Accurate=True):
    image = get_file_content(imageName)
    if Accurate:
        return getOcrText(client.basicAccurate(image))
    else:
        return getOcrText(client.basicGeneral(image))

''' Call general text recognition on a remote image given by URL '''
def BaiduOcrUrl(url):
    return getOcrText(client.basicGeneralUrl(url))
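To see what the text-extraction step does without calling the service, here is a minimal standalone sketch of the same idea. The function name `extract_text` and the sample dictionary are made up for illustration. The response shape (a `words_result` list of `{'words': ...}` items) follows the code above; treating a response that contains `error_code` as a failure is an assumption about the API's error format.

```python
from typing import Any


def extract_text(response: Any, short_line: int = 25) -> str:
    """Join recognized lines; add a blank line after every short line."""
    if not isinstance(response, dict):
        return ""
    # Assumption: a failed call returns a dict containing 'error_code'.
    if "error_code" in response:
        raise RuntimeError(
            f"OCR failed: {response.get('error_code')} {response.get('error_msg')}"
        )
    parts = []
    for item in response.get("words_result", []):
        words = item.get("words", "")
        parts.append(words)
        if len(words) < short_line:
            # A short line is treated as the end of a paragraph.
            parts.append("\n\n")
    return "".join(parts)


# Hypothetical response used only to exercise the function.
sample = {
    "words_result": [{"words": "Hello"}, {"words": "x" * 30}, {"words": "end"}],
    "words_result_num": 3,
}

if __name__ == "__main__":
    print(repr(extract_text(sample)))
```

Raising on an error response, rather than silently returning an empty string, makes problems such as an exhausted quota visible instead of leaving the clipboard empty.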

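Step 2: pick up the screenshot taken with the shortcut key, place the recognized text on the clipboard, play a notification sound, and let a keyboard shortcut exit the program.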

''' Clipboard helper functions '''
def get_clipboard():
    wc.OpenClipboard()
    txt = wc.GetClipboardData(win32con.CF_UNICODETEXT)
    wc.CloseClipboard()
    return txt

def empty_clipboard():
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.CloseClipboard()

def set_clipboard(txt):
    wc.OpenClipboard()
    wc.EmptyClipboard()
    wc.SetClipboardData(win32con.CF_UNICODETEXT, txt)
    wc.CloseClipboard()

''' After a screenshot is taken, call general or high-accuracy text recognition '''
def BaiduOcrScreenshots(Accurate=True, path='./', ifauto=False):
    if not os.path.exists(path):
        os.makedirs(path)
    image = ImageGrab.grabclipboard()
    # grabclipboard() returns None when the clipboard holds no image and may
    # return a list of file names when files were copied; skip both cases
    if image is not None and not isinstance(image, list):
        print('\rThe image has been obtained. Please wait a moment!', end=' ')
        filename = str(time.time_ns())
        png_path = os.path.join(path, filename + '.png')
        image.save(png_path)
        if Accurate:
            txt = getOcrText(client.basicAccurate(get_file_content(png_path)))
        else:
            txt = getOcrText(client.basicGeneral(get_file_content(png_path)))
        os.remove(png_path)
        # f = open(os.path.join(os.path.abspath(path), filename + '.txt'), 'w')
        # f.write(txt)
        # Replacing the image with text also stops the next poll from re-reading it
        set_clipboard(txt)
        winsound.PlaySound('SystemAsterisk', winsound.SND_ASYNC)
        # os.startfile(os.path.join(os.path.abspath(path), filename + '.txt'))
        # empty_clipboard()
        return txt
    else:
        if not ifauto:
            print('Please get the screenshots by Shift+Win+S! ', end='')
            return ''
        else:
            print('\rPlease get the screenshots by Shift+Win+S !  ', end='')

def sig_handler(signum, frame):
    sys.exit(0)

# Warning: deletes every .txt and .png file directly under path
def removeTempFile(file=['.txt', '.png'], path='./'):
    if not os.path.exists(path):
        os.makedirs(path)
    pathDir = os.listdir(path)
    for i in pathDir:
        for j in file:
            if i.endswith(j):
                os.remove(os.path.join(path, i))
                break

''' Run OCR on every image file in a directory and write each result to a .txt file '''
def AutoOcrFile(path='./', filetype=['.png', '.jpg', '.bmp']):
    if not os.path.exists(path):
        os.makedirs(path)
    pathDir = os.listdir(path)
    for i in pathDir:
        for j in filetype:
            if i.endswith(j):
                out_path = os.path.join(os.path.abspath(path), str(time.time_ns()) + '.txt')
                with open(out_path, 'w', encoding='utf-8') as f:
                    f.write(BaiduOcr(os.path.join(path, i)))
                break

''' Poll the clipboard until Ctrl+C (SIGINT) or SIGTERM is received '''
def AutoOcrScreenshots():
    signal.signal(signal.SIGINT, sig_handler)
    signal.signal(signal.SIGTERM, sig_handler)
    print('Waiting for Ctrl+C to exit after removing all picture files and txt files!')
    print('Please get the screenshots by Shift+Win+S !', end='')
    while True:
        try:
            BaiduOcrScreenshots(ifauto=True)
            time.sleep(0.1)
        except SystemExit:
            removeTempFile()
            break
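The polling loop above is tied to Windows-only modules, which makes it hard to try in isolation. The sketch below shows the same grab, recognize, publish cycle with the three steps passed in as callables, and it encodes the image to PNG in memory rather than through a temporary file. All function names here (`image_to_png_bytes`, `poll_once`, `run`) are hypothetical; the only assumption about the image object is that it has a `save(fp, format=...)` method, as a PIL image does.

```python
import io
import time
from typing import Callable, Optional


def image_to_png_bytes(image) -> bytes:
    """Encode an image object to PNG bytes in memory (no temporary file)."""
    buffer = io.BytesIO()
    image.save(buffer, format="PNG")
    return buffer.getvalue()


def poll_once(grab: Callable[[], object],
              recognize: Callable[[bytes], str],
              publish: Callable[[str], None]) -> Optional[str]:
    """Run one grab -> recognize -> publish cycle; return the text or None."""
    item = grab()
    # No image on the clipboard, or a list of copied file names: nothing to do.
    if item is None or isinstance(item, list):
        return None
    text = recognize(image_to_png_bytes(item))
    publish(text)
    return text


def run(grab, recognize, publish, interval: float = 0.1,
        max_polls: Optional[int] = None) -> None:
    """Poll until Ctrl+C, or for max_polls iterations if that is given."""
    polls = 0
    try:
        while max_polls is None or polls < max_polls:
            poll_once(grab, recognize, publish)
            polls += 1
            time.sleep(interval)
    except KeyboardInterrupt:
        pass
```

With the functions from this article, the callables would presumably be `ImageGrab.grabclipboard` for `grab`, `lambda data: getOcrText(client.basicAccurate(data))` for `recognize`, and `set_clipboard` for `publish`.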

Finally, calling the AutoOcrScreenshots function puts it all together:

if __name__ == '__main__':
    AutoOcrScreenshots()

Usage

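On Windows 10, put the code above in a single .py file and run it. You can then press Shift+Win+S to capture any region of the screen. The captured image is held temporarily on the clipboard; the program reads it from there automatically, sends it to Baidu's OCR API to obtain the text, places the text on the clipboard, and plays a notification sound.

After starting the program, take a screenshot with the shortcut, wait for the notification sound, and then press Ctrl+V to paste the text wherever you need it.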

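Supplement: Chinese OCR in Python

I had a requirement to recognize Chinese text in an image using Python. Building that kind of technology is beyond most of us, so I searched GitHub and found an open-source library that seemed to fit: tesseract-ocr.

The Tesseract OCR engine is an open-source project (formerly hosted on Google Code); its project home page is now https://github.com/tesseract-ocr.

It supports Chinese OCR and provides a command-line tool. The corresponding Python package is pytesseract, which lets us recognize the text in an image.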

The author's development environment:

macOS

Python 3.6

Homebrew (brew)

Install tesseract

brew install tesseract

Install the corresponding Python package, pytesseract:

pip install pytesseract


How is it used?

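To recognize Chinese you need to download the corresponding trained data from https://github.com/tesseract-ocr/tessdata: download "chi_sim.traineddata" and copy it into the tessdata directory where Tesseract keeps its trained data.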
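Before running the recognition code, it helps to check that Tesseract can actually see the Chinese data. The sketch below calls the command-line tool with its `--list-langs` option and looks for `chi_sim`. The helper names are hypothetical, and the parsing rests on an assumption about the output format: a header line followed by one language code per line, possibly written to stderr by older versions.

```python
import shutil
import subprocess
from typing import List


def parse_list_langs(output: str) -> List[str]:
    """Keep lines that look like language codes (no whitespace inside)."""
    lines = [line.strip() for line in output.splitlines()]
    # The header ("List of available languages ...") and any warnings
    # contain spaces, so they are dropped by this filter.
    return [line for line in lines if line and not any(c.isspace() for c in line)]


def installed_languages() -> List[str]:
    """Return the languages reported by the tesseract executable, if any."""
    exe = shutil.which("tesseract")
    if exe is None:
        return []
    result = subprocess.run(
        [exe, "--list-langs"],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    # Assumption: some versions print the list to stderr, so read both streams.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    print("chi_sim installed:", "chi_sim" in installed_languages())
```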

The code itself is only a few lines:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import pytesseract
from PIL import Image

# open image
image = Image.open('test.png')
code = pytesseract.image_to_string(image, lang='chi_sim')
print(code)

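OCR is fairly slow; try it yourself with an image that contains Chinese text.

The above is based on personal experience, and I hope it serves as a useful reference. If anything is wrong or has been overlooked, corrections are welcome.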
