[Mac] Installing PaddleOCR_userwarning: no ccache found. please be aware that


Environment: Mac with an M1 chip

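1. Installation

1.1 Install Anaconda

Anaconda is straightforward to install: download the pkg file from the official Anaconda website and follow the installer prompts.

Anaconda is used here to set up a Python virtual environment, which avoids version conflicts with libraries installed in existing environments. Paddle also has requirements on the Python version.

Create and activate the virtual environment: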

zs@Mac ~ % conda create -y -n paddle python=3.12
zs@Mac ~ % conda activate paddle
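Because Paddle only supports certain Python versions, it can help to confirm which interpreter is active inside the environment before installing anything. The following is a minimal sketch: `python_is_supported` is a hypothetical helper, and the range 3.8 to 3.12 is an assumption for illustration only, so check the official install page for the actual range.

```python
import sys

# Assumed supported range, for illustration only; confirm on the official install page.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 12)


def python_is_supported(version=None, lo=MIN_VERSION, hi=MAX_VERSION):
    """Return True if the (major, minor) part of `version` lies within [lo, hi]."""
    if version is None:
        version = sys.version_info
    major_minor = (version[0], version[1])
    return lo <= major_minor <= hi


if __name__ == "__main__":
    # Report the interpreter that is active in the current environment.
    print(sys.version.split()[0], "supported:", python_is_supported())
```

Run it after `conda activate paddle` so that it reports the environment's interpreter rather than the system one.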

1.2 Install paddlepaddle

Get the install command from the official website:

(paddle) zs@Mac ~ % conda install paddlepaddle==3.0.0b2 -c paddle

Verify:

(paddle) zs@Mac ~ % python
>>> import paddle
>>> paddle.utils.run_check()
Running verify PaddlePaddle program ...
I1219 22:02:16.993297 4123495424 interpretercore.cc:237] New Executor is Running.
I1219 22:02:17.038717 4123495424 interpreter_util.cc:518] Standalone Executor is Used.
PaddlePaddle works well on 1 CPU.
PaddlePaddle is installed successfully! Let's start deep learning with PaddlePaddle now.

Possible error:

TypeError: __array__(): incompatible function arguments. The following argument types are supported

This means the installed paddle and numpy versions are incompatible. Downgrading numpy resolves it.
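To decide whether a downgrade is needed, you can inspect the installed numpy version first. The sketch below assumes the incompatibility comes from a NumPy 2.x release, which is a common cause but is not stated in this article; `should_downgrade_numpy` and its `max_major` threshold are hypothetical.

```python
def numpy_major(version: str) -> int:
    """Extract the major version number from a string such as '1.26.4'."""
    return int(version.split(".")[0])


def should_downgrade_numpy(version: str, max_major: int = 1) -> bool:
    """Assumption: majors above `max_major` are the ones that trigger the TypeError."""
    return numpy_major(version) > max_major


if __name__ == "__main__":
    try:
        import numpy
    except ImportError:
        print("numpy is not installed")
    else:
        print(numpy.__version__, "downgrade suggested:",
              should_downgrade_numpy(numpy.__version__))
```

Under that assumption, a command such as `python -m pip install "numpy<2"` would pin numpy to a 1.x release.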

1.3 Install PaddleOCR

Install:

(paddle) zs@Mac ~ % pip install -i https://pypi.tuna.tsinghua.edu.cn/simple paddleocr --user

Install dependencies: first fetch the project's dependency file (requirements.txt), then run the following command:

(paddle) zs@Mac ~ % pip install -r requirements.txt

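2. Upgrading Paddle

Copy the command from the Quick Start page on the official website and run it directly. pip, conda, and similar tools resolve the dependencies automatically and install or upgrade to the specified version.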
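After an upgrade, it is worth confirming which versions actually ended up in the environment. A minimal sketch using the standard library's `importlib.metadata`; `installed_version` is a hypothetical helper, and the package names listed are simply the ones used in this article.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names as used in the install commands of this article.
    for name in ("paddlepaddle", "paddleocr", "numpy"):
        print(name, installed_version(name))
```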


3. Testing

3.1 Command line

paddleocr --image_dir /path/image.jpg

3.2 Script test

Write a script, test.py:

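from paddleocr import PaddleOCR

# Create the recognizer
ocr = PaddleOCR(use_angle_cls=True, lang='ch')
img_path = '../mv/1.jpg'
# The model only needs to be downloaded and loaded into memory once
result = ocr.ocr(img_path, cls=True)
for idx in range(len(result)):
    res = result[idx]
    # Skip images in which no text was detected
    if res is None:
        continue
    for line in res:
        print(line)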
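Each printed line is a nested structure rather than plain text. The sketch below pulls out just the recognized strings, assuming the layout commonly returned by `ocr.ocr` in PaddleOCR 2.x: one list per image, each entry of the form `[box, (text, score)]`. That layout, the `extract_text` helper, and the sample data are assumptions for illustration, so print one real line first to confirm the shape for your version.

```python
def extract_text(result, min_score=0.5):
    """Collect recognized strings whose confidence is at least `min_score`.

    Assumed layout: result -> list of images -> list of [box, (text, score)].
    Images without detections may appear as None and are skipped.
    """
    texts = []
    for page in result or []:
        for _box, (text, score) in page or []:
            if score >= min_score:
                texts.append(text)
    return texts


# Hand-written sample in the assumed layout (not real OCR output).
sample = [[
    [[[10, 10], [100, 10], [100, 30], [10, 30]], ("hello", 0.98)],
    [[[10, 40], [100, 40], [100, 60], [10, 60]], ("w0rld", 0.42)],
]]

if __name__ == "__main__":
    print(extract_text(sample))
```

With the default threshold, only the first sample entry is kept, because the second one scores below 0.5.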

Run the script from the terminal:

(paddle) zs@Mac ~ % python test.py

3.3 Resolving warnings

No ccache found

/opt/anaconda3/envs/paddle/lib/python3.12/site-packages/paddle/utils/cpp_extension/extension_utils.py:686: UserWarning: No ccache found. Please be aware that recompiling all source files may be required. You can download and install ccache from: https://github.com/ccache/ccache/blob/master/doc/INSTALL.md
  warnings.warn(warning_message)

This means ccache was not found in the current environment. ccache is a compilation cache tool that can significantly speed up recompilation. If you do not mind the time spent recompiling all source files, you can ignore this warning. To speed up compilation, install ccache as the message suggests:

conda install -c conda-forge ccache
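If you choose to ignore the warning rather than install ccache, it can also be hidden with the standard `warnings` module. A minimal sketch, assuming the message text stays as shown above; `silence_ccache_warning` is a hypothetical helper, and the filter must be registered before `paddle` is imported.

```python
import warnings


def silence_ccache_warning():
    """Ignore UserWarnings whose text starts with 'No ccache found'."""
    # `message` is a regular expression matched against the start of the warning text.
    warnings.filterwarnings("ignore", message="No ccache found", category=UserWarning)


silence_ccache_warning()
# import paddle  # import Paddle only after the filter has been registered
```

Other warnings are unaffected, since the filter matches on the message prefix only.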

Setuptools is replacing distutils

/root/miniconda3/envs/PaddleSpeech/lib/python3.9/site-packages/_distutils_hack/__init__.py:30: UserWarning: Setuptools is replacing distutils. Support for replacing an already imported distutils is deprecated. In the future, this condition will fail. Register concerns at https://github.com/pypa/setuptools/issues/new?template=distutils-deprecation.yml
  warnings.warn(

This warning means that setuptools is replacing distutils and that this replacement may fail in the future. The setuptools project suggests upgrading setuptools to resolve it:

python -m pip install --upgrade setuptools

pip is being invoked by an old script wrapper

WARNING: pip is being invoked by an old script wrapper. This will fail in a future version of pip.
Please see https://github.com/pypa/pip/issues/5599 for advice on fixing the underlying issue.
To avoid this problem you can invoke Python with '-m pip' instead of running pip directly.

This warning means that pip is being invoked through an old script wrapper, which may cause problems in the future. Invoke pip with python -m pip instead.
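The same advice applies when pip is driven from a script: build the command around the running interpreter instead of a bare `pip` executable. A small sketch; `pip_command` is a hypothetical helper.

```python
import sys


def pip_command(*args):
    """Build a pip invocation that goes through the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Show the command line that would be run.
    print(" ".join(pip_command("install", "--upgrade", "setuptools")))
```

The returned list can be passed to `subprocess.run(..., check=True)`, which keeps the install tied to the active conda environment.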

5. Image too long to be recognized

Crop the image into smaller tiles:

import os
from PIL import Image

def crop_image(path, rows, cols, folder):
    """Split the image at `path` into rows x cols tiles and save them in `folder`."""
    image = Image.open(path)
    name, extension = os.path.splitext(os.path.basename(path))
    width, height = image.size
    img_width = width // cols
    img_height = height // rows
    for row in range(rows):
        for col in range(cols):
            # (left, upper, right, lower) box of the current tile
            box = (col * img_width, row * img_height, (col + 1) * img_width, (row + 1) * img_height)
            cropped_image = image.crop(box)
            output_file = f"{folder}/{name}_{row}_{col}{extension}"
            cropped_image.save(output_file)

# Crop the image: image path, number of rows, number of columns, output folder
crop_image("../photo/example.jpg", 3, 1, "../photo")
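The number of rows above is chosen by hand. One way to derive it is to cap the height of each tile, as in the sketch below. `rows_for_height` is a hypothetical helper, and the cap of 960 pixels is an assumption borrowed from the default `det_limit_side_len` listed in section 9.3, not a value the author prescribes.

```python
def rows_for_height(height: int, max_tile_height: int = 960) -> int:
    """Smallest number of rows such that each tile is at most `max_tile_height` tall."""
    if height <= 0 or max_tile_height <= 0:
        raise ValueError("height and max_tile_height must be positive")
    # Ceiling division using integers only.
    return -(-height // max_tile_height)


if __name__ == "__main__":
    for h in (960, 961, 3000):
        print(h, "->", rows_for_height(h), "rows")
```

The result can be passed as the `rows` argument of `crop_image`. Text that straddles a tile boundary may still be cut in half, so check the tiles visually.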

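8. PaddleOCR models

Default local storage path for the models: /Users/zs/.paddleocr/whl

(paddle) zs@Mac ~ % ls /Users/zs/.paddleocr/whl
cls det rec

det (Detection):
This folder contains the text detection models. Text detection is the first step of the OCR pipeline; its purpose is to locate text in the image.
cls (Classification):
This folder contains the text direction classification models (it may be absent or not required in some PaddleOCR versions). Direction classification determines the orientation of the detected text so that its content can be recognized correctly afterwards.
rec (Recognition):
This folder contains the text recognition models. Text recognition is the last step of the OCR pipeline; its purpose is to convert the detected text images into editable text.

To change the model cache directory, set the corresponding environment variable.

| Default download directory | Environment variable |
|---|---|
| paddlehub | HUB_HOME |
| paddlenlp | PPNLP_HOME |
| paddlespeech | PPSPEECH_HOME |
| paddleaudio | PPAUDIO_HOME |
| paddleocr | PPOCR_HOME |
| paddledetection | PPDETECTION_HOME |
| paddlegan | PPGAN_HOME |
| paddleseg | PPSEG_HOME |
| paddleclas | PPCLAS_HOME |
| paddlerec | PPREC_HOME |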
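The variable can be exported in the shell or set from Python before the library is imported. The sketch below takes the `PPOCR_HOME` name from the table above. Whether your installed paddleocr version honors it has not been verified here, and both `set_model_cache_dir` and the example path are hypothetical.

```python
import os


def set_model_cache_dir(path: str, variable: str = "PPOCR_HOME") -> str:
    """Point `variable` at an absolute cache directory and return that directory."""
    resolved = os.path.abspath(os.path.expanduser(path))
    os.environ[variable] = resolved
    return resolved


if __name__ == "__main__":
    # Hypothetical location; choose any writable directory.
    print(set_model_cache_dir("~/models/paddleocr"))
    # from paddleocr import PaddleOCR  # import only after the variable is set
```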

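9. PaddleOCR model inference parameters

When running inference with PaddleOCR, you can customize parameters to change the model, data, preprocessing, postprocessing, and so on. The parameters are explained in detail below.

9.1 Global settings

| Parameter | Type | Default | Description |
|---|---|---|---|
| image_dir | str | None; must be specified explicitly | Path to an image or a folder |
| page_num | int | 0 | Only effective when the input is a PDF file; predicts the first page_num pages. By default all pages are predicted |
| vis_font_path | str | "./doc/fonts/simfang.ttf" | Font path used for visualization |
| drop_score | float | 0.5 | Results with a recognition score below this value are discarded and not returned |
| use_pdserving | bool | False | Whether to use Paddle Serving for prediction |
| warmup | bool | False | Whether to enable warmup; useful when measuring prediction time |
| draw_img_save_dir | str | "./inference_results" | Folder where the OCR results of the chained system prediction are saved |
| save_crop_res | bool | False | Whether to save the text images recognized by OCR |
| crop_res_save_dir | str | "./output" | Path where the text images recognized by OCR are saved |
| use_mp | bool | False | Whether to enable multi-process prediction |
| total_process_num | int | 6 | Number of processes; takes effect when use_mp is True |
| process_id | int | 0 | ID of the current process; no need to change it yourself |
| benchmark | bool | False | Whether to enable benchmark, which collects statistics on prediction speed, GPU memory usage, and so on |
| save_log_path | str | "./log_output/" | Folder where logs are saved when benchmark is enabled |
| show_log | bool | True | Whether to show log messages during prediction |
| use_onnx | bool | False | Whether to enable ONNX prediction |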
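These parameters are typically passed as command-line flags in the same `--name value` form as `--image_dir` in section 3.1. The sketch below assembles such a command. `build_paddleocr_command` is a hypothetical helper, and whether a given flag is accepted, and in which boolean spelling, depends on the installed PaddleOCR version.

```python
import shlex


def build_paddleocr_command(image_dir, **params):
    """Assemble a `paddleocr` command line from parameter names and values."""
    cmd = ["paddleocr", "--image_dir", str(image_dir)]
    for name, value in params.items():
        if isinstance(value, bool):
            # Assumption: booleans are written as lowercase true/false on the CLI.
            value = "true" if value else "false"
        cmd += [f"--{name}", str(value)]
    return cmd


if __name__ == "__main__":
    cmd = build_paddleocr_command("/path/image.jpg", drop_score=0.6, show_log=False)
    print(shlex.join(cmd))
```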

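9.2 Inference engine

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_gpu | bool | True | Whether to use the GPU for prediction |
| ir_optim | bool | True | Whether to analyze and optimize the computation graph; enabling it can speed up prediction |
| use_tensorrt | bool | False | Whether to enable TensorRT |
| min_subgraph_size | int | 15 | Minimum subgraph size in TensorRT; the TRT engine is only tried for a subgraph whose size is larger than this value |
| precision | str | fp32 | Prediction precision; supports three inputs: fp32, fp16, int8 |
| enable_mkldnn | bool | True | Whether to enable MKL-DNN |
| cpu_threads | int | 10 | Number of CPU threads used for prediction when MKL-DNN is enabled |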

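9.3 Text detection model

| Parameter | Type | Default | Description |
|---|---|---|---|
| det_algorithm | str | "DB" | Name of the text detection algorithm; currently supports DB, EAST, SAST, PSE, DB++, FCE |
| det_model_dir | str | xx | Path to the detection inference model |
| det_limit_side_len | int | 960 | Side-length limit of the image used for detection |
| det_limit_type | str | "max" | Type of side-length limit; currently supports min and max. min ensures the shortest side of the image is not less than det_limit_side_len; max ensures the longest side is not greater than det_limit_side_len |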
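The interaction between `det_limit_side_len` and `det_limit_type` can be illustrated with a small calculation. This is a sketch of the rule as described in the table, not PaddleOCR's actual preprocessing code, which may apply further adjustments such as rounding the sides. `resize_ratio` is a hypothetical name.

```python
def resize_ratio(height, width, limit_side_len=960, limit_type="max"):
    """Scale factor that makes an image satisfy the side-length limit."""
    if limit_type == "max":
        # Shrink only if the longest side exceeds the limit.
        longest = max(height, width)
        return limit_side_len / longest if longest > limit_side_len else 1.0
    if limit_type == "min":
        # Enlarge only if the shortest side is below the limit.
        shortest = min(height, width)
        return limit_side_len / shortest if shortest < limit_side_len else 1.0
    raise ValueError(f"unsupported limit_type: {limit_type!r}")


if __name__ == "__main__":
    print(resize_ratio(1920, 1080))                   # longest side is above the limit
    print(resize_ratio(480, 960, limit_type="min"))   # shortest side is below the limit
```

For a very tall image, the "max" rule shrinks the whole image considerably, which is one reason cropping it into tiles as in section 5 can help.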

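Parameters for the DB algorithm:

| Parameter | Type | Default | Description |
|---|---|---|---|
| det_db_thresh | float | 0.3 | In the probability map output by DB, only pixels with a score above this threshold are treated as text pixels |
| det_db_box_thresh | float | 0.6 | A detected box is treated as a text region when the average score of all pixels inside it is above this threshold |
| det_db_unclip_ratio | float | 1.5 | Expansion ratio of the Vatti clipping algorithm, which is used to expand the text region |
| max_batch_size | int | 10 | Batch size for prediction |
| use_dilation | bool | False | Whether to dilate the segmentation result to obtain better detection results |
| det_db_score_mode | str | "fast" | How the score of a DB detection result is computed; supports fast and slow. fast averages all pixels inside the bounding rectangle of the polygon; slow averages all pixels inside the original polygon, which is somewhat slower but more accurate |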
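The two thresholds act at different stages: `det_db_thresh` on individual pixels, `det_db_box_thresh` on whole candidate boxes. The toy sketch below shows that relationship on a hand-written 4x4 probability map, scoring rectangles in the manner of the "fast" mode. The function names and data are hypothetical, and this is not PaddleOCR's postprocessing code.

```python
def binarize(prob_map, thresh=0.3):
    """Mark pixels whose score is above `thresh` as text (1), the rest as background (0)."""
    return [[1 if p > thresh else 0 for p in row] for row in prob_map]


def box_mean_score(prob_map, x0, y0, x1, y1):
    """Average score of all pixels in the rectangle [x0, x1) x [y0, y1)."""
    values = [prob_map[y][x] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)


def keep_box(prob_map, box, box_thresh=0.6):
    """Keep a candidate box only if its average score is above `box_thresh`."""
    return box_mean_score(prob_map, *box) > box_thresh


# Toy probability map with a bright 2x2 region in the middle.
prob_map = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
]

if __name__ == "__main__":
    for row in binarize(prob_map):
        print(row)
    print(keep_box(prob_map, (1, 1, 3, 3)), keep_box(prob_map, (0, 0, 4, 4)))
```

The tight box around the bright region passes the box threshold, while the box covering the whole map is diluted by background pixels and is rejected.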

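Parameters for the EAST algorithm:

| Parameter | Type | Default | Description |
|---|---|---|---|
| det_east_score_thresh | float | 0.8 | Threshold of the score map in EAST postprocessing |
| det_east_cover_thresh | float | 0.1 | Average-score threshold for text boxes in EAST postprocessing |
| det_east_nms_thresh | float | 0.2 | NMS threshold in EAST postprocessing |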

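Parameters for the SAST algorithm:

| Parameter | Type | Default | Description |
|---|---|---|---|
| det_sast_score_thresh | float | 0.5 | Score threshold in SAST postprocessing |
| det_sast_nms_thresh | float | 0.5 | NMS threshold in SAST postprocessing |
| det_box_type | str | quad | Whether to use polygon detection; set to 'poly' for curved-text scenes (such as Total-Text) |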

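Parameters for the PSE algorithm:

| Parameter | Type | Default | Description |
|---|---|---|---|
| det_pse_thresh | float | 0.0 | Threshold used to binarize the output map |
| det_pse_box_thresh | float | 0.85 | Threshold used to filter boxes; boxes below it are discarded |
| det_pse_min_area | float | 16 | Minimum area of a box; boxes below it are discarded |
| det_box_type | str | "quad" | Type of the returned box. quad: four corner coordinates; poly: all point coordinates of curved text |
| det_pse_scale | int | 1 | Ratio of the input image to the map fed into postprocessing. For example, for a 640*640 image the network output is 160*160; with scale 2, the map fed into postprocessing has shape 320*320. Increasing this value speeds up postprocessing but lowers accuracy |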

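9.4 Text recognition model

| Parameter | Type | Default | Description |
|---|---|---|---|
| rec_algorithm | str | "CRNN" | Name of the text recognition algorithm; currently supports CRNN, SRN, RARE, NRTR, SAR, ViTSTR, ABINet, VisionLAN, SPIN, RobustScanner, SVTR, SVTR_LCNet |
| rec_model_dir | str | None; required if a recognition model is used | Path to the recognition inference model |
| rec_image_shape | str | "3,48,320" | Image size used for recognition |
| rec_batch_num | int | 6 | Batch size for recognition |
| max_text_length | int | 25 | Maximum length of the recognition result; effective in SRN |
| rec_char_dict_path | str | "./ppocr/utils/ppocr_keys_v1.txt" | Character dictionary file used for recognition |
| use_space_char | bool | True | Whether to include the space character; if True, a space is appended to the end of the character dictionary |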

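9.5 End-to-end text detection and recognition model

| Parameter | Type | Default | Description |
|---|---|---|---|
| e2e_algorithm | str | "PGNet" | Name of the end-to-end algorithm; currently supports PGNet |
| e2e_model_dir | str | None; required if an end-to-end model is used | Path to the end-to-end inference model |
| e2e_limit_side_len | int | 768 | Side-length limit of the end-to-end input image |
| e2e_limit_type | str | "max" | Type of side-length limit; currently supports min and max. min ensures the shortest side of the image is not less than e2e_limit_side_len; max ensures the longest side is not greater than e2e_limit_side_len |
| e2e_pgnet_score_thresh | float | 0.5 | End-to-end score threshold; results below it are discarded |
| e2e_char_dict_path | str | "./ppocr/utils/ic15_dict.txt" | Path to the dictionary file used for recognition |
| e2e_pgnet_valid_set | str | "totaltext" | Name of the validation set; currently supports totaltext and partvgg. Different datasets use different postprocessing, so keep it consistent with training |
| e2e_pgnet_mode | str | "fast" | How the score of a PGNet detection result is computed; supports fast and slow. fast averages all pixels inside the bounding rectangle of the polygon; slow averages all pixels inside the original polygon, which is somewhat slower but more accurate |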

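9.6 Direction classifier model

| Parameter | Type | Default | Description |
|---|---|---|---|
| use_angle_cls | bool | False | Whether to use the direction classifier |
| cls_model_dir | str | None; the path must be specified explicitly if the classifier is used | Path to the direction classifier inference model |
| cls_image_shape | str | "3,48,192" | Prediction scale |
| label_list | list | ['0', '180'] | Angle values corresponding to the class IDs |
| cls_batch_num | int | 6 | Batch size for direction classifier prediction |
| cls_thresh | float | 0.9 | Prediction threshold; when the model predicts 180 degrees with a score above this threshold, the final prediction is taken as 180 degrees and the image needs to be flipped |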
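The `cls_thresh` rule in the last row can be written as a single predicate. This is a sketch of the rule as stated in the table, not the library's implementation; `needs_flip` is a hypothetical name.

```python
def needs_flip(label: str, score: float, cls_thresh: float = 0.9) -> bool:
    """Flip only when the classifier predicts '180' with a score above the threshold."""
    return label == "180" and score > cls_thresh


if __name__ == "__main__":
    for label, score in (("180", 0.95), ("180", 0.80), ("0", 0.99)):
        print(label, score, needs_flip(label, score))
```

A low-confidence '180' prediction therefore leaves the image as it is, which guards against flipping text that is already upright.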

20. References

Paddle official website
Installing the paddlepaddle and paddleocr libraries on Mac M1/M2: a guide to avoiding pitfalls
