
Running RAGFlow locally from source


Running RAGFlow on Windows purely from source

Environment preparation

Python environment

On Windows, install Python beforehand. I used conda to create a virtual environment named ragflow with Python 3.10.

Note: RAGFlow requires Python 3.10.

```bash
# Create the virtual environment
conda create -n ragflow python=3.10
# Switch to the virtual environment
conda activate ragflow
```
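Before going further, it helps to confirm that the activated environment really runs Python 3.10. The following is a minimal sketch; the helper name `is_supported` is hypothetical and only compares the major.minor version.

```python
import sys

# Version this article says RAGFlow needs (major, minor).
REQUIRED = (3, 10)


def is_supported(version_info=sys.version_info, required=REQUIRED):
    """Return True if the interpreter's major.minor equals the required version."""
    return tuple(version_info[:2]) == required


# Run inside the activated "ragflow" env:
# print(sys.version, "OK" if is_supported() else "expected Python 3.10")
```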

Frontend environment

Node.js must also be installed beforehand, and its version must be >= 18.20.4. The frontend project enforces this in its package.json:

```json
"engines": {
  "node": ">=18.20.4"
}
```
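To check the installed Node.js against that constraint, a small script can parse the output of `node --version` and compare it as a tuple. This is a sketch that only handles plain `X.Y.Z` releases (no pre-release tags); the function names are hypothetical.

```python
import re
import shutil
import subprocess

MIN_NODE = (18, 20, 4)  # from "engines.node" in the frontend's package.json


def parse_node_version(text):
    """Turn output like 'v18.20.4' into a tuple of ints (18, 20, 4)."""
    match = re.search(r"v?(\d+)\.(\d+)\.(\d+)", text)
    if not match:
        raise ValueError(f"unrecognised Node.js version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def node_meets_minimum(version_text, minimum=MIN_NODE):
    """For plain X.Y.Z releases, tuple comparison matches version ordering."""
    return parse_node_version(version_text) >= minimum


def installed_node_version():
    """Return the output of `node --version`, or None if node is not on PATH."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Usage: `v = installed_node_version(); print(v, v is not None and node_meets_minimum(v))`.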

Middleware

Install MySQL, MinIO, Elasticsearch, and Redis in advance.

Starting from source

Clone the source code

```bash
git clone https://github.com/infiniflow/ragflow.git
```

Backend

Initialization

Step 1: Install uv. Skip this step if uv is already installed.

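```bash
cd ragflow/
pipx install uv
# Use the Aliyun PyPI mirror to speed up downloads
export UV_INDEX=https://mirrors.aliyun.com/pypi/simple
# PowerShell equivalent: $env:UV_INDEX = "https://mirrors.aliyun.com/pypi/simple"
```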

Step 2: Install the Python dependencies

```bash
uv sync --python 3.10 --all-extras
```

Note: several packages may fail to install at this step; see "Possible errors" below for details.

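Prepare middleware

RAGFlow depends on several middleware services: MySQL, MinIO, Elasticsearch, and Redis. You can install them yourself beforehand, or start them with the project's Docker Compose files in the docker/ directory (docker/docker-compose-base.yml starts only these dependencies).

Notes:

  1. If you installed them yourself, update the corresponding settings in the project's configuration file conf/service_conf.yaml.
  2. If you started them with the project's Docker Compose file, edit your hosts file (on Windows: C:\Windows\System32\drivers\etc\hosts) and add the following line:
     127.0.0.1 es01 infinity mysql minio redis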
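A quick way to confirm that the hosts entry took effect is to resolve each name and check that it points at loopback. This is a minimal sketch using only the standard library; the function names are hypothetical.

```python
import socket

# Hostnames that the hosts-file line above maps to 127.0.0.1.
DOCKER_HOSTNAMES = ["es01", "infinity", "mysql", "minio", "redis"]


def resolve_all(names):
    """Map each hostname to its resolved IPv4 address, or None if it does not resolve."""
    resolved = {}
    for name in names:
        try:
            resolved[name] = socket.gethostbyname(name)
        except socket.gaierror:
            resolved[name] = None
    return resolved


def unresolved_or_remote(resolved):
    """Names missing from the hosts file or not pointing at a loopback address."""
    return [name for name, ip in resolved.items() if ip is None or not ip.startswith("127.")]


# Usage: print(unresolved_or_remote(resolve_all(DOCKER_HOSTNAMES)))  # [] means all good
```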
Launching the backend

Step 1: Edit the corresponding settings in conf/service_conf.yaml:

```yaml
ragflow:
  host: 0.0.0.0
  http_port: 9380
mysql:
  name: 'db_name'
  user: 'root'
  password: '**************'
  host: '**************'
  port: 3306
  max_connections: 100
  stale_timeout: 30
minio:
  user: '*****'
  password: '***************'
  host: 'ip:port'
es:
  hosts: 'http://ip:9200'
  username: 'elastic'
  password: '*********'
infinity:
  uri: 'localhost:23817'
  db_name: 'default_db'
redis:
  db: 0
  password: '***********'
  host: 'ip:port'
user_default_llm:
  factory: 'Tongyi-Qianwen'
  api_key: 'sk-**************************'
  base_url: ''
  default_models:
    chat_model: 'qwen-max'
    embedding_model: 'BAAI/bge-large-zh-v1.5@BAAI'
```
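Before launching, it can save time to check that every middleware endpoint in that file accepts TCP connections. The sketch below takes the configuration as an already-parsed dict shaped like the YAML above (for example via PyYAML's `yaml.safe_load`, assuming PyYAML is installed); the function names are hypothetical, and splitting `es.hosts` on commas is only a defensive assumption.

```python
import socket
from urllib.parse import urlsplit


def split_host_port(value, default_port=None):
    """Accept 'ip:port' or 'http://ip:port' and return (host, port)."""
    if "://" not in value:
        value = "tcp://" + value  # give urlsplit a scheme so it parses host and port
    parts = urlsplit(value)
    return parts.hostname, parts.port or default_port


def middleware_endpoints(conf):
    """Collect (service, host, port) tuples from a dict shaped like service_conf.yaml."""
    endpoints = []
    if "mysql" in conf:
        mysql = conf["mysql"]
        endpoints.append(("mysql", mysql["host"], int(mysql.get("port", 3306))))
    for service in ("minio", "redis"):
        if service in conf:
            host, port = split_host_port(conf[service]["host"])
            endpoints.append((service, host, port))
    if "es" in conf:
        # Split on commas defensively in case several URLs are listed
        for url in str(conf["es"]["hosts"]).split(","):
            host, port = split_host_port(url.strip(), default_port=9200)
            endpoints.append(("es", host, port))
    return endpoints


def is_reachable(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Usage:
# for name, host, port in middleware_endpoints(conf):
#     print(name, host, port, is_reachable(host, port))
```

A successful TCP connection only shows the port is open; credentials are still checked by RAGFlow itself at startup.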

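Step 2: Launch from source

```bash
# Run from the repository root; on Windows this needs a bash shell such as Git Bash or WSL
bash docker/launch_backend_service.sh
```

Alternatively:

Run the main entry points of api/ragflow_server.py and rag/svr/task_executor.py directly. Both must be running for document parsing; otherwise parsing tasks are never picked up.

Starting them through their main entry points also lets you set breakpoints and debug locally.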
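If you prefer not to use bash on Windows, both processes can be started from one Python script. This is a sketch under assumptions: the entry-point paths below and the need for the repository root on `PYTHONPATH` should be verified against your checkout, and `sys.executable` is assumed to be the uv-created `.venv` interpreter. The helper names are hypothetical.

```python
import os
import subprocess
import sys
from pathlib import Path

# Assumed entry points, relative to the repository root.
ENTRY_POINTS = ["api/ragflow_server.py", "rag/svr/task_executor.py"]


def backend_env(project_root, base_env=None):
    """Copy the environment and prepend the repository root to PYTHONPATH."""
    env = dict(os.environ if base_env is None else base_env)
    root = str(Path(project_root).resolve())
    existing = env.get("PYTHONPATH")
    env["PYTHONPATH"] = root if not existing else root + os.pathsep + existing
    return env


def start_backend(project_root, python=sys.executable):
    """Start the API server and the task executor side by side."""
    env = backend_env(project_root)
    return [
        subprocess.Popen([python, os.path.join(project_root, script)], cwd=project_root, env=env)
        for script in ENTRY_POINTS
    ]


# Usage, from the repository root with the .venv interpreter:
# procs = start_backend(".")
# for p in procs:
#     p.wait()
```

For breakpoint debugging, the IDE route described above is still simpler; this launcher only replaces the shell script.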

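Frontend

Install the frontend dependencies:

```bash
cd web
npm install
```

Downloads may fail; if so, use a proxy/VPN or retry a few times.

Start the frontend:

```bash
npm run dev
```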

Possible errors

Issue 1: the pyicu-2.15 package fails to build

The error looks like this:

```
(ragflow-1) PS E:\python\workspace\ragflow> uv sync --python 3.10 --all-extras
Resolved 380 packages in 3ms
x Failed to build `pyicu==2.15`
|-> The build backend returned an error
`-> Call to `setuptools.build_meta.build_wheel` failed (exit code: 1)

[stdout]
(running 'icu-config --version')
(running 'pkg-config --modversion icu-i18n')

[stderr]
Traceback (most recent call last):
  File "<string>", line 89, in <module>
  File "D:\Conda\envs\ragflow\lib\os.py", line 680, in __getitem__
    raise KeyError(key) from None
KeyError: 'ICU_VERSION'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 92, in <module>
  File "<string>", line 19, in check_output
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 96, in <module>
  File "<string>", line 19, in check_output
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "D:\Conda\envs\ragflow\lib\subprocess.py", line 1456, in _execute_child
    hp, ht, pid, tid = _winapi.CreateProcess(executable, args,
FileNotFoundError: [WinError 2] The system cannot find the file specified.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "D:\uv_cache\builds-v0\.tmpY9iyBJ\lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
  File "D:\uv_cache\builds-v0\.tmpY9iyBJ\lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
    self.run_setup()
  File "D:\uv_cache\builds-v0\.tmpY9iyBJ\lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
    exec(code, locals())
  File "<string>", line 99, in <module>
RuntimeError: Please install pkg-config on your system or set the ICU_VERSION environment variable to the version of ICU you have installed.

hint: This usually indicates a problem with the package or the build environment.
help: `pyicu` (v2.15) was included because `ragflow` (v0.17.2) depends on `pyicu`
```

Solution

Download pyicu-2.15-cp310-cp310-win_amd64.whl manually and install it:

```bash
# Replace ./pyicu-2.15-cp310-cp310-win_amd64.whl with the path to your local copy
uv pip install ./pyicu-2.15-cp310-cp310-win_amd64.whl
```
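A prebuilt wheel only installs if its tags match the interpreter: `cp310` for CPython 3.10 and `win_amd64` for 64-bit Windows. The sketch below (hypothetical helper names, standard library only) parses the wheel filename and compares it with the running interpreter. It is deliberately simplified: exact string matching, CPython only, and no support for compressed tag sets such as `manylinux...` with several platforms.

```python
import sys
import sysconfig


def wheel_tags(filename):
    """Split '{name}-{version}[-{build}]-{python}-{abi}-{platform}.whl' into its tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected wheel filename: {filename!r}")
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {"name": parts[0], "version": parts[1],
            "python": python_tag, "abi": abi_tag, "platform": platform_tag}


def current_tags():
    """Tags of the running CPython, e.g. ('cp310', 'win_amd64') on 64-bit Windows."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    platform_tag = sysconfig.get_platform().replace("-", "_").replace(".", "_")
    return python_tag, platform_tag


def wheel_fits(filename):
    """True if the wheel's python and platform tags equal the interpreter's."""
    tags = wheel_tags(filename)
    python_tag, platform_tag = current_tags()
    return tags["python"] == python_tag and tags["platform"] == platform_tag
```

After installation, `python -c "import icu"` should succeed (PyICU's import name is `icu`).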

The wheel file is shared in the attachment at the end of this article.

Then re-run the dependency installation command (uv sync --python 3.10 --all-extras).

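Note: if the command still fails after the manual installation, first install Microsoft Visual C++ Build Tools as described in the solution to Issue 2, then retry the manual installation.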

Issue 2: the datrie-0.8.2 package fails to build

The error looks like this:

```
Resolved 1 package in 255ms
x Failed to build `datrie==0.8.2`
|-> The build backend returned an error
`-> Call to `setuptools.build_meta:__legacy__.build_wheel` failed (exit code: 1)

[stdout]
running bdist_wheel
running build
running build_clib
building 'datrie' library

[stderr]
D:\uv_cache\builds-v0\.tmpFLhxFF\lib\site-packages\setuptools\_distutils\dist.py:289: UserWarning: Unknown distribution option: 'tests_require'
  warnings.warn(msg)
D:\uv_cache\builds-v0\.tmpFLhxFF\lib\site-packages\setuptools\dist.py:759: SetuptoolsDeprecationWarning: License classifiers are deprecated.
!!

        ********************************************************************************
        Please consider removing the following classifiers in favor of a SPDX license expression:

        License :: OSI Approved :: GNU Lesser General Public License v2 or later (LGPLv2+)

        See https://packaging.python.org/en/latest/guides/writing-pyproject-toml/#license for details.
        ********************************************************************************

!!
  self._finalize_license_expression()
error: Microsoft Visual C++ 14.0 or greater is required. Get it with "Microsoft C++ Build Tools": https://visualstudio.microsoft.com/visual-cpp-build-tools/

hint: This usually indicates a problem with the package or the build environment.
```

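Cause analysis:

Missing compiler: datrie contains a C extension, and Windows has no C/C++ build environment by default.

Microsoft Visual C++ Build Tools not installed: the error message states explicitly that Microsoft Visual C++ 14.0 or later is required (that is, the build tools from Visual Studio 2015 or later).

Solution:

Install Microsoft Visual C++ Build Tools from https://visualstudio.microsoft.com/zh-hans/visual-cpp-build-tools/ and select the "Desktop development with C++" workload in the installer.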


After installation completes, restart the computer and re-run the dependency installation.
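To confirm that a C++ toolset is actually installed before re-running `uv sync`, you can query `vswhere.exe`, which the Visual Studio Installer places under `Program Files (x86)`. This sketch assumes that default location; the component ID requested is the x86/x64 MSVC toolset, and the function name is hypothetical.

```python
import os
import subprocess

# Default location of vswhere.exe, installed with the Visual Studio Installer.
DEFAULT_VSWHERE = os.path.join(
    os.environ.get("ProgramFiles(x86)", r"C:\Program Files (x86)"),
    "Microsoft Visual Studio", "Installer", "vswhere.exe",
)


def find_msvc_install(vswhere=DEFAULT_VSWHERE):
    """Return the install path of an instance with the x86/x64 C++ toolset, or None."""
    if not os.path.isfile(vswhere):
        return None  # No Visual Studio Installer, so Build Tools are very likely missing
    result = subprocess.run(
        [vswhere, "-latest", "-products", "*",
         "-requires", "Microsoft.VisualStudio.Component.VC.Tools.x86.x64",
         "-property", "installationPath"],
        capture_output=True, text=True,
    )
    return result.stdout.strip() or None


# Usage: print(find_msvc_install() or "C++ build tools not found")
```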

Issue 3: nltk_data errors

The cause is that the required nltk_data resources cannot be found. The fix is to download them: wordnet, punkt, and punkt_tab, as shown below:

```python
import nltk

NLTK_DATA_DIR = "E:/nltk_data"  # adjust to your own directory

# Download the required nltk_data packages
nltk.download("punkt", download_dir=NLTK_DATA_DIR)
nltk.download("wordnet", download_dir=NLTK_DATA_DIR)
nltk.download("punkt_tab", download_dir=NLTK_DATA_DIR)

# Verify nltk_data. Use the same path as above: the original "E:\nltk_data"
# string would turn "\n" into a newline character.
nltk.data.path.append(NLTK_DATA_DIR)
try:
    nltk.data.find("tokenizers/punkt_tab/english")
    print("punkt_tab success")
    from nltk.tokenize import word_tokenize
    print("test: ", word_tokenize("This is a test sentence."))
except Exception as e:
    print("error: ", e)
```

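After downloading, make sure the data sits in a directory NLTK searches: one of the paths in nltk.data.path, or a directory named by the NLTK_DATA environment variable.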
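To check the layout of a copied nltk_data directory without importing NLTK, you can look for each resource either as an unpacked folder or as a `.zip` next to it. This is a minimal sketch; the resource paths mirror the three downloads above, and the function name is hypothetical.

```python
import os

# Resources downloaded above, as paths relative to an nltk_data directory.
REQUIRED_RESOURCES = ["tokenizers/punkt", "tokenizers/punkt_tab", "corpora/wordnet"]


def missing_nltk_resources(data_dir, resources=REQUIRED_RESOURCES):
    """List resources that exist neither as a folder nor as a .zip inside data_dir."""
    missing = []
    for resource in resources:
        base = os.path.join(data_dir, *resource.split("/"))
        if not (os.path.isdir(base) or os.path.isfile(base + ".zip")):
            missing.append(resource)
    return missing


# Usage: print(missing_nltk_resources("E:/nltk_data"))
```

NLTK reads `NLTK_DATA` when it is imported, so setting it as a system environment variable (rather than inside an already-running process) makes it visible to both the API server and the task executor.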

Issue 4: downloading the embedding model from HuggingFace times out

The error looks like this:

```
Traceback (most recent call last):
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\requests\adapters.py", line 589, in send
    resp = conn.urlopen(
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\urllib3\connectionpool.py", line 841, in urlopen
    retries = retries.increment(
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\urllib3\util\retry.py", line 519, in increment
    raise MaxRetryError(_pool, url, reason) from reason  # type: ignore[arg-type]
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/BAAI/bge-large-zh-v1.5/revision/main (Caused by ConnectTimeoutError(<urllib3.connection.HTTPSConnection object at 0x000001AAB02011B0>, 'Connection to huggingface.co timed out. (connect timeout=None)'))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\_snapshot_download.py", line 155, in snapshot_download
    repo_info = api.repo_info(repo_id=repo_id, repo_type=repo_type, revision=revision, token=token)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\hf_api.py", line 2682, in repo_info
    return method(
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\hf_api.py", line 2466, in model_info
    r = get_session().get(path, headers=headers, timeout=timeout, params=params)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\requests\sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\huggingface_hub\utils\_http.py", line 93, in send
    return super().send(request, *args, **kwargs)
  File "E:\python\workspace\ragflow\.venv\lib\site-packages\requests\adapters.py", line 610, in send
    raise ConnectTimeout(e, request=request)
requests.exceptions.ConnectTimeout: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/BAAI/bge-large-zh-v1.5/revision/main (Caused by ConnectTimeoutError(, 'Connection to huggingface.co timed out. (connect timeout=None)'))"), '(Request ID: 174e8577-922b-4134-9929-35877f8b9522)')
```

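Solution 1:

Download the embedding model manually and place it in the .ragflow folder under your user home directory on drive C (C:\Users\<your user>\.ragflow).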
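The exact folder name matters: the `DefaultEmbedding` code quoted below joins the cache directory with the model name after stripping its leading `org/` prefix. The sketch mirrors that logic so you can compute where a manually downloaded model must go. It assumes the `@BAAI` factory suffix from service_conf.yaml is not part of the name passed in; the helper names are hypothetical.

```python
import os
import re


def ragflow_cache_dir():
    """Default cache location: the .ragflow folder in the user's home directory."""
    return os.path.join(os.path.expanduser("~"), ".ragflow")


def expected_model_dir(model_name, cache_dir=None):
    """Mirror DefaultEmbedding's path logic: drop a leading 'org/' from the model name."""
    cache_dir = ragflow_cache_dir() if cache_dir is None else cache_dir
    return os.path.join(cache_dir, re.sub(r"^[a-zA-Z0-9]+/", "", model_name))


# On Windows, expected_model_dir("BAAI/bge-large-zh-v1.5") gives
# C:\Users\<your user>\.ragflow\bge-large-zh-v1.5
```

On a machine that can reach HuggingFace (or a mirror), `huggingface_hub.snapshot_download(repo_id="BAAI/bge-large-zh-v1.5", local_dir=...)` is the same call RAGFlow's fallback uses, and its output folder can then be copied to the path above.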

If you want to store embedding models somewhere else, change the path logic in get_home_cache_dir() (in api/utils/file_utils.py). It is used here:

```python
# rag.llm.embedding_model.DefaultEmbedding
def __init__(self, key, model_name, **kwargs):
    """
    If you have trouble downloading HuggingFace models, -_^ this might help!!

    For Linux:
    export HF_ENDPOINT=https://hf-mirror.com

    For Windows:
    Good luck
    ^_-

    """
    if not settings.LIGHTEN:
        with DefaultEmbedding._model_lock:
            from FlagEmbedding import FlagModel
            import torch
            if not DefaultEmbedding._model or model_name != DefaultEmbedding._model_name:
                try:
                    # First try the local copy under get_home_cache_dir()
                    # (the Chinese instruction means "Generate a representation
                    # of this sentence for retrieving related articles:")
                    DefaultEmbedding._model = FlagModel(os.path.join(get_home_cache_dir(), re.sub(r"^[a-zA-Z0-9]+/", "", model_name)),
                                                        query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                                                        use_fp16=torch.cuda.is_available())
                    DefaultEmbedding._model_name = model_name
                except Exception:
                    # Fall back to downloading from HuggingFace into the same directory
                    model_dir = snapshot_download(repo_id="BAAI/bge-large-zh-v1.5",
                                                  local_dir=os.path.join(get_home_cache_dir(), re.sub(r"^[a-zA-Z0-9]+/", "", model_name)),
                                                  local_dir_use_symlinks=False)
                    DefaultEmbedding._model = FlagModel(model_dir, query_instruction_for_retrieval="为这个句子生成表示以用于检索相关文章:",
                                                        use_fp16=torch.cuda.is_available())
    self._model = DefaultEmbedding._model
    self._model_name = DefaultEmbedding._model_name
```

Modify get_home_cache_dir() as follows:

```python
import os


# Original logic: use the .ragflow folder under the user's home directory
def get_home_cache_dir():
    dir = os.path.join(os.path.expanduser('~'), ".ragflow")
    try:
        os.mkdir(dir)
    except OSError:
        pass
    return dir


# Modified logic: use the models folder under the project root
def get_home_cache_dir():
    # Absolute path of the current file
    current_file_path = os.path.abspath(__file__)
    # Walk up to the project root according to the project layout (example: 3 levels)
    project_root = os.path.dirname(os.path.dirname(os.path.dirname(current_file_path)))
    # Build the models directory path
    model_dir = os.path.join(project_root, "models")
    # Create the directory if it does not exist
    os.makedirs(model_dir, exist_ok=True)
    return model_dir
```
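Hard-coding three `dirname()` calls breaks if the file moves. A hypothetical alternative is to walk upwards until a file that lives in the repository root is found; `pyproject.toml` is assumed as that marker here because `uv sync` works from it. This is a sketch, not RAGFlow's own code.

```python
import os

ROOT_MARKER = "pyproject.toml"  # assumed to sit in the repository root


def find_project_root(start, marker=ROOT_MARKER):
    """Return the first directory at or above `start` that contains `marker`."""
    current = os.path.abspath(start)
    if os.path.isfile(current):
        current = os.path.dirname(current)
    while True:
        if os.path.exists(os.path.join(current, marker)):
            return current
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root without finding the marker
            raise FileNotFoundError(f"{marker} not found above {start}")
        current = parent


def get_home_cache_dir():
    """Variant of the modified function that locates the root by marker file."""
    model_dir = os.path.join(find_project_root(__file__), "models")
    os.makedirs(model_dir, exist_ok=True)
    return model_dir
```

The trade-off is one extra filesystem lookup per parent directory, in exchange for not depending on how deep the file sits in the tree.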

Solution 2:

With a proxy/VPN in place, the model is downloaded automatically when the project starts.

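Solution 3:

Set the HF_ENDPOINT environment variable to a mirror site, in the same shell that launches the backend:

```bash
export HF_ENDPOINT=https://hf-mirror.com
# PowerShell equivalent:
# $env:HF_ENDPOINT = "https://hf-mirror.com"
```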
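When starting the backend from an IDE (the main-method route), the variable can also be set in Python. huggingface_hub reads `HF_ENDPOINT` when it is first imported, so this must run before anything imports it. A minimal sketch; the function name is hypothetical, and an endpoint you have already set is left untouched.

```python
import os
import sys
import warnings

MIRROR = "https://hf-mirror.com"


def use_hf_mirror(endpoint=MIRROR):
    """Set HF_ENDPOINT for this process unless one is already configured."""
    if "huggingface_hub" in sys.modules:
        warnings.warn("huggingface_hub is already imported; HF_ENDPOINT may be ignored")
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    return os.environ["HF_ENDPOINT"]


# Call use_hf_mirror() at the very top of your launch script, before other imports.
```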

Pick any one of the three solutions.

Attachment:

Shared via Baidu Netdisk: pyicu-2.15-cp310-cp310-win_amd64.whl
Link: https://pan.baidu.com/s/17SYBqTwSF7VOH0FARbtmnA  Extraction code: jcrq