
Get Started with Firecrawl in 5 Minutes


Table of contents

    • What is Firecrawl?
    • Local deployment
    • Verification
    • MCP installation
    • Playground


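What is Firecrawl?

In one sentence:
An open-source "web crawler + content-cleaning engine"
• Automatically converts any web page into structured Markdown / JSON
• Supports recursive whole-site crawling, JS rendering, PDF parsing, and automatic image alt-text generation
• Provides a REST API, with official LangChain / LlamaIndex integrations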

Official website


You can try it out in the hosted playground.


Click Get Code to get template code for the call:

# Install with: pip install firecrawl-py
import asyncio
from firecrawl import AsyncFirecrawlApp

async def main():
    # Replace the placeholder with your own API key
    app = AsyncFirecrawlApp(api_key='fc-YOUR_API_KEY')
    response = await app.scrape_url(
        url='https://blog.csdn.net/u012399690/article/details/149668148',
        formats=['markdown'],
        only_main_content=True,
        parse_pdf=True,
        max_age=14400000,
    )
    print(response)

asyncio.run(main())
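Rather than pasting a key into the script, the key can be read from the environment. The following is a minimal sketch, assuming the key is stored in `FIRECRAWL_API_KEY` (the variable name used by the MCP configuration later in this post); `load_api_key` is a hypothetical helper, not part of the Firecrawl SDK.

```python
import os


def load_api_key(env_var: str = "FIRECRAWL_API_KEY") -> str:
    """Read the API key from the environment (hypothetical helper)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a clear message instead of sending an empty key
        raise RuntimeError(f"set {env_var} before running the example")
    return key
```

The result can then be passed as `api_key=load_api_key()` when constructing the client.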

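Local deployment

The official hosted service includes a free quota of 500 credits. If you use it frequently or have strict privacy requirements, you can deploy it locally instead.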

Step 1: Clone the repository

git clone https://github.com/mendableai/firecrawl.git

Step 2: Edit the configuration

cp apps/api/.env.example .env

Adjust the file as needed. To keep things simple, you can turn off authentication.

Minimal configuration

NUM_WORKERS_PER_QUEUE=4
PORT=3002
HOST=0.0.0.0
REDIS_URL=redis://redis:6379
REDIS_RATE_LIMIT_URL=redis://redis:6379
PLAYWRIGHT_MICROSERVICE_URL=http://playwright-service:3000/html
USE_DB_AUTHENTICATION=false
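Before starting the containers, it can help to sanity-check the file. The sketch below is a hypothetical helper, not something Firecrawl ships: it parses simple KEY=VALUE lines and reports which keys from the minimal configuration above are missing or empty. The names `parse_env` and `missing_keys` are illustrative.

```python
# Hypothetical helper, not part of Firecrawl: checks .env text for the
# keys listed in the minimal configuration above.
REQUIRED_KEYS = [
    "NUM_WORKERS_PER_QUEUE",
    "PORT",
    "HOST",
    "REDIS_URL",
    "REDIS_RATE_LIMIT_URL",
    "PLAYWRIGHT_MICROSERVICE_URL",
    "USE_DB_AUTHENTICATION",
]


def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(text: str) -> list:
    """Return required keys that are absent or empty, in listed order."""
    values = parse_env(text)
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```

For example, `missing_keys(open(".env").read())` returns an empty list when every key from the minimal configuration is present.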

🐳 Start

docker compose build   # first run: build the images
docker compose up -d   # run in the background

Access:

  • API: http://localhost:3002
  • Queue admin: http://localhost:3002/admin/@/queues


Verification

The following cURL command gives a quick check from the terminal:

curl -X POST http://localhost:3002/v0/scrape \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "https://www.ithome.com/0/871/372.htm",
    "formats": ["markdown"],
    "onlyMainContent": true,
    "parsePDF": true,
    "maxAge": 14400000
  }'
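The same request can be sent from Python using only the standard library. This is a sketch rather than an official client: it assumes the service listens on `http://localhost:3002` as configured above, and the names `build_scrape_request` and `scrape` are hypothetical. The payload mirrors the cURL command field for field.

```python
import json
import urllib.request

BASE_URL = "http://localhost:3002"  # assumption: default port from the config above


def build_scrape_request(url: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the cURL command, without sending it."""
    payload = {
        "url": url,
        "formats": ["markdown"],
        "onlyMainContent": True,
        "parsePDF": True,
        "maxAge": 14400000,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v0/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def scrape(url: str, base_url: str = BASE_URL, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON response."""
    request = build_scrape_request(url, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it possible to inspect the URL and body without a running service.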

Sample response:

{
  "success": true,
  "data": {
    "content": "xxx",
    "markdown": "xxx",
    "linksOnPage": [
      "https://www.ithome.com/0/871/372.htm#",
      "https://m.ithome.com/"
    ],
    "metadata": {
      "ogImage": "https://img.ithome.com/m/images/logo.png",
      "language": "zh",
      "viewport": "width=device-width, initial-scale=1.0, minimum-scale=1.0, maximum-scale=1.0, user-scalable=no",
      "description": "智谱发布新一代旗舰模型GLM-4.5,专为智能体应用打造,综合能力达到开源SOTA,实测国内最佳。采用混合专家架构,提供两种模式,高速低成本。API已上线开放平台BigModel.cn,也可在智谱清言和z.ai免费体验。#AI大模型# #智谱GLM4.5#",
      "og:image": "https://img.ithome.com/m/images/logo.png",
      "format-detection": "telephone=no",
      "keywords": "智谱,GLM4.5,智能时代,人工智能",
      "apple-itunes-app": "app-id=570610859, app-argument=ithome://news?id=871372&type=news",
      "title": "智谱发布新一代旗舰开源模型 GLM-4.5,专为智能体应用打造 - IT之家",
      "apple-mobile-web-app-status-bar-style": "white",
      "apple-mobile-web-app-capable": "yes",
      "theme-color": "#fff",
      "favicon": "https://m.ithome.com/favicon.ico",
      "scrapeId": "07988df7-f880-4d8e-85ee-c434a2a931c3",
      "sourceURL": "https://www.ithome.com/0/871/372.htm",
      "url": "https://www.ithome.com/0/871/372.htm",
      "contentType": "text/html; charset=utf-8",
      "proxyUsed": "basic",
      "pageStatusCode": 200
    }
  },
  "returnCode": 200
}
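In practice only a few of these fields are usually needed. The sketch below pulls them out of a decoded response; it assumes the response has the shape of the sample above, and `summarize_scrape` is a hypothetical name rather than an SDK function.

```python
def summarize_scrape(response: dict) -> dict:
    """Pull a few commonly used fields out of a decoded scrape response."""
    if not response.get("success"):
        raise ValueError(f"scrape failed: {response}")
    data = response.get("data", {})
    meta = data.get("metadata", {})
    return {
        "title": meta.get("title"),
        "source_url": meta.get("sourceURL"),
        "status": meta.get("pageStatusCode"),
        "markdown": data.get("markdown", ""),
        "link_count": len(data.get("linksOnPage", [])),
    }
```

Missing keys fall back to `None`, an empty string, or zero, so a partial response does not raise; only an unsuccessful response does.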


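MCP installation

With an MCP client, Firecrawl can work alongside an AI assistant. Taking Cherry Studio as an example:
copy the configuration below, or set it up in an MCP marketplace such as ModelScope and sync it with one click. The main thing to change is the API key.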

{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_KEY": "YOUR_API_KEY_HERE"
      }
    }
  }
}

To point it at a self-hosted service instead:

{
  "mcpServers": {
    "mcp-server-firecrawl": {
      "command": "npx",
      "args": ["-y", "firecrawl-mcp"],
      "env": {
        "FIRECRAWL_API_URL": "http://localhost:3002",
        "FIRECRAWL_API_KEY": "optional-if-you-enable-auth"
      }
    }
  }
}
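The two variants differ only in the `env` block, so they can be generated from one function. This is a small sketch, assuming you want to produce the JSON programmatically; `firecrawl_mcp_config` is a hypothetical helper, and the server name, command, and environment variable names are taken from the configurations above.

```python
import json


def firecrawl_mcp_config(api_key: str = "", api_url: str = "") -> dict:
    """Build the MCP client config for the hosted or a self-hosted service."""
    env = {}
    if api_url:
        # Only set for self-hosted deployments
        env["FIRECRAWL_API_URL"] = api_url
    if api_key:
        env["FIRECRAWL_API_KEY"] = api_key
    return {
        "mcpServers": {
            "mcp-server-firecrawl": {
                "command": "npx",
                "args": ["-y", "firecrawl-mcp"],
                "env": env,
            }
        }
    }


def to_json(config: dict) -> str:
    """Serialize the config in a form that can be pasted into the client."""
    return json.dumps(config, indent=2)
```

For a local deployment without authentication, `to_json(firecrawl_mcp_config(api_url="http://localhost:3002"))` yields a config with only `FIRECRAWL_API_URL` set.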


Once configured, the Firecrawl tools can be called from Cherry Studio.


Playground

The open-source version does not ship with a playground; it can only be used through the API or MCP. Here is a simple HTML page that fills the gap.

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8" />
  <title>Firecrawl Self-Hosted UI</title>
  <meta name="viewport" content="width=device-width,initial-scale=1" />
  <link href="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/css/bootstrap.min.css" rel="stylesheet" />
  <link href="https://cdn.jsdelivr.net/npm/bootstrap-icons@1.11.3/font/bootstrap-icons.css" rel="stylesheet" />
  <style>
    body { padding-top: 70px; background: #f8f9fa; }
    .card { box-shadow: 0 0.125rem 0.25rem rgba(0, 0, 0, 0.075); }
    .result-area {
      max-height: 400px;
      overflow-y: auto;
      font-family: SFMono-Regular, Menlo, Monaco, Consolas, "Liberation Mono", "Courier New", monospace;
      font-size: 0.8rem;
    }
    .config-panel { transition: all 0.3s ease; }
    .collapse:not(.show) { display: none; }
  </style>
</head>
<body>
  <nav class="navbar navbar-expand navbar-dark bg-primary fixed-top">
    <div class="container-fluid">
      <a class="navbar-brand fw-bold" href="#"><i class="bi bi-fire"></i> Firecrawl UI</a>
      <button class="btn btn-outline-light btn-sm" data-bs-toggle="modal" data-bs-target="#configModal">
        <i class="bi bi-gear"></i> Settings
      </button>
    </div>
  </nav>

  <div class="container">
    <div class="card mb-3">
      <div class="card-header">
        <ul class="nav nav-tabs card-header-tabs" id="mainTabs" role="tablist">
          <li class="nav-item" role="presentation">
            <button class="nav-link active" id="scrape-tab" data-bs-toggle="tab" data-bs-target="#scrape-pane"
              type="button" role="tab" aria-controls="scrape-pane" aria-selected="true">
              📥 Single-page scrape
            </button>
          </li>
          <li class="nav-item" role="presentation">
            <button class="nav-link" id="crawl-tab" data-bs-toggle="tab" data-bs-target="#crawl-pane" type="button"
              role="tab" aria-controls="crawl-pane" aria-selected="false">
              🕸️ Whole-site crawl
            </button>
          </li>
        </ul>
      </div>
      <div class="card-body">
        <div class="tab-content" id="mainTabContent">
          <div class="tab-pane fade show active" id="scrape-pane" role="tabpanel" aria-labelledby="scrape-tab">
            <div class="mb-3">
              <label for="scrapeUrl" class="form-label">Page URL</label>
              <input type="url" class="form-control" id="scrapeUrl" placeholder="https://docs.firecrawl.dev" />
              <div class="form-text">Enter the URL of the single page to scrape</div>
            </div>
            <button class="btn btn-primary" id="scrapeBtn" onclick="handleScrape()">
              <i class="bi bi-download"></i> Scrape now
            </button>
          </div>
          <div class="tab-pane fade" id="crawl-pane" role="tabpanel" aria-labelledby="crawl-tab">
            <div class="mb-3">
              <label for="crawlUrl" class="form-label">Site URL</label>
              <input type="url" class="form-control" id="crawlUrl" placeholder="https://docs.firecrawl.dev" />
              <div class="form-text">Enter the root URL of the site to crawl</div>
            </div>
            <div class="mb-3">
              <label for="maxPages" class="form-label">Max pages</label>
              <input type="number" class="form-control" id="maxPages" placeholder="10" min="1" max="100" value="10" />
              <div class="form-text">Limit on the number of pages to crawl (1-100)</div>
            </div>
            <button class="btn btn-warning" id="crawlBtn" onclick="handleCrawl()">
              <i class="bi bi-globe"></i> Start crawl
            </button>
          </div>
        </div>
      </div>
    </div>

    <div class="card mb-3">
      <div class="card-header d-flex justify-content-between align-items-center">
        <span>📝 Result preview</span>
        <button class="btn btn-sm btn-outline-secondary d-none" id="copyBtn" onclick="copyResult()">
          <i class="bi bi-clipboard"></i> Copy
        </button>
      </div>
      <div class="card-body">
        <pre class="result-area border p-2 bg-light" id="result">Waiting for results...</pre>
      </div>
    </div>
  </div>

  <div class="modal fade" id="configModal" tabindex="-1" aria-labelledby="configModalLabel" aria-hidden="true">
    <div class="modal-dialog">
      <div class="modal-content">
        <div class="modal-header">
          <h5 class="modal-title" id="configModalLabel"><i class="bi bi-gear"></i> Service settings</h5>
          <button type="button" class="btn-close" data-bs-dismiss="modal" aria-label="Close"></button>
        </div>
        <div class="modal-body">
          <div class="mb-3">
            <label for="baseUrl" class="form-label">Base URL</label>
            <input type="url" class="form-control" id="baseUrl" placeholder="http://localhost:3002"
              value="http://localhost:3002" />
            <div class="form-text">Base address of the Firecrawl service</div>
          </div>
          <div class="mb-3">
            <label for="apiKey" class="form-label">API Key</label>
            <input type="password" class="form-control" id="apiKey" placeholder="Optional; leave empty if auth is off" />
            <div class="form-text">Enter an API key if the service requires authentication</div>
          </div>
        </div>
        <div class="modal-footer">
          <button type="button" class="btn btn-secondary" data-bs-dismiss="modal">Cancel</button>
          <button type="button" class="btn btn-primary" onclick="saveConfig()" data-bs-dismiss="modal">Save</button>
        </div>
      </div>
    </div>
  </div>

  <script src="https://cdn.jsdelivr.net/npm/bootstrap@5.3.3/dist/js/bootstrap.bundle.min.js"></script>
  <script>
    const $ = (id) => document.getElementById(id);
    const base = () => $("baseUrl").value.replace(/\/$/, "");
    const key = () => $("apiKey").value;

    // Load the saved configuration
    document.addEventListener('DOMContentLoaded', function () { loadConfig(); });

    function loadConfig() {
      const savedBaseUrl = localStorage.getItem('firecrawl_baseUrl');
      const savedApiKey = localStorage.getItem('firecrawl_apiKey');
      if (savedBaseUrl) $("baseUrl").value = savedBaseUrl;
      if (savedApiKey) $("apiKey").value = savedApiKey;
    }

    // Show a short success toast and remove it after 3 seconds
    function showToast(message) {
      const toast = document.createElement('div');
      toast.className = 'toast align-items-center text-white bg-success border-0 position-fixed top-0 end-0 m-3';
      toast.style.zIndex = '9999';
      toast.innerHTML = `<div class="d-flex"><div class="toast-body">${message}</div></div>`;
      document.body.appendChild(toast);
      new bootstrap.Toast(toast).show();
      setTimeout(() => { if (toast.parentNode) toast.parentNode.removeChild(toast); }, 3000);
    }

    function saveConfig() {
      localStorage.setItem('firecrawl_baseUrl', $("baseUrl").value);
      localStorage.setItem('firecrawl_apiKey', $("apiKey").value);
      showToast('Configuration saved');
    }

    // Headers shared by all requests; the key is optional
    function authHeaders() {
      const headers = { "Content-Type": "application/json" };
      if (key()) headers["Authorization"] = `Bearer ${key()}`;
      return headers;
    }

    async function request(path, body) {
      return fetch(`${base()}${path}`, {
        method: "POST",
        headers: authHeaders(),
        body: JSON.stringify(body),
      }).then((r) => r.json());
    }

    async function handleScrape() {
      const url = $("scrapeUrl").value;
      if (!url) return alert("Please enter a URL");
      const scrapeBtn = $("scrapeBtn");
      scrapeBtn.disabled = true; // disable the button while the request runs
      $("result").textContent = "Scraping...";
      $("copyBtn").classList.add("d-none");
      try {
        const res = await request("/v0/scrape", { url, pageOptions: { onlyMainContent: true } });
        $("result").textContent = res.data?.markdown || JSON.stringify(res, null, 2);
        $("copyBtn").classList.remove("d-none");
        window.lastResult = res;
      } catch (error) {
        $("result").textContent = `Scrape failed: ${error.message}`;
      } finally {
        scrapeBtn.disabled = false; // restore the button
      }
    }

    async function handleCrawl() {
      const url = $("crawlUrl").value;
      const limit = parseInt($("maxPages").value) || 10;
      if (!url) return alert("Please enter a URL");
      const crawlBtn = $("crawlBtn");
      crawlBtn.disabled = true;
      $("result").textContent = "Crawling the site, please wait...";
      $("copyBtn").classList.add("d-none");
      try {
        const job = await request("/v0/crawl", { url, limit });
        if (!job.jobId) {
          $("result").textContent = JSON.stringify(job, null, 2);
          crawlBtn.disabled = false;
          return;
        }
        // Poll the job status every 2 seconds until it completes or fails
        const poll = setInterval(async () => {
          const response = await fetch(`${base()}/v0/crawl/status/${job.jobId}`, {
            method: "GET",
            headers: authHeaders(),
          });
          const status = await response.json();
          $("result").textContent = JSON.stringify(status, null, 2);
          if (status.status === "completed") {
            clearInterval(poll);
            window.lastResult = status;
            $("copyBtn").classList.remove("d-none");
            crawlBtn.disabled = false;
          }
          if (status.status === "failed") {
            clearInterval(poll);
            crawlBtn.disabled = false;
          }
        }, 2000);
      } catch (error) {
        $("result").textContent = `Crawl failed: ${error.message}`;
        crawlBtn.disabled = false;
      }
    }

    async function copyResult() {
      try {
        await navigator.clipboard.writeText(JSON.stringify(window.lastResult, null, 2));
        showToast('Result copied to clipboard');
      } catch (error) {
        // Fallback for when the Clipboard API is unavailable
        const textArea = document.createElement('textarea');
        textArea.value = JSON.stringify(window.lastResult, null, 2);
        document.body.appendChild(textArea);
        textArea.select();
        document.execCommand('copy');
        document.body.removeChild(textArea);
        alert('Result copied to clipboard');
      }
    }
  </script>
</body>
</html>
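The crawl tab in the page above starts a job and then polls its status. The same flow can be sketched in Python. This is an illustration under assumptions, not an official client: the paths `/v0/crawl` and `/v0/crawl/status/{jobId}` and the `completed` / `failed` states are taken from the page's script, while `crawl_and_wait`, its `post` / `get` callables, and the `max_polls` limit are hypothetical names introduced here so the HTTP layer can be swapped out.

```python
import time
from typing import Callable


def crawl_and_wait(
    url: str,
    limit: int,
    post: Callable[[str, dict], dict],
    get: Callable[[str], dict],
    interval: float = 2.0,
    max_polls: int = 150,
) -> dict:
    """Start a crawl, then poll its status until it completes or fails."""
    job = post("/v0/crawl", {"url": url, "limit": limit})
    job_id = job.get("jobId")
    if not job_id:
        return job  # no job was created; hand back the raw response
    for _ in range(max_polls):
        status = get(f"/v0/crawl/status/{job_id}")
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError(f"crawl {job_id} did not finish after {max_polls} polls")
```

Unlike the browser version, this sketch gives up after a bounded number of polls instead of polling indefinitely; `post` and `get` are expected to send JSON to the base URL and return the decoded response.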