
Rendering PDF Files on the Front End with pdf.js

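Contents

I. Preface

II. Overview

1. About pdf.js

2. Package versions

III. Previewing with viewer.html (recommended)

1. Introduction

2. Deployment

【1】Download the package

【2】Client-side approach

【3】Server-side approach (to be verified)

3. Usage

【1】Previewing a PDF file

【2】Triggering pdf.js search from an external search term

IV. Rendering the PDF to canvas

1. Installation

2. Implementation

【1】PDF preview

【2】Making PDF text selectable for copying

【3】Search with highlighted matches and jumping to a match

【4】Getting the outline (table of contents) data structure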


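I. Preface

Rendering PDF files in the browser has long been an important and challenging task in front-end development. The pdf.js library makes it much easier. pdf.js is an open-source JavaScript library that renders PDF files directly in the browser, so PDF documents can be viewed inside a web page. Its API covers loading documents, rendering pages, extracting text, and reading outlines, which makes displaying PDFs on a front-end page straightforward. Let's explore what pdf.js can do and use it to render and display PDF files on the front end!

II. Overview

1. About pdf.js

pdf.js is an open-source, JavaScript-based PDF reader component that displays and operates on PDF files directly in a web page. As far as the author knows, the existing front-end PDF rendering components are all wrappers built on top of pdf.js.

GitHub repository: https://github.com/mozilla/pdf.js

Note: pdf.js is open source (Apache License 2.0) and free to use.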

2. Package versions

| Package | Version |
| --- | --- |
| Node | v22.13.0 |
| @types/react | ^18.0.33 |
| @types/react-dom | ^18.0.11 |
| pdfjs-2.5.207-es5-dist.zip (viewer.html approach) | 2.5.207 |
| pdfjs-dist (canvas rendering approach) | 3.6.172 |

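III. Previewing with viewer.html (recommended)

1. Introduction

Besides previewing the PDF, this approach comes with a built-in toolbar that supports search, zoom, an outline (table of contents), printing, and more.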

[Figure: demo of the viewer.html preview with its toolbar]

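2. Deployment

【1】Download the package

Download: https://github.com/mozilla/pdf.js/releases/tag/v2.5.207

【2】Client-side approach

Unzip the downloaded pdfjs-2.5.207-es5-dist.zip and put the extracted folder in the project's public folder.

【3】Server-side approach (to be verified)

Keep the pdf.js package in the public directory (or in a directory served by the backend):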

const pdfServerUrl = '/pdfjs-2.5.207-es5-dist/web/viewer.html';

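You also need a URL that returns the PDF file's binary stream. It must be same-origin with viewer.html, because the generic viewer refuses to open a file URL from a different origin.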

...
const pdfInfoUrl = `${location.origin}/xxx/xx.pdf`;
const url = `${pdfServerUrl}?file=${encodeURIComponent(pdfInfoUrl)}`;
...
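The query-string construction above can be wrapped in a small helper. The sketch below is hypothetical: buildViewerUrl and ViewerOptions are illustrative names, not part of pdf.js. Besides the file parameter, it can append two of the generic viewer's hash parameters. The first is page, a 1-based page number. The second is search, which highlights matches once the document has loaded. Both are listed among the pdf.js viewer options, but check them against the viewer version you deploy.

```typescript
// Hypothetical helper: builds a URL for the pdf.js generic viewer (viewer.html).
// Assumes the viewer and the PDF are served from the same origin.
interface ViewerOptions {
  page?: number;   // "#page=N" hash parameter (1-based)
  search?: string; // "#search=..." hash parameter
}

function buildViewerUrl(viewerPath: string, fileUrl: string, options: ViewerOptions = {}): string {
  // The file URL sits inside viewer.html's query string, so percent-encode it;
  // otherwise "?", "&" or "#" inside it would be parsed as part of viewer.html's own URL.
  let url = `${viewerPath}?file=${encodeURIComponent(fileUrl)}`;

  const hashParams: string[] = [];
  if (options.page !== undefined) {
    hashParams.push(`page=${options.page}`);
  }
  if (options.search) {
    hashParams.push(`search=${encodeURIComponent(options.search)}`);
  }
  if (hashParams.length > 0) {
    url += `#${hashParams.join('&')}`;
  }
  return url;
}
```

For example, buildViewerUrl(pdfServerUrl, pdfInfoUrl, { page: 3 }) opens the same file on page 3. Encoding the file URL matters as soon as it carries its own query string, such as a download endpoint with an id parameter.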

3. Usage

【1】Previewing a PDF file

1) Client-side approach (using React as an example)

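import React from 'react';

// React components must start with an uppercase letter to be usable in JSX
const ViewPDF: React.FC = () => {
  // PDF file path; the file is placed in the project's public directory
  const pdfUrl = '/A.pdf';
  // The pdf.js viewer, also placed in the project's public directory
  const pdfServerUrl = '/pdfjs-2.5.207-es5-dist/web/viewer.html';
  const url = `${pdfServerUrl}?file=${encodeURIComponent(pdfUrl)}`;
  return (
    <div>
      <h3>pdf search (based on pdf-dist, pdf_viewer.html)</h3>
      {/* viewer.html renders the PDF and its toolbar inside the iframe */}
      <iframe id="pdfIframe" title="pdf-viewer" src={url} width="100%" height="800" />
    </div>
  );
};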

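2) Server-side approach

Fetch the file as an ArrayBuffer with axios, wrap the ArrayBuffer in a binary Blob, then turn the Blob into a blob: URL and pass it to viewer.html. Because the blob: URL is created by the same origin that serves viewer.html, it also satisfies the same-origin requirement.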

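...
import { useState } from 'react';
import axios from 'axios';
...
const [pdfUrl, setPdfUrl] = useState('');

const getPDFViewUrl = (fileName: string) => {
  axios({
    method: 'get',
    url: `URL`, // download URL of the file
    responseType: 'arraybuffer',
  }).then((response) => {
    // Wrap the binary data in a PDF Blob and expose it as a blob: URL
    const blob = new Blob([response.data], { type: 'application/pdf' });
    const blobUrl = URL.createObjectURL(blob);
    setPdfUrl(`${pdfServerUrl}?file=${encodeURIComponent(blobUrl)}`);
  });
};

return (
  // Render the iframe only once the viewer URL is ready (an empty src would load the current page)
  pdfUrl ? <iframe id="pdfIframe" title="pdf-viewer" src={pdfUrl} width="100%" height="800" /> : null
);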
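Each call to URL.createObjectURL produces a new blob: URL, and the browser keeps the underlying Blob in memory until that URL is revoked with URL.revokeObjectURL. If users open several files in a row, the old URLs should be released. The sketch below assumes one viewer URL at a time. PdfBlobUrl is a hypothetical class, not part of pdf.js or axios. The create and revoke functions are injectable, so the class can be exercised without a browser.

```typescript
type CreateUrl = (blob: Blob) => string;
type RevokeUrl = (url: string) => void;

// Hypothetical helper: owns at most one blob: URL at a time.
class PdfBlobUrl {
  private current: string | null = null;
  private readonly create: CreateUrl;
  private readonly revoke: RevokeUrl;

  // Defaults use the browser's URL.createObjectURL / URL.revokeObjectURL;
  // other functions (e.g. fakes in tests) can be passed in instead.
  constructor(
    create: CreateUrl = (blob) => URL.createObjectURL(blob),
    revoke: RevokeUrl = (url) => URL.revokeObjectURL(url),
  ) {
    this.create = create;
    this.revoke = revoke;
  }

  // Wrap an ArrayBuffer (e.g. axios response.data with responseType 'arraybuffer')
  // in a PDF Blob, release the previous URL, and return a new blob: URL.
  replace(data: ArrayBuffer): string {
    this.dispose();
    const blob = new Blob([data], { type: 'application/pdf' });
    this.current = this.create(blob);
    return this.current;
  }

  // Release the current URL, e.g. when the component unmounts. Safe to call twice.
  dispose(): void {
    if (this.current !== null) {
      this.revoke(this.current);
      this.current = null;
    }
  }
}
```

In a React component, one instance could be kept in a useRef. Call replace(response.data) inside the .then callback, and call dispose() from a useEffect cleanup so the last URL is released on unmount.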

【2】Triggering pdf.js search from an external search term

  • Jump to the first match
  • Highlight the matches
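import React, { useEffect } from 'react';

const ViewPDF: React.FC = () => {
  // PDF file path; the file is placed in the project's public directory
  const pdfUrl = '/A.pdf';
  // The pdf.js viewer, also placed in the project's public directory
  const pdfServerUrl = '/pdfjs-2.5.207-es5-dist/web/viewer.html';
  const url = `${pdfServerUrl}?file=${encodeURIComponent(pdfUrl)}`;
  let pdfContentWindow: any = null; // cached iframe contentWindow

  const getPdfContent = () => {
    const pdfFrame = document.getElementById('pdfIframe') as HTMLIFrameElement | null;
    if (!pdfFrame) {
      return;
    }
    pdfContentWindow = pdfFrame.contentWindow;
    // Viewer state, including the current page number, the total page count, etc.
    console.log('page===>', pdfContentWindow.PDFViewerApplication);
  };

  const onSearchForOut = (searchText: string) => {
    if (!pdfContentWindow) {
      return;
    }
    // Register the listener before posting, and only once per search,
    // so repeated searches do not stack up duplicate listeners
    pdfContentWindow.addEventListener(
      'message',
      (e: MessageEvent) => {
        const { findBar } = pdfContentWindow.PDFViewerApplication;
        // Highlight all matches
        findBar.findField.value = e.data;
        findBar.highlightAll.checked = true;
        findBar.dispatchEvent('highlightallchange');
        // Trigger the find bar's "next" action, which jumps to the first match
        findBar.dispatchEvent('again', false);
      },
      { once: true },
    );
    // The viewer is same-origin, so the message can be restricted to our own origin
    pdfContentWindow.postMessage(searchText, window.location.origin);
  };

  useEffect(() => {
    getPdfContent();
    // Give viewer.html time to load the document before searching
    const timer = setTimeout(() => {
      // External search term
      onSearchForOut('阳区CBD核心区');
    }, 3 * 1000);
    return () => clearTimeout(timer);
  }, []);

  return (
    <div>
      <h3>pdf search (based on pdf-dist, pdf_viewer.html)</h3>
      <iframe id="pdfIframe" title="pdf-viewer" src={url} width="100%" height="800" />
    </div>
  );
};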
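Setting the find bar's fields and calling findBar.dispatchEvent ultimately emits a "find" event on the viewer's event bus (PDFViewerApplication.eventBus). An alternative is to dispatch that event directly. The sketch below rests on an assumption: the event fields are the ones the 2.5.207 viewer reads, namely type, query, phraseSearch, caseSensitive, entireWord, highlightAll and findPrevious. Verify them against the viewer build you ship. triggerFind and the interfaces are hypothetical names that describe only the parts of the viewer this sketch touches.

```typescript
// Minimal structural types for the parts of PDFViewerApplication used here
// (assumed to match the pdf.js 2.5.207 generic viewer).
interface FindEventBus {
  dispatch(eventName: string, data: Record<string, unknown>): void;
}
interface ViewerAppLike {
  eventBus: FindEventBus;
}

interface FindOptions {
  again?: boolean;        // true: move to the next match instead of starting a new search
  findPrevious?: boolean; // together with again: move backwards
  caseSensitive?: boolean;
}

// Hypothetical helper: start a search (type '') or step through matches (type 'again'),
// highlighting every match, by dispatching the viewer's 'find' event.
function triggerFind(app: ViewerAppLike, query: string, options: FindOptions = {}): void {
  app.eventBus.dispatch('find', {
    source: app,
    type: options.again ? 'again' : '',
    query,
    phraseSearch: true,
    caseSensitive: options.caseSensitive ?? false,
    entireWord: false,
    highlightAll: true,
    findPrevious: options.findPrevious ?? false,
  });
}
```

Inside the component it would be called as triggerFind(pdfContentWindow.PDFViewerApplication, searchText) once the document has loaded. This skips the postMessage round trip, but the find bar's text box will not show the query.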

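IV. Rendering the PDF to canvas

1. Installation

npm install pdfjs-dist@3.6.172 --save

The code below loads the worker script from /pdf.worker.min.js, so also copy node_modules/pdfjs-dist/build/pdf.worker.min.js into the public directory. The worker must be the same version as pdfjs-dist.

2. Implementation

【1】PDF preview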

import { useState, useEffect, useRef } from 'react';
import * as pdfjsLib from 'pdfjs-dist'; // import pdfjs-dist

const pdfUrl = '/zyk.pdf'; // path of the PDF file, stored in the public directory
const workerUrl = `/pdf.worker.min.js`; // web worker script, stored in the public directory
pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

const ViewPdf = (props: { height?: string }) => {
  const { height } = props;
  const pdfContainerRef = useRef<HTMLDivElement>(null);
  const [pagesList, setPagesList] = useState<any[]>([]);
  const scale = 2; // zoom factor

  // Render a single page (pageNumber starts at 0)
  const renderPage = async (page: any, pageNumber: number) => {
    const viewport = page.getViewport({ scale });
    const pageContentDom = document.createElement('div');
    pageContentDom.id = `pdfPage-content-${pageNumber}`;
    pageContentDom.style.width = `${viewport.width}px`;
    pageContentDom.style.height = `${viewport.height}px`;
    pageContentDom.style.position = 'relative';
    // Create the canvas element
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    canvas.id = `pdfPage-${pageNumber}`;
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    canvas.style.border = '1px solid black';
    pageContentDom.appendChild(canvas);
    pdfContainerRef.current!.appendChild(pageContentDom);
    // Render the PDF page into the canvas
    await page.render({ canvasContext: context, viewport }).promise;
  };

  // Render all pages; each page container is appended before the first await,
  // so the DOM order matches the page order
  const renderPagesGroup = (pages: any[]) => {
    pages.forEach(({ page }: any, index: number) => {
      renderPage(page, index);
    });
  };

  // Load the PDF file
  const loadPdf = async (url: string) => {
    const pdf = await pdfjsLib.getDocument(url).promise;
    const pages: any[] = [];
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      pages.push({ page, textContent });
    }
    setPagesList(pages);
    renderPagesGroup(pages);
  };

  useEffect(() => {
    loadPdf(pdfUrl);
  }, []);

  return (
    <div>
      <h3>PDF search (custom implementation based on @pdfjs-dist)</h3>
      <div style={{ height: height || '500px' }}>
        {/* PDF container */}
        <div ref={pdfContainerRef} style={{ position: 'relative', height: '100%', overflowY: 'scroll' }} />
      </div>
    </div>
  );
};

export default ViewPdf;
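With canvas.width set to viewport.width, each canvas pixel covers one CSS pixel, so pages look blurry on HiDPI screens (devicePixelRatio of 2 or more). The pdf.js examples handle this with an "output scale". The canvas backing store is enlarged by devicePixelRatio, its CSS size stays at the viewport size, and page.render receives a matching transform. The sketch below only computes those numbers. planCanvas and CanvasPlan are hypothetical names.

```typescript
type RenderTransform = [number, number, number, number, number, number];

interface CanvasPlan {
  width: number;                     // canvas.width: backing-store pixels
  height: number;                    // canvas.height
  cssWidth: string;                  // canvas.style.width: on-screen size in CSS pixels
  cssHeight: string;                 // canvas.style.height
  transform: RenderTransform | null; // extra transform for page.render({ ..., transform })
}

// Hypothetical helper: sizes a page canvas for a given devicePixelRatio so output stays sharp.
function planCanvas(viewportWidth: number, viewportHeight: number, devicePixelRatio: number): CanvasPlan {
  const outputScale = devicePixelRatio || 1;
  return {
    width: Math.floor(viewportWidth * outputScale),
    height: Math.floor(viewportHeight * outputScale),
    cssWidth: `${Math.floor(viewportWidth)}px`,
    cssHeight: `${Math.floor(viewportHeight)}px`,
    // Scale all drawing by the same factor; null means "no extra transform"
    transform: outputScale !== 1 ? [outputScale, 0, 0, outputScale, 0, 0] : null,
  };
}
```

In renderPage this would be used as planCanvas(viewport.width, viewport.height, window.devicePixelRatio). Assign width/height to the canvas and cssWidth/cssHeight to canvas.style, then pass transform to page.render. After this change, canvas.height no longer equals the viewport height. Any code that derives positions from canvas.height, such as the highlight math in 【3】, would then need viewport.height instead.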

【2】Making the PDF text selectable for copying

...
// Based on the code in "【1】PDF preview"; only renderPage changes.
// The .pdfViewer and .textLayer class names come from pdf.js's own stylesheet, so also add:
// import 'pdfjs-dist/web/pdf_viewer.css';

// Render a single page
const renderPage = async (page: any, pageNumber: number) => {
  const viewport = page.getViewport({ scale });
  const pageContentDom = document.createElement('div');
  pageContentDom.id = `pdfPage-content-${pageNumber}`;
  // add-begin: selectable text
  // Reuse the library's class name (do not rename it) so the text layer overlaps the canvas
  pageContentDom.className = 'pdfViewer';
  // The text layer reads the --scale-factor CSS variable; it must equal viewport.scale
  pageContentDom.style.setProperty('--scale-factor', String(scale));
  // add-end: selectable text
  pageContentDom.style.width = `${viewport.width}px`;
  pageContentDom.style.height = `${viewport.height}px`;
  pageContentDom.style.position = 'relative';
  // Create the canvas element
  const canvas = document.createElement('canvas');
  const context = canvas.getContext('2d');
  canvas.id = `pdfPage-${pageNumber}`;
  canvas.width = viewport.width;
  canvas.height = viewport.height;
  canvas.style.border = '1px solid black';
  pageContentDom.appendChild(canvas);
  pdfContainerRef.current!.appendChild(pageContentDom);
  // Render the PDF page into the canvas
  await page.render({ canvasContext: context, viewport }).promise;

  // add-begin: selectable text
  const textLayerDiv = document.createElement('div');
  textLayerDiv.style.width = `${viewport.width}px`;
  textLayerDiv.style.height = `${viewport.height}px`;
  // Reuse the library's class name so the text layer overlaps the canvas
  textLayerDiv.className = 'textLayer';
  const textContent = await page.getTextContent();
  pdfjsLib.renderTextLayer({
    textContentSource: textContent,
    container: textLayerDiv,
    viewport,
    textDivs: [],
  });
  pageContentDom.appendChild(textLayerDiv);
  // add-end: selectable text
};
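The text layer lines up with the canvas because pdf.js positions each text span by composing viewport.transform with the text item's own transform (item.transform from page.getTextContent()). The same math tells you where any text item lands on the canvas, which is handy for drawing highlight rectangles. The sketch below is self-contained. multiply mirrors the composition rule of pdfjsLib.Util.transform. textItemRect is a hypothetical helper that assumes horizontal text and approximates the ascent by the full font height.

```typescript
// A 2D affine matrix [a, b, c, d, e, f], the format pdf.js uses for viewport.transform and item.transform.
type Matrix = [number, number, number, number, number, number];

// Same composition as pdfjsLib.Util.transform(m1, m2): m2 is applied first, then m1.
function multiply(m1: Matrix, m2: Matrix): Matrix {
  return [
    m1[0] * m2[0] + m1[2] * m2[1],
    m1[1] * m2[0] + m1[3] * m2[1],
    m1[0] * m2[2] + m1[2] * m2[3],
    m1[1] * m2[2] + m1[3] * m2[3],
    m1[0] * m2[4] + m1[2] * m2[5] + m1[4],
    m1[1] * m2[4] + m1[3] * m2[5] + m1[5],
  ];
}

interface Rect { left: number; top: number; width: number; height: number; }

// Hypothetical helper: bounding box (in viewport pixels) of a text item on a canvas
// rendered with the given viewport. Assumes horizontal, unrotated text.
function textItemRect(viewportTransform: Matrix, itemTransform: Matrix, itemWidth: number): Rect {
  const tx = multiply(viewportTransform, itemTransform);
  const fontHeight = Math.hypot(tx[2], tx[3]); // glyph height in viewport pixels
  const viewportScale = Math.hypot(viewportTransform[0], viewportTransform[1]);
  return {
    left: tx[4],                      // x of the text origin
    top: tx[5] - fontHeight,          // tx[5] is the baseline; move up by the font height
    width: itemWidth * viewportScale, // item.width is in PDF units
    height: fontHeight,
  };
}
```

Called as textItemRect(viewport.transform as Matrix, item.transform as Matrix, item.width), it also covers pages whose origin is not at (0, 0). A hand-written "canvas.height - y * scale" formula only handles the simple unrotated case.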

【3】Search: highlight matches and jump to the match position

import { Button } from 'antd';
import { useState, useEffect, useRef } from 'react';
import * as pdfjsLib from 'pdfjs-dist'; // import pdfjs-dist
import 'pdfjs-dist/web/pdf_viewer.css'; // styles behind the .pdfViewer / .textLayer class names

const pdfUrl = '/zyk.pdf'; // path of the PDF file, stored in the public directory
const workerUrl = `/pdf.worker.min.js`; // web worker script, stored in the public directory
pdfjsLib.GlobalWorkerOptions.workerSrc = workerUrl;

// One piece of a match inside a single text block
interface MatchBlockItem {
  blockIndex: number; // index of the block in textContent.items
  textStartIndex: number; // start offset inside the block (inclusive)
  textEndIndex: number; // end offset inside the block (exclusive, slice semantics)
  textBlock: any; // the text block itself
}

const ViewPdf = (props: { height?: string }) => {
  const { height } = props;
  const [searchText, setSearchText] = useState('');
  const pdfContainerRef = useRef<HTMLDivElement>(null);
  const [pagesList, setPagesList] = useState<any[]>([]);
  const scale = 2; // zoom factor

  // Create a highlight-layer canvas on top of each page
  const createHeightLightCanvas = (viewport: any, pageNumber: number, parentDom: HTMLElement) => {
    const highlightCanvas = document.createElement('canvas');
    highlightCanvas.id = `highlightCanvas-${pageNumber}`;
    highlightCanvas.className = 'highlightCanvas';
    highlightCanvas.width = viewport.width;
    highlightCanvas.height = viewport.height;
    highlightCanvas.style.position = 'absolute';
    highlightCanvas.style.top = '0';
    highlightCanvas.style.left = '0';
    highlightCanvas.style.zIndex = '1';
    // Let mouse events pass through so the text layer underneath stays selectable
    highlightCanvas.style.pointerEvents = 'none';
    parentDom.appendChild(highlightCanvas);
  };

  // Jump to a page (pageIndex starts at 0)
  const jumpToPage = (pageIndex: number) => {
    let beforeCanvasHeight = 0;
    for (let i = 0; i < pageIndex; i++) {
      const pageDom = document.getElementById(`pdfPage-content-${i}`);
      beforeCanvasHeight += pageDom ? pageDom.offsetHeight : 0;
    }
    pdfContainerRef.current!.scrollTo({ top: beforeCanvasHeight, behavior: 'smooth' });
  };

  // y (in canvas pixels) of the top edge of a matched text block
  const getCurrentTextContentY = (canvas: HTMLCanvasElement, match: MatchBlockItem) => {
    // The pdf.js origin is the bottom-left corner and transform[5] is the text baseline,
    // so add the text height and flip the axis (the -2 is a small manual offset)
    const { transform, height } = match.textBlock;
    return canvas.height - (transform[5] + height - 2) * scale;
  };

  // Scroll to the first match on a page
  const scrollToMatch = (match: { pageIndex: number; matchList: MatchBlockItem[] }) => {
    const { pageIndex, matchList } = match;
    const firstMatchContent = matchList[0];
    // Height of all pages before the matched page
    let beforePageHeight = 0;
    for (let i = 0; i < pageIndex; i++) {
      const pageDom = document.getElementById(`pdfPage-content-${i}`);
      beforePageHeight += pageDom ? pageDom.offsetHeight : 0;
    }
    // Offset of the match inside its own page
    const canvas = document.getElementById(`pdfPage-${pageIndex}`) as HTMLCanvasElement;
    const matchTop = getCurrentTextContentY(canvas, firstMatchContent);
    pdfContainerRef.current!.scrollTo({ top: beforePageHeight + matchTop, behavior: 'smooth' });
  };

  // Draw the matches of one page onto its highlight canvas
  const drawHighlights = (canvas: HTMLCanvasElement, matchesList: MatchBlockItem[]) => {
    if (matchesList.length === 0) {
      return;
    }
    const context = canvas.getContext('2d')!;
    context.fillStyle = 'rgba(255, 255, 0, 0.5)'; // semi-transparent yellow fill
    matchesList.forEach((match) => {
      const { transform, width, height, str } = match.textBlock;
      // Approximate the width of one character (assumes equal-width characters)
      const charWidth = width / str.length;
      const lightWidth = (match.textEndIndex - match.textStartIndex) * charWidth;
      const lightHeight = height;
      const x = transform[4] + match.textStartIndex * charWidth;
      const y = getCurrentTextContentY(canvas, match);
      context.fillRect(Math.floor(x * scale), Math.floor(y), Math.ceil(lightWidth * scale), Math.ceil(lightHeight * scale));
    });
  };

  // Render a single page: page canvas, highlight canvas, then the text layer
  const renderPage = async (page: any, pageNumber: number) => {
    const viewport = page.getViewport({ scale });
    const pageContentDom = document.createElement('div');
    pageContentDom.id = `pdfPage-content-${pageNumber}`;
    // Reuse the library's class name so the text layer overlaps the canvas
    pageContentDom.className = 'pdfViewer';
    pageContentDom.style.setProperty('--scale-factor', String(scale));
    pageContentDom.style.width = `${viewport.width}px`;
    pageContentDom.style.height = `${viewport.height}px`;
    pageContentDom.style.position = 'relative';
    // Create the canvas element
    const canvas = document.createElement('canvas');
    const context = canvas.getContext('2d');
    canvas.id = `pdfPage-${pageNumber}`;
    canvas.width = viewport.width;
    canvas.height = viewport.height;
    canvas.style.border = '1px solid black';
    pageContentDom.appendChild(canvas);
    createHeightLightCanvas(viewport, pageNumber, pageContentDom);
    pdfContainerRef.current!.appendChild(pageContentDom);
    // Render the PDF page into the canvas
    await page.render({ canvasContext: context, viewport }).promise;
    // Render the text layer
    const textLayerDiv = document.createElement('div');
    textLayerDiv.style.width = `${viewport.width}px`;
    textLayerDiv.style.height = `${viewport.height}px`;
    textLayerDiv.className = 'textLayer';
    const textContent = await page.getTextContent();
    pdfjsLib.renderTextLayer({ textContentSource: textContent, container: textLayerDiv, viewport, textDivs: [] });
    pageContentDom.appendChild(textLayerDiv);
  };

  // Render all pages (DOM order matches page order)
  const renderPagesGroup = (pages: any[]) => {
    pages.forEach(({ page }: any, index: number) => {
      renderPage(page, index);
    });
  };

  // Load the PDF file
  const loadPdf = async (url: string) => {
    const pdf = await pdfjsLib.getDocument(url).promise;
    const pages: any[] = [];
    for (let i = 1; i <= pdf.numPages; i++) {
      const page = await pdf.getPage(i);
      const textContent = await page.getTextContent();
      pages.push({ page, textContent });
    }
    setPagesList(pages);
    renderPagesGroup(pages);
  };

  // Start positions of every occurrence of `pattern` in `text`
  // (this helper's original name was lost in extraction; findAllIndexes is a stand-in)
  const findAllIndexes = (text: string, pattern: string): number[] => {
    // Escape regex special characters so the search text is matched literally
    const regex = new RegExp(pattern.replace(/[.*+?^${}()|[\]\\]/g, '\\$&'), 'g');
    const positions: number[] = [];
    let match: RegExpExecArray | null;
    while ((match = regex.exec(text)) !== null) {
      positions.push(match.index);
    }
    return positions;
  };

  // Blocks covered by a match starting at currentTextIndex in items[currentItemIndex]
  // todo: reduce the number of parameters
  const getMatchesList = (items: any[], currentItem: any, currentItemIndex: number, currentTextIndex: number, searchStr: string): MatchBlockItem[] => {
    let matchSearchList: MatchBlockItem[] = [];
    if (currentItem.str.length - (currentTextIndex + 1) < searchStr.length - 1) {
      // The rest of the current block is shorter than the search text: keep reading the following blocks
      let itemText = currentItem.str.slice(currentTextIndex);
      const tempMatchSearchList: MatchBlockItem[] = [{
        blockIndex: currentItemIndex,
        textStartIndex: currentTextIndex,
        textEndIndex: currentItem.str.length, // slice() is used throughout, so the end is exclusive
        textBlock: currentItem,
      }];
      let index = currentItemIndex;
      const otherSearchLength = searchStr.length - 1;
      while (itemText.length < searchStr.length && index < items.length - 1) {
        index++;
        // Number of characters still needed, minus one
        const currentOtherSearchLength = otherSearchLength - itemText.length;
        if (items[index].str.length > currentOtherSearchLength) {
          // Block is long enough: take only the characters still needed
          itemText = `${itemText}${items[index].str.slice(0, currentOtherSearchLength + 1)}`;
          tempMatchSearchList.push({ blockIndex: index, textStartIndex: 0, textEndIndex: currentOtherSearchLength + 1, textBlock: items[index] });
        } else {
          // Block is too short: take all of it and continue with the next block
          itemText = `${itemText}${items[index].str}`;
          tempMatchSearchList.push({ blockIndex: index, textStartIndex: 0, textEndIndex: items[index].str.length, textBlock: items[index] });
        }
      }
      if (itemText === searchStr) {
        matchSearchList = matchSearchList.concat(tempMatchSearchList);
      }
    } else {
      // The rest of the current block is at least as long as the search text: compare directly
      const textEndIndex = currentTextIndex + searchStr.length;
      if (currentItem.str.slice(currentTextIndex, textEndIndex) === searchStr) {
        matchSearchList.push({ blockIndex: currentItemIndex, textStartIndex: currentTextIndex, textEndIndex, textBlock: currentItem });
      }
    }
    return matchSearchList;
  };

  // Find every occurrence of the search text in one page's text blocks
  const findAllOccurrences = (items: any[], searchStr: string): MatchBlockItem[] => {
    const firstSearchStr = searchStr[0];
    let matchSearchList: MatchBlockItem[] = [];
    for (let i = 0; i < items.length; i++) {
      // Every position of the first search character is a candidate start
      const indexList = findAllIndexes(items[i].str, firstSearchStr);
      for (let j = 0; j < indexList.length; j++) {
        matchSearchList = matchSearchList.concat(getMatchesList(items, items[i], i, indexList[j], searchStr));
      }
    }
    return matchSearchList;
  };

  // Clear every highlight canvas
  const clearHeightLightsCanvas = () => {
    const highlightCanvases = pdfContainerRef.current!.querySelectorAll<HTMLCanvasElement>('.highlightCanvas');
    highlightCanvases.forEach((canvas) => {
      canvas.getContext('2d')!.clearRect(0, 0, canvas.width, canvas.height);
    });
  };

  const handleSearch = () => {
    clearHeightLightsCanvas();
    if (!searchText) {
      jumpToPage(0);
      return;
    }
    const newMatches: { pageIndex: number; matchList: MatchBlockItem[] }[] = [];
    // todo: matching is done page by page, so text that spans two pages is not found
    pagesList.forEach(({ textContent }: any, pageIndex: number) => {
      newMatches.push({
        pageIndex, // page index
        matchList: findAllOccurrences(textContent.items, searchText), // matches on this page
      });
    });
    if (newMatches.every((match) => match.matchList.length === 0)) {
      alert('No matches found');
      return;
    }
    // Redraw the highlighted areas
    const highlightCanvases = pdfContainerRef.current!.querySelectorAll<HTMLCanvasElement>('.highlightCanvas');
    pagesList.forEach((_: any, pageIndex: number) => {
      const currentMatches = newMatches.find((match) => match.pageIndex === pageIndex);
      drawHighlights(highlightCanvases[pageIndex], currentMatches?.matchList || []);
    });
    // Jump to the first page that has a match
    const isExistItem = newMatches.find((match) => match.matchList.length > 0);
    if (isExistItem) {
      scrollToMatch(isExistItem);
    }
  };

  // Initialize PDF.js
  useEffect(() => {
    loadPdf(pdfUrl);
  }, []);

  return (
    <div>
      <h3>PDF search (custom implementation based on @pdfjs-dist)</h3>
      <input onChange={(e) => setSearchText(e.target.value)} placeholder="Enter text to search" />
      <Button onClick={handleSearch}>Search</Button>
      <div style={{ height: height || '500px' }}>
        {/* PDF container */}
        <div ref={pdfContainerRef} style={{ position: 'relative', height: '100%', overflowY: 'scroll' }} />
      </div>
    </div>
  );
};

export default ViewPdf;
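getMatchesList handles search text that is split across several text items by walking forward block by block. A different technique, which is not the author's, can do the same in fewer lines. It joins the page's item strings into one string, searches that string with indexOf, and maps each hit back onto the items it overlaps. The sketch assumes only item.str is needed. findMatchesAcrossItems and MatchPart are hypothetical names. Like the code above, it joins items without separators, so a match may run across a line break.

```typescript
// Minimal shape of a pdf.js text item (only `str` is needed here).
interface TextItemLike { str: string; }

// One contiguous piece of a match inside a single text item.
interface MatchPart {
  blockIndex: number;     // index into textContent.items
  textStartIndex: number; // inclusive
  textEndIndex: number;   // exclusive (slice semantics)
}

// Hypothetical helper: returns one MatchPart[] per occurrence of `query` (non-overlapping).
function findMatchesAcrossItems(items: TextItemLike[], query: string): MatchPart[][] {
  if (!query) return [];
  // starts[i] is the offset of items[i] inside the joined string
  const starts: number[] = [];
  let joined = '';
  for (const item of items) {
    starts.push(joined.length);
    joined += item.str;
  }

  const results: MatchPart[][] = [];
  let from = joined.indexOf(query);
  while (from !== -1) {
    const to = from + query.length;
    const parts: MatchPart[] = [];
    items.forEach((item, i) => {
      // Overlap of the match [from, to) with this item's range [itemStart, itemEnd)
      const itemStart = starts[i];
      const itemEnd = itemStart + item.str.length;
      const s = Math.max(from, itemStart);
      const e = Math.min(to, itemEnd);
      if (s < e) {
        parts.push({ blockIndex: i, textStartIndex: s - itemStart, textEndIndex: e - itemStart });
      }
    });
    results.push(parts);
    from = joined.indexOf(query, to);
  }
  return results;
}
```

To feed drawHighlights, add textBlock: items[part.blockIndex] to each part. The first element of the result then plays the role of the first match for scrolling.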

【4】Getting the outline (table of contents) data structure of a PDF file

...
// Based on the code in "【1】PDF preview"
const get = async (url: string) => {
  const pdf = await pdfjsLib.getDocument(url).promise;
  // Get the outline data; resolves to null when the PDF has no outline
  const pdfCatalogue = await pdf.getOutline();
  console.log('Outline data:', pdfCatalogue);
};
...
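pdf.getOutline() resolves to a tree. Each node has a title, a dest (a named destination string, an explicit destination array, or null) and child items. It resolves to null when the file has no outline. Rendering a table of contents needs two things: the tree flattened with depths, and each destination resolved to a page. The sketch below does both. flattenOutline and destToPageIndex are hypothetical helpers. The DocLike interface describes only the two PDFDocumentProxy methods used, getDestination and getPageIndex.

```typescript
// Minimal shape of an outline node from pdf.getOutline() (only the fields used here).
interface OutlineNode {
  title: string;
  dest: string | unknown[] | null;
  items: OutlineNode[];
}

interface FlatOutlineEntry {
  title: string;
  depth: number; // 0 for top-level entries
  dest: string | unknown[] | null;
}

// Depth-first flatten, convenient for rendering an indented table of contents.
function flattenOutline(nodes: OutlineNode[] | null, depth = 0): FlatOutlineEntry[] {
  if (!nodes) return []; // getOutline() resolves to null when there is no outline
  const out: FlatOutlineEntry[] = [];
  for (const node of nodes) {
    out.push({ title: node.title, depth, dest: node.dest });
    out.push(...flattenOutline(node.items, depth + 1));
  }
  return out;
}

// The subset of PDFDocumentProxy used to turn a destination into a 0-based page index.
interface DocLike {
  getDestination(id: string): Promise<unknown[] | null>;
  getPageIndex(ref: unknown): Promise<number>;
}

async function destToPageIndex(pdf: DocLike, dest: string | unknown[] | null): Promise<number | null> {
  if (!dest) return null;
  // Named destinations must first be resolved to an explicit destination array
  const explicit = typeof dest === 'string' ? await pdf.getDestination(dest) : dest;
  if (!explicit || explicit.length === 0) return null;
  const target = explicit[0];
  // Usually a page reference ({ num, gen }); some files store a plain 0-based page index
  return typeof target === 'number' ? target : pdf.getPageIndex(target);
}
```

The resulting index is 0-based, which matches the page numbering used by jumpToPage in 【3】. Clicking an entry could therefore call jumpToPage with the value from destToPageIndex(pdf, entry.dest) once it resolves to a number.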