
API Documentation Search Engine (Part 1)


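Contents

  • 1. Why build this project
  • 2. How a search engine works: the big picture
  • 3. Tech stack and project environment
  • 4. Forward index vs. inverted index: how a search engine works in detail
  • 5. Writing the Parser module: tag removal and data cleaning
    • 5.1 Removing tags
    • 5.2 Writing the parser
  • 6. Writing the Index module: building the indexes
    • 6.1 Building the forward index
    • 6.2 Building the inverted index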

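1. Why build this project

The search engines we all know, such as Baidu, 360, and Sogou, search the entire web. What we build here is a site search: the data is more vertical (confined to a single domain) and the volume is much smaller. The Boost website used to have no search feature. It has one now, but that does not matter: although we are building a search engine for the Boost documentation, once you understand how it works you can adapt it to any other site.

First, look at what a search engine displays when it is given a keyword.

Each result consists mainly of a title, a summary, and a URL. The search engine we implement should present its results the same way.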

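2. How a search engine works: the big picture

On a server (or server cluster), resources crawled from the whole web are stored under some path on disk, and a searcher process is running.

Step 1: strip the tags from the files under this path and clean the data.
Step 2: build the index (so that lookups are fast).

When a user sends an HTTP request, the search keyword is submitted with the GET method, which starts a search task.

Step 3: query the index to find the relevant HTML documents.

The title, description (desc), and URL of each matching page are then assembled into a new page, which is returned to the user.

For now, a rough understanding of this flow is enough.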

3. Tech stack and project environment

  • Tech stack: C/C++, C++11, STL, the quasi-standard library Boost, Jsoncpp, cppjieba, cpp-httplib
  • Optional: HTML5, CSS, JS, jQuery, Ajax
  • Project environment: Ubuntu 20.04 cloud server, gcc (g++)/Makefile, VS Code

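4. Forward index vs. inverted index: how a search engine works in detail

Suppose we have two documents (note that 小米 means both "Xiaomi" and "millet"):

  • Document 1: 雷军发布了小米汽车 (Lei Jun released the Xiaomi car)
  • Document 2: 雷军买了四斤小米 (Lei Jun bought four jin of millet)

Forward index: look up a document's content (the keywords inside the document) by its document ID.

| Document ID | Document content |
| --- | --- |
| 1 | 雷军发布了小米汽车 |
| 2 | 雷军买了四斤小米 |

Before building the inverted index we first segment each document into words, which makes the inverted index easier to build and to query:

  • Document 1 [雷军发布了小米汽车]: 雷军/发布/小米/汽车/小米汽车
  • Document 2 [雷军买了四斤小米]: 雷军/买/四斤/小米/四斤小米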

Stop words such as 了, 的, 吗, a, and the can generally be ignored during segmentation.
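Filtering stop words can be sketched as follows. RemoveStopWords and the stop-word set passed to it are assumptions made for illustration; the actual segmentation step (the tech stack lists cppjieba) is not shown in this part.

```cpp
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical helper: keep only the tokens that are not in the stop-word set.
std::vector<std::string> RemoveStopWords(const std::vector<std::string> &tokens,
                                         const std::unordered_set<std::string> &stop_words)
{
    std::vector<std::string> kept;
    for (const auto &token : tokens)
    {
        if (stop_words.count(token) == 0)
        {
            kept.push_back(token);
        }
    }
    return kept;
}
```

Given the tokens 雷军/买/了/四斤/小米 and a stop-word set containing 了, only 雷军/买/四斤/小米 remain.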

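Inverted index: segment the document content, collect the distinct keywords, and map each keyword to the IDs of the documents that contain it.

| Keyword (unique) | Document ID, weight |
| --- | --- |
| 雷军 | Document 1, Document 2 |
| 发布 | Document 1 |
| 小米 | Document 1, Document 2 |
| 汽车 | Document 1 |
| 小米汽车 | Document 1 |
| 买 | Document 2 |
| 四斤 | Document 2 |
| 四斤小米 | Document 2 |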

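Weights are necessary too. A keyword search returns many results listed from top to bottom, so which come first? We keep it simple and order the results by weight. Of course, in real search engines paid placement also affects the ranking.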
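Ordering by weight can be sketched with std::sort. The Hit struct and SortByWeight are hypothetical names used only for this illustration; how the weight itself is computed is not covered here.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical search hit: a document ID plus the weight of the match.
struct Hit
{
    uint64_t doc_id;
    int weight;
};

// Sort hits so that the highest weight comes first.
void SortByWeight(std::vector<Hit> *hits)
{
    std::sort(hits->begin(), hits->end(),
              [](const Hit &a, const Hit &b) { return a.weight > b.weight; });
}
```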

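Simulating one search:

User enters 小米 -> look it up in the inverted index -> extract the document IDs (1, 2) -> look those IDs up in the forward index -> get the content of each document -> build a summary from title + content (desc) + url -> construct the response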
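The lookup chain above can be sketched with the two example documents. This is a toy illustration, not the project's Index class: kForward, kInverted, and Search are hypothetical names, the inverted index is filled in by hand instead of by a segmenter, and IDs are 0-based vector positions (so Document 1 has ID 0).

```cpp
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Forward index: the vector position is the document ID (0-based in this sketch).
static const std::vector<std::string> kForward = {
    "雷军发布了小米汽车",
    "雷军买了四斤小米",
};

// Inverted index: keyword -> IDs of the documents that contain it (filled in by hand).
static const std::unordered_map<std::string, std::vector<uint64_t>> kInverted = {
    {"雷军", {0, 1}}, {"发布", {0}}, {"小米", {0, 1}}, {"汽车", {0}},
    {"小米汽车", {0}}, {"买", {1}}, {"四斤", {1}}, {"四斤小米", {1}},
};

// Inverted lookup first, then fetch each matching document from the forward index.
std::vector<std::string> Search(const std::string &keyword)
{
    std::vector<std::string> hits;
    auto it = kInverted.find(keyword);
    if (it == kInverted.end())
    {
        return hits; // unknown keyword: no results
    }
    for (uint64_t id : it->second)
    {
        if (id < kForward.size())
        {
            hits.push_back(kForward[id]);
        }
    }
    return hits;
}
```

Searching for 小米 returns both documents, while 汽车 returns only the first one.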

Now we start writing the code, one module at a time.

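5. Writing the Parser module: tag removal and data cleaning

First you need the Boost documentation data, which can be downloaded from the Boost website.

Download the latest release and transfer it to your Linux machine.

Extract the archive to get the boost_1_87_0 directory.

Most of the .html files are under doc/html. For now we only need the HTML files under boost_1_87_0/doc/html, which we use to build the index.

Here I copy them into boost_search/data/input inside my own project.

After that, the Boost archive you downloaded can be deleted.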

5.1 Removing tags

HTML tags have no value for search and need to be removed. Tags generally come in pairs.

We strip the tags to get at the actual content inside them, which is what we need to build the index.

The clean, tag-free content is written to the file raw.txt.

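Note that whatever we write to this file will be read back later when we build the index. To make reading easy, the fields within one document are separated by \3, and documents are separated from each other by \n.

For example: title\3content\3url \n title\3content\3url \n title\3content\3url \n …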

This way we can later call getline(ifstream, line) to read one complete document, title\3content\3url, per line.
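Reading this format back can be sketched with the standard library alone. SplitRecord and ReadRecords are hypothetical names for illustration; the project itself later splits fields with Boost.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Split one raw.txt line into its '\3'-separated fields (title, content, url).
// Note: std::getline drops a trailing empty field.
std::vector<std::string> SplitRecord(const std::string &line)
{
    std::vector<std::string> fields;
    std::string field;
    std::istringstream iss(line);
    while (std::getline(iss, field, '\3'))
    {
        fields.push_back(field);
    }
    return fields;
}

// Read every record from a stream: one document per '\n'-terminated line.
std::vector<std::vector<std::string>> ReadRecords(std::istream &in)
{
    std::vector<std::vector<std::string>> docs;
    std::string line;
    while (std::getline(in, line))
    {
        docs.push_back(SplitRecord(line));
    }
    return docs;
}
```

Because the tag-removal step replaces every '\n' inside a document with a space, a '\n' in raw.txt can only mean "end of document", which is what makes this line-by-line reading safe.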

5.2 Writing the parser

Shared utility classes will all go in the header file Common.h.

```cpp
// Common.h
#pragma once
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

using std::cout;
using std::cerr;
using std::endl;

// Parser.cc
#include "Common.h"
#include <boost/filesystem.hpp>

// Directory that holds all of the source HTML files
static const std::string src_path = "data/input";
// Output file written after tag removal and data cleaning
static const std::string output = "data/raw_html/raw.txt";

typedef struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content (source of the summary)
    std::string url;     // URL of this document on the official site
} DocInfo_t;

// const &: input
// *: output
// &: input and output
bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list);
bool ParserHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results);
bool SaveHtml(const std::vector<DocInfo_t> &results, const std::string &output);

int main()
{
    std::vector<std::string> files_list;
    // Step 1: recursively collect the path of every HTML file into files_list,
    // so the files can be read one by one later
    if (!EnumFile(src_path, &files_list))
    {
        cerr << "enum file name fail" << endl;
        return 1;
    }

    std::vector<DocInfo_t> results;
    // Step 2: read and parse each file listed in files_list
    if (!ParserHtml(files_list, &results))
    {
        cerr << "parser html fail" << endl;
        return 2;
    }

    // Step 3: write the parsed documents to output, using \3 as the separator
    // inside a document and \n as the separator between documents
    if (!SaveHtml(results, output))
    {
        cerr << "save html fail" << endl;
        return 3;
    }
    return 0;
}
```

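How do we recursively collect the path of every HTML file? We can use the Boost library for this.

Install the libboost-all-dev package with apt-get:

```shell
sudo apt-get update
sudo apt-get install libboost-all-dev
```

After installation, the default install prefix is /usr:
the headers are installed in /usr/include/boost
the shared libraries (.so) are installed in /usr/lib/x86_64-linux-gnu

With Boost installed, we can write the function.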

```cpp
bool EnumFile(const std::string &src_path, std::vector<std::string> *files_list)
{
    namespace fs = boost::filesystem;
    fs::path root_path(src_path);

    // If the path does not exist there is no point in continuing
    if (!fs::exists(root_path))
    {
        // cout << root_path << " not exists" << endl;
        // LOGMESSAGE is a logging macro whose definition is not shown in this part
        LOGMESSAGE(FATAL, src_path + " not exists");
        return false;
    }

    // A default-constructed iterator marks the end of the recursive traversal
    fs::recursive_directory_iterator end;
    for (fs::recursive_directory_iterator start(root_path); start != end; ++start)
    {
        // Skip anything that is not a regular file (.html files are regular files)
        if (!fs::is_regular_file(*start))
        {
            continue;
        }
        // Skip files whose extension is not .html
        if (start->path().extension() != ".html")
        {
            continue;
        }

        // cout << "debug: " << start->path().string() << endl;
        // At this point the path is a regular file ending in .html.
        // Save the full path in files_list for later text analysis
        files_list->push_back(start->path().string());
    }
    return true;
}
```

All the .html file paths are now in files_list. Next we read each path in turn, extract the document's content, and split it into its parts.

```cpp
// Common.h
class Util
{
public:
    // Read the whole content of a file into *out
    static bool ReadFile(const std::string &file_path, std::string *out)
    {
        std::ifstream ifs(file_path);
        if (!ifs.is_open())
        {
            cerr << "open file " << file_path << " fail" << endl;
            return false;
        }

        std::string line;
        while (getline(ifs, line))
        {
            *out += line; // getline strips the '\n', so lines are joined directly
        }
        ifs.close();
        return true;
    }
};

// Parser.cc
bool ParserHtml(const std::vector<std::string> &files_list, std::vector<DocInfo_t> *results)
{
    for (const auto &path : files_list)
    {
        // 1. Read the file
        std::string result;
        if (!Util::ReadFile(path, &result))
        {
            continue;
        }

        DocInfo_t doc;
        // 2. Parse the file and extract the title
        if (!ParserTitle(result, &doc.title))
        {
            continue;
        }
        // 3. Parse the file and extract the content, i.e. remove the tags
        if (!ParserContent(result, &doc.content))
        {
            continue;
        }
        // 4. Build the url from the file path
        if (!ParserUrl(path, &doc.url))
        {
            continue;
        }

        // Parsing is done; everything about this document is now stored in doc.
        // results->push_back(doc); // copies doc, which may be inefficient
        results->push_back(std::move(doc)); // move instead of copy

        // for debug
        // ShowDoc(doc);
        // break;
    }
    return true;
}
```

Extracting the title

```cpp
bool ParserTitle(const std::string &file, std::string *title)
{
    auto begin = file.find("<title>");
    if (begin == std::string::npos)
    {
        return false;
    }
    auto end = file.find("</title>");
    if (end == std::string::npos)
    {
        return false;
    }

    // Move begin past the opening tag so it points at the title text
    begin += std::string("<title>").size();
    if (begin >= end)
    {
        return false;
    }
    *title = file.substr(begin, end - begin);
    return true;
}
```
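<p><strong>Extracting the content (essentially, removing the tags)</strong></p>
<p>Here we use a simple state machine. The initial state is "inside a tag". After a tag closes, the next character may belong to the content, in which case we keep it, or it may be a &lt; that opens another tag, in which case we skip it. So the state has to be re-evaluated for every character.</p>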
<pre><code class="prism language-cpp">bool ParserContent(const std::string &amp;file, std::string *content)
{
    // A small state machine that tracks whether we are inside a tag or in content
    enum status
    {
        LABEL,  // inside a tag
        CONTENT // in content
    };

    // Start in the tag state
    enum status s = LABEL;
    for (auto c : file)
    {
        switch (s)
        {
        case LABEL:
            if (c == '&gt;')
            {
                s = CONTENT;
            }
            break;
        case CONTENT:
            if (c == '&lt;')
            {
                s = LABEL;
            }
            else
            {
                // Do not keep '\n' from the original file: '\n' is reserved as the
                // separator between documents in the parsed output
                if (c == '\n')
                    c = ' ';
                *content += c;
            }
            break;
        default:
            break;
        }
    }
    return true;
}</code></pre>
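<p>As a quick way to try the state machine on a small input, here is a minimal self-contained sketch. StripTags is a hypothetical helper, not part of the project; it applies the same two-state logic but returns the text instead of writing through a pointer.</p>

```cpp
#include <string>

// Hypothetical helper: the same two-state logic as ParserContent, returning the text.
std::string StripTags(const std::string &html)
{
    enum State { TAG, TEXT };
    State s = TAG; // assume the input starts with a tag, as an HTML file does
    std::string text;
    for (char c : html)
    {
        if (s == TAG)
        {
            if (c == '>')
            {
                s = TEXT; // tag closed: the following characters may be content
            }
        }
        else if (c == '<')
        {
            s = TAG; // a new tag opens: stop collecting
        }
        else
        {
            text += (c == '\n') ? ' ' : c; // '\n' is reserved as the document separator
        }
    }
    return text;
}
```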
<p><strong>Building the URL</strong></p>
<p>The paths in the official Boost documentation correspond to the paths in the documentation we downloaded.</p>
<p>Official site:</p>
<pre><code class="prism language-cpp">https://www.boost.org/doc/libs/1_87_0/doc/html/accumulators.html</code></pre>
<p>The same file in the downloaded archive:</p>
<pre><code class="prism language-cpp">boost_1_87_0/doc/html/accumulators.html</code></pre>
<p>The same file after copying it into our project:</p>
<pre><code class="prism language-cpp">data/input/accumulators.html</code></pre>
<p>To turn our local path into the official URL, we concatenate the URL prefix of the official site with the part of the path after data/input, which gives a link to the official site.</p>
<pre><code class="prism language-cpp">bool ParserUrl(const std::string &amp;file_path, std::string *url)
{
    std::string url_head = "https://www.boost.org/doc/libs/1_87_0/doc/html";
    // Drop the leading "data/input"; the tail keeps its leading '/'
    std::string url_tail = file_path.substr(src_path.size());
    *url = url_head + url_tail;
    return true;
}</code></pre>
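<p>A slightly more defensive variant is sketched below. LocalPathToUrl is a hypothetical helper, not part of the project; it takes the source directory as a parameter and, as an added assumption, rejects paths that do not start with that directory.</p>

```cpp
#include <string>

// Hypothetical helper: map a local file path to its URL on the official site.
// Returns false when file_path does not start with src_dir.
bool LocalPathToUrl(const std::string &src_dir, const std::string &file_path, std::string *url)
{
    static const std::string url_head = "https://www.boost.org/doc/libs/1_87_0/doc/html";
    if (file_path.compare(0, src_dir.size(), src_dir) != 0)
    {
        return false; // the file is not under src_dir
    }
    *url = url_head + file_path.substr(src_dir.size()); // the tail keeps its leading '/'
    return true;
}
```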
<p><strong>Writing the parsed content to the file</strong></p>
<pre><code class="prism language-cpp">bool SaveHtml(const std::vector&lt;DocInfo_t&gt; &amp;results, const std::string &amp;output)
{
#define SEP '\3'
    // Write in binary mode
    std::ofstream ofs(output, std::ios::out | std::ios::binary);
    if (!ofs.is_open())
    {
        cerr &lt;&lt; "open " &lt;&lt; output &lt;&lt; " fail" &lt;&lt; endl;
        return false;
    }

    // Write one line per document: title\3content\3url\n
    for (auto &amp;doc : results)
    {
        std::string out_string;
        out_string = doc.title;
        out_string += SEP;
        out_string += doc.content;
        out_string += SEP;
        out_string += doc.url;
        out_string += '\n';

        ofs.write(out_string.c_str(), out_string.size());
    }
    ofs.close();
    return true;
}</code></pre>
<h2>6. Writing the Index module: building the indexes</h2>
<p>All the processed document content is now in raw.txt. Next we read it back from there to build the forward index and the inverted index.</p>
<pre><code class="prism language-cpp">// Index.hpp
#include "Common.h"
#include &lt;cstdint&gt;
#include &lt;unordered_map&gt;

// Forward index: document content looked up by document ID
typedef struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content
    std::string url;     // URL of this document on the official site
    uint64_t doc_id;     // document ID, needed when building the inverted index
} DocInfo;

// Inverted index: the document ID and weight associated with a keyword
typedef struct InvertedElem
{
    uint64_t doc_id;  // document ID
    std::string word; // keyword, used later when building the summary
    int weight;       // weight
    InvertedElem() : doc_id(0), weight(0) {}
} InvertedElem;

// Inverted list (posting list)
typedef std::vector&lt;InvertedElem&gt; InvertedList;

class Index
{
private:
    // The forward index is an array; the array subscript is naturally the document ID
    std::vector&lt;DocInfo&gt; forward_index; // forward index
    // The inverted index maps each keyword to one or more InvertedElem
    // (keyword -&gt; inverted list)
    std::unordered_map&lt;std::string, InvertedList&gt; inverted_index;

public:
    // Find a document's content by doc_id
    DocInfo *GetForwardIndex(const uint64_t doc_id)
    {
        return nullptr;
    }

    // Get the inverted list for a keyword
    InvertedList *GetInvertedIndex(const std::string &amp;word)
    {
        return nullptr;
    }

    // Build the forward and inverted indexes from the tag-free documents
    // in data/raw_html/raw.txt
    bool BuildIndex(const std::string &amp;output) // receives the data produced by the parser
    {
        return true;
    }
};</code></pre>
<p>First we fill in the simple functions: getting a document's content by document ID, getting the inverted list by keyword, and reading the file line by line to build the forward and inverted indexes.</p>
<pre><code class="prism language-cpp">// Find a document's content by doc_id
DocInfo *GetForwardIndex(const uint64_t doc_id)
{
    if (doc_id &gt;= forward_index.size())
    {
        cerr &lt;&lt; "doc_id out of range, fail" &lt;&lt; endl;
        return nullptr;
    }
    return &amp;forward_index[doc_id];
}

// Get the inverted list for a keyword
InvertedList *GetInvertedIndex(const std::string &amp;word)
{
    auto it = inverted_index.find(word);
    if (it == inverted_index.end())
    {
        cerr &lt;&lt; word &lt;&lt; " has no InvertedList" &lt;&lt; endl;
        return nullptr;
    }
    return &amp;(it-&gt;second); // reuse the iterator instead of looking the key up again
}

// Build the forward and inverted indexes from the tag-free documents
// in data/raw_html/raw.txt
bool BuildIndex(const std::string &amp;output) // receives the data produced by the parser
{
    std::ifstream ifs(output, std::ios::in | std::ios::binary);
    if (!ifs.is_open())
    {
        // cerr &lt;&lt; "sorry, " &lt;&lt; output &lt;&lt; " open error" &lt;&lt; endl;
        // LOGMESSAGE is a logging macro whose definition is not shown in this part
        LOGMESSAGE(FATAL, "sorry, " + output + " open error");
        return false;
    }

    std::string line;
    while (std::getline(ifs, line))
    {
        // Build the forward index entry for this document
        DocInfo *doc = BuildForwardIndex(line);
        if (doc == nullptr)
        {
            cerr &lt;&lt; "build " &lt;&lt; line &lt;&lt; " fail" &lt;&lt; endl;
            continue;
        }
        // Build the inverted index entries for this document
        BuildInvertedIndex(*doc);
    }
    ifs.close();
    return true;
}</code></pre>
<h3>6.1 Building the forward index</h3>
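<p>Within each document line, the title, content, and URL are separated by the \3 character. We could find the start of each field with std::string::find and cut it out with substr, but the split function from Boost is more convenient here.</p>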
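<p><img src="https://i-blog.csdnimg.cn/direct/8013d89704dd438ca10a9d5430f6da9b.png" alt="Image" /><br /> The first argument is the container that receives the split result.<br /> The second argument is the string to split.<br /> The third argument specifies what to split on.<br /> The fourth argument specifies whether adjacent separators are compressed.</p>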
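<p>To see what compression means, take the string aaa\3bbb\3\3\3\3ccc.<br /> With compression on, it splits into aaa, bbb, ccc.<br /> With compression off, it splits into aaa, bbb, three empty strings, and ccc: every pair of adjacent \3 characters has an empty token between them. Compression treats the run \3\3\3\3 as a single \3.</p>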
<p>Compression is off by default. Passing token_compress_on turns it on.</p>
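<p>The difference between the two modes can be sketched with the standard library alone, so it compiles without Boost. SplitSketch is a hypothetical helper written for this illustration; it is not part of Boost and only mimics the behaviour described above.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `target` on any character found in `seps`.
// With compress == true, a run of adjacent separators counts as one.
std::vector<std::string> SplitSketch(const std::string& target,
                                     const std::string& seps,
                                     bool compress)
{
    std::vector<std::string> out;
    std::size_t start = 0;
    while (true)
    {
        std::size_t pos = target.find_first_of(seps, start);
        if (pos == std::string::npos)
        {
            out.push_back(target.substr(start)); // last token
            break;
        }
        out.push_back(target.substr(start, pos - start));
        start = pos + 1;
        if (compress)
        {
            // Skip the rest of the separator run.
            while (start < target.size() &&
                   seps.find(target[start]) != std::string::npos)
            {
                ++start;
            }
        }
    }
    return out;
}
```

<p>Called on "aaa\3bbb\3\3\3\3ccc" with "\3" as the separator, the sketch returns three tokens with compression on and six tokens (three of them empty) with compression off.</p>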
<p><img src="https://i-blog.csdnimg.cn/direct/2d9ad360f44749fd9672fd2afd38ef1f.png" alt="Image" /></p>
<pre><code class="prism language-cpp">//Common.h
#include &lt;string&gt;
#include &lt;vector&gt;
#include &lt;boost/algorithm/string.hpp&gt;

// Splits a string with the Boost string algorithms
class StringUtil
{
public:
    static void Spilt(const std::string &amp;target, std::vector&lt;std::string&gt; *out, const std::string &amp;sep)
    {
        boost::split(*out, target, boost::is_any_of(sep), boost::token_compress_on);
    }
};

//Index.hpp
DocInfo* BuildForwardIndex(const std::string&amp; line)
{
    // 1. Split the line
    // line -&gt; title content url
    std::vector&lt;std::string&gt; results; // fields of one line
    std::string sep = "\3";           // separator inside a line
    StringUtil::Spilt(line, &amp;results, sep);
    if (results.size() != 3)
    {
        return nullptr; // malformed line, the caller skips it
    }

    // 2. Fill a DocInfo from the fields
    DocInfo doc;
    doc.title = results[0];
    doc.content = results[1];
    doc.url = results[2];
    doc.doc_id = forward_index.size(); // set the id before inserting: it is the index of this doc in the vector

    // 3. Append to the forward index vector
    forward_index.push_back(std::move(doc));
    return &amp;forward_index.back();
}</code></pre>
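<p>The idea that the document ID is simply the position in the vector can be sketched in isolation. DocInfoSketch, ForwardIndexSketch, Add, and Get are hypothetical names used only for this illustration; the lookup function is an assumption about how the forward index would be queried later.</p>

```cpp
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for DocInfo.
struct DocInfoSketch
{
    std::string title;
    std::string content;
    std::string url;
    std::uint64_t doc_id = 0;
};

class ForwardIndexSketch
{
public:
    // Appends a document and returns its ID, which is its index in the vector.
    std::uint64_t Add(std::string title, std::string content, std::string url)
    {
        DocInfoSketch doc;
        doc.title = std::move(title);
        doc.content = std::move(content);
        doc.url = std::move(url);
        doc.doc_id = docs_.size(); // assign the ID before inserting
        docs_.push_back(std::move(doc));
        return docs_.back().doc_id;
    }

    // Returns nullptr for an unknown ID. The pointer may be invalidated
    // by a later Add that makes the vector reallocate.
    const DocInfoSketch* Get(std::uint64_t doc_id) const
    {
        if (doc_id >= docs_.size())
        {
            return nullptr;
        }
        return &docs_[doc_id];
    }

private:
    std::vector<DocInfoSketch> docs_;
};
```

<p>The same caveat applies to the pointer returned by BuildForwardIndex: it should be used right away, before the next push_back, because a vector may move its elements when it grows.</p>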
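<h3>6.2 Building the inverted index</h3>
<p>Each time a forward-index entry has been built, we take the resulting DocInfo and use it to build the inverted index.</p>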
<pre><code class="prism language-cpp">// Document content obtained from the forward index
struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content with the tags removed
    std::string url;     // URL of the document on the official site
    uint64_t doc_id;     // document ID, no need to dwell on it for now
};

// Suppose the document is:
// title  : 吃葡萄
// content: 吃葡萄不吐葡萄皮
// url    : http://XXXX
// doc_id : 123</code></pre>
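<p>From the document content we form one or more InvertedElem entries, the elements of the inverted lists.<br /> Documents are processed one at a time, and one document contains many words, so each of those words must map to the current doc_id.</p>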
<ol>
<li>Segment both the title and the content into words</li>
</ol>
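<p>title: 吃/葡萄/吃葡萄 (title_word)<br /> content: 吃/葡萄/不吐/葡萄皮 (content_word)</p>
<p>The relevance between a word and a document is estimated from word frequency: a word that appears in the title is considered more relevant, and a word that appears in the content less so.</p>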
<ol start="2">
<li>Count word frequencies</li>
</ol>
<pre><code class="prism language-cpp">struct word_cnt
{
    int title_cnt = 0;
    int content_cnt = 0;
};
std::unordered_map&lt;std::string, word_cnt&gt; word_map;
for (const auto &amp;word : title_word)
{
    word_map[word].title_cnt++; // 吃(1)/葡萄(1)/吃葡萄(1)
}
for (const auto &amp;word : content_word)
{
    word_map[word].content_cnt++; // 吃(1)/葡萄(1)/不吐(1)/葡萄皮(1)
}</code></pre>
<p>We now know how many times each word occurs in the title and in the content of the document.</p>
<ol start="3">
<li>Define a relevance score</li>
</ol>
<pre><code class="prism language-cpp">// Take every keyword out of word_map and build its inverted-list element
for (const auto &amp;word : word_map)
{
    // Relation between one word and document 123. When several different
    // words point to the same document, which result is shown first? Relevance decides.
    InvertedElem elem;
    elem.doc_id = 123;
    elem.word = word.first;
    elem.weight = 10 * word.second.title_cnt + word.second.content_cnt; // relevance; the factor 10 is an arbitrary choice
    // Append to the inverted list of this keyword
    inverted_index[word.first].push_back(elem);
}</code></pre>
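<p>The counting and weighting steps can be tried out on already-segmented word lists, without a segmenter. This is a minimal sketch under the weighting used above (10 per title occurrence, 1 per content occurrence); WordCountSketch and ComputeWeights are hypothetical names.</p>

```cpp
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical counters for one word inside one document.
struct WordCountSketch
{
    int title_cnt = 0;
    int content_cnt = 0;
};

// Returns word -> weight for one document, given its segmented
// title and content.
std::unordered_map<std::string, int> ComputeWeights(
    const std::vector<std::string>& title_words,
    const std::vector<std::string>& content_words)
{
    std::unordered_map<std::string, WordCountSketch> counts;
    for (const auto& w : title_words)
    {
        counts[w].title_cnt++;
    }
    for (const auto& w : content_words)
    {
        counts[w].content_cnt++;
    }

    std::unordered_map<std::string, int> weights;
    for (const auto& kv : counts)
    {
        // A title hit counts ten times as much as a content hit.
        weights[kv.first] = 10 * kv.second.title_cnt + kv.second.content_cnt;
    }
    return weights;
}
```

<p>For the sample document above, 吃 and 葡萄 each occur once in the title and once in the content, so each gets 10 + 1 = 11; 吃葡萄 occurs only in the title and gets 10; 不吐 and 葡萄皮 occur only in the content and get 1 each.</p>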
<p>For word segmentation we use <strong>cppjieba</strong>.</p>
<p>Search for cppjieba on Gitee and pick any of the mirrors.</p>
<p><img src="https://i-blog.csdnimg.cn/direct/c3ca1485127d4fe8bd06bc2b69b5392f.png" alt="Image" /></p>
<p>Then pull it to your Linux machine with git clone:</p>
<pre><code class="prism language-powershell">git clone https:<span class="token operator">/</span><span class="token operator">/</span>gitee<span class="token punctuation">.</span>com/lycium_pkg_mirror/cppjieba<span class="token punctuation">.</span>git</code></pre>
<p><img src="https://i-blog.csdnimg.cn/direct/9d7c52a1972b4249960c76fe0e1dbbe2.png" alt="Image" /></p>
<p>One more thing to watch out for: cppjieba also needs limonp, which has to be cloned as well.</p>
<pre><code class="prism language-powershell">git clone https:<span class="token operator">/</span><span class="token operator">/</span>github<span class="token punctuation">.</span>com/yanyiwu/limonp<span class="token punctuation">.</span>git</code></pre>
<p><img src="https://i-blog.csdnimg.cn/direct/e175496758104d579b571abe379338f6.png" alt="Image" /></p>
<p>After both downloads, the current directory should contain these two directories:</p>
<p><img src="https://i-blog.csdnimg.cn/direct/d20379ffbcfe483ea49b6dec12dc4861.png" alt="Image" /></p>
<p>Next, copy limonp into cppjieba/include/cppjieba. Without it, the build fails because the limonp headers cannot be found.</p>
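<p><img src="https://i-blog.csdnimg.cn/direct/f1b5dc7ee8b946749699a6d07d66d44c.png" alt="Image" /></p>
<p>Finally, create a symbolic link in your own project that points to the header directory, otherwise the compiler will not know where the headers are. Alternatively, use an absolute path in the #include, for example one that starts with /home/wdl/thridpart/cppjieba/include/cppjieba.</p>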
<p><img src="https://i-blog.csdnimg.cn/direct/1cc6e4a4deac43c8b30a53b09a20c9f6.png" alt="Image" /></p>
<pre><code class="prism language-cpp">//Common.h
#include &lt;string&gt;
#include &lt;vector&gt;
#include "cppjieba/Jieba.hpp"

// Dictionary files shipped with cppjieba
const char* const DICT_PATH = "./dict/jieba.dict.utf8";
const char* const HMM_PATH = "./dict/hmm_model.utf8";
const char* const USER_DICT_PATH = "./dict/user.dict.utf8";
const char* const IDF_PATH = "./dict/idf.utf8";
const char* const STOP_WORD_PATH = "./dict/stop_words.utf8";

class JiebaUtil
{
private:
    static cppjieba::Jieba jieba; // one shared segmenter
public:
    // Segments src in search mode and stores the words in *out
    static void CutString(const std::string &amp;src, std::vector&lt;std::string&gt; *out)
    {
        jieba.CutForSearch(src, *out);
    }
};
// Definition of the static member. If Common.h is included from more than
// one .cpp file, this line has to move into a single .cpp file.
cppjieba::Jieba JiebaUtil::jieba(DICT_PATH, HMM_PATH, USER_DICT_PATH, IDF_PATH, STOP_WORD_PATH);</code></pre>
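<p>One more point about building the inverted index: searching for HELLO, hello, or Hello should all return content that contains hello, because search ignores case. The stored keywords must ignore case as well, so we convert every keyword to lowercase with boost::to_lower() before storing it.</p>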
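<p>The normalization step can be sketched with the standard library. ToLowerAscii is a hypothetical helper for this illustration; it lowers ASCII letters only and leaves other bytes, such as UTF-8 encoded Chinese, unchanged in the default "C" locale. The article itself uses boost::to_lower, which modifies the string in place.</p>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Hypothetical helper: returns a lowercase copy of s.
std::string ToLowerAscii(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) {
                       // Cast through unsigned char: std::tolower is
                       // undefined for negative char values.
                       return static_cast<char>(std::tolower(c));
                   });
    return s;
}
```

<p>Applying the same function to both the stored keyword and the user's query is what makes HELLO, Hello, and hello land on the same key.</p>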
<pre><code class="prism language-cpp">bool BuildInvertedIndex(const DocInfo&amp; doc)
{
    // DocInfo{title, content, url, doc_id}
    // How often each keyword occurs in the title and in the content
    struct word_cnt
    {
        int title_cnt = 0;
        int content_cnt = 0;
    };
    // Maps each keyword to its frequency counters
    std::unordered_map&lt;std::string, word_cnt&gt; word_map;

    // Segment the title
    std::vector&lt;std::string&gt; title_words;
    JiebaUtil::CutString(doc.title, &amp;title_words);
    // Count word frequencies in the title
    for (auto s : title_words) // copied on purpose: s is modified below
    {
        // Search ignores case, so keywords in the inverted index must too
        boost::to_lower(s);      // normalize to lowercase
        word_map[s].title_cnt++; // fetches the entry, or creates it if absent
    }

    // Segment the content
    std::vector&lt;std::string&gt; content_words;
    JiebaUtil::CutString(doc.content, &amp;content_words);
    // Count word frequencies in the content
    for (auto s : content_words)
    {
        boost::to_lower(s); // normalize to lowercase
        word_map[s].content_cnt++;
    }

    // Put every keyword in word_map into the inverted index
    for (auto&amp; word_pair : word_map)
    {
        InvertedElem elem;
        elem.doc_id = doc.doc_id;
        elem.word = word_pair.first;
        elem.weight = 10 * word_pair.second.title_cnt + word_pair.second.content_cnt;
        inverted_index[word_pair.first].push_back(std::move(elem));
    }
    return true;
}</code></pre>
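<p>Once the inverted lists are filled, a query is a map lookup followed by ordering the hits by weight. The sketch below assumes that higher weight should come first, as the earlier discussion of weights suggests; PostingSketch, InvertedIndexSketch, Add, and Lookup are hypothetical names and not part of the article's Index class.</p>

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical, reduced form of InvertedElem.
struct PostingSketch
{
    std::uint64_t doc_id;
    int weight;
};

class InvertedIndexSketch
{
public:
    void Add(const std::string& word, std::uint64_t doc_id, int weight)
    {
        index_[word].push_back(PostingSketch{doc_id, weight});
    }

    // Returns the postings of `word`, highest weight first.
    // An unknown word yields an empty list.
    std::vector<PostingSketch> Lookup(const std::string& word) const
    {
        auto it = index_.find(word);
        if (it == index_.end())
        {
            return {};
        }
        std::vector<PostingSketch> hits = it->second; // copy, then sort the copy
        std::sort(hits.begin(), hits.end(),
                  [](const PostingSketch& a, const PostingSketch& b) {
                      return a.weight > b.weight;
                  });
        return hits;
    }

private:
    std::unordered_map<std::string, std::vector<PostingSketch>> index_;
};
```

<p>Sorting a copy keeps the stored lists untouched. Whether to sort at query time or once after indexing is a design choice this sketch does not settle.</p>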
<p>The whole project needs only one instance of the Index class, so we can write it as a singleton.</p>
<pre><code class="prism language-cpp">#pragma once
#include "Common.h"
#include &lt;cstdint&gt;
#include &lt;mutex&gt;
#include &lt;string&gt;
#include &lt;unordered_map&gt;
#include &lt;vector&gt;

// Forward index: the content that a document ID maps to
typedef struct DocInfo
{
    std::string title;   // document title
    std::string content; // document content
    std::string url;     // URL of the document on the official site
    uint64_t doc_id;     // document ID, needed when building the inverted index
} DocInfo;

// Inverted index: the document ID and weight that a word maps to
typedef struct InvertedElem
{
    uint64_t doc_id;
    std::string word; // keyword, used later when building the summary
    int weight;
    InvertedElem() : weight(0) {}
} InvertedElem;

// Inverted list
typedef std::vector&lt;InvertedElem&gt; InvertedList;

class Index
{
private:
    // The forward index is an array: the array index is naturally the document ID
    std::vector&lt;DocInfo&gt; forward_index; // forward index
    // The inverted index maps a keyword to one or more InvertedElem (keyword -&gt; inverted list)
    std::unordered_map&lt;std::string, InvertedList&gt; inverted_index;

private:
    // Lazy-initialized singleton
    static Index* _Sint;
    static std::mutex _mtx;

    Index() {}
    Index(const Index&amp;) = delete;
    Index&amp; operator=(const Index&amp;) = delete;

public:
    static Index* GetInstance()
    {
        if (_Sint == nullptr)
        {
            std::unique_lock&lt;std::mutex&gt; lock(_mtx);
            if (_Sint == nullptr)
            {
                _Sint = new Index;
            }
        }
        return _Sint;
    }
    //...
};
Index* Index::_Sint = nullptr;
std::mutex Index::_mtx;</code></pre>
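<p>As an alternative worth considering, a function-local static gives the same lazy, one-instance behaviour with less code. Since C++11 the initialization of such a static is guaranteed to run once even when several threads call the function at the same time. By contrast, the first check in the double-checked locking version reads a plain pointer outside the lock, which is formally a data race under the C++11 memory model unless the pointer is a std::atomic. IndexSketch is a hypothetical class for this illustration, not the article's Index.</p>

```cpp
// Hypothetical singleton using a function-local static.
class IndexSketch
{
public:
    static IndexSketch& GetInstance()
    {
        // Constructed on first call; C++11 makes this initialization
        // thread-safe.
        static IndexSketch instance;
        return instance;
    }

    IndexSketch(const IndexSketch&) = delete;
    IndexSketch& operator=(const IndexSketch&) = delete;

private:
    IndexSketch() = default;
};
```

<p>Callers would write IndexSketch::GetInstance() and get a reference to the same object every time. No mutex member and no out-of-class definitions are needed.</p>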
</div>
				
				               	<div class="clear"></div>
                			

				
             </div>
		</div>
    

			
    
	     
	</div>
</div>
</div>
<div class="clear"></div>

 </body></html>