Distributed Search Engine: ElasticSearch (Part 2)

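About the author

The author is a third-year university student from Heyuan. These notes record some modest experience from self-study; corrections are welcome. The notes will keep being improved, in the hope of helping more Java enthusiasts get started.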

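Table of contents

- About the author
- Distributed Search Engine: ElasticSearch (Part 2)
  - What is ElasticSearch
    - Pagination
    - Field highlighting (highlight)
      - Imitating Baidu search highlighting
    - bool query (for multi-condition queries)
    - Filter with a range condition (filter range)
    - Viewing index information for the whole es cluster
  - The elasticsearch Java API
    - Preparation
    - Index operations
      - Creating an index
      - Deleting an index
      - Checking whether an index exists
    - Document operations
      - Creating a document with a given id
      - Deleting a document with a given id
      - Updating a document with a given id
      - Getting a document with a given id
      - Search (match everything: match_all)
      - Search (full-text query: match)
      - Search (multi-field search: multi_match)
      - Search (source filtering: fetchSource)
      - Pagination, sorting, and field highlighting
      - Boolean search (bool)
    - es in practice (JD product search)
      - Scraping data from JD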

Distributed Search Engine: ElasticSearch (Part 2)

Note: the ElasticSearch version used is 7.6.1.

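What is ElasticSearch

ElasticSearch is a search server based on Lucene. It provides a distributed, multi-tenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License, and it is a popular enterprise search engine. It is designed for use in cloud computing and offers near-real-time search that is stable, reliable, fast, and easy to install and use.

Suppose we are building a website or application and want to add search. Building search from scratch is hard. We want the search solution to be fast, to work with zero configuration and be completely free, to index data simply by sending JSON over HTTP, to be always available, to scale from one machine to hundreds, to search in real time, to support simple multi-tenancy, and to fit a cloud deployment. Elasticsearch addresses all of these problems and more. (Excerpted from Baidu Baike.)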

Pagination

from is the offset of the first hit to return and size is the number of hits per page. The request below sorts by od in descending order and returns the first two hits.

GET goods/_search
{
  "query": { "match_all": {} },
  "sort": [ { "od": { "order": "desc" } } ],
  "from": 0,
  "size": 2
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 4, "relation" : "eq" },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "4", "_score" : null,
        "_source" : { "title" : "IQOONEO5", "content" : "IQOONEO5 高通骁龙870Soc ,", "price" : "2499", "od" : 4 },
        "sort" : [ 4 ]
      },
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "3", "_score" : null,
        "_source" : { "title" : "小米11", "content" : "小米11 高通骁龙888Soc ,1亿像素", "price" : "4500", "od" : 3 },
        "sort" : [ 3 ]
      }
    ]
  }
}
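User interfaces usually think in page numbers rather than offsets, so a small helper can translate one into the other. The following is a minimal sketch, not part of the original notes: the class name PageWindow and the 1-based page convention are assumptions made for illustration. It also guards against the default index.max_result_window of 10,000, above which Elasticsearch rejects from + size.

```java
// Sketch: translate a 1-based page number into the from/size pair used above.
// PageWindow is a hypothetical helper name, not an Elasticsearch class.
final class PageWindow {

    // Default value of index.max_result_window; from + size above it is rejected by es.
    static final int MAX_RESULT_WINDOW = 10_000;

    /** Returns the "from" offset for a 1-based page number and a page size. */
    static int from(int page, int size) {
        if (page < 1 || size < 1) {
            throw new IllegalArgumentException("page and size must be >= 1");
        }
        long from = (long) (page - 1) * size; // long avoids int overflow for huge pages
        if (from + size > MAX_RESULT_WINDOW) {
            throw new IllegalArgumentException("from + size exceeds " + MAX_RESULT_WINDOW);
        }
        return (int) from;
    }

    public static void main(String[] args) {
        System.out.println(from(1, 2)); // prints 0: the first page of the request above
        System.out.println(from(2, 2)); // prints 2: the second page
    }
}
```

The returned value would be passed as "from" in the request body, with the page size passed as "size".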

Field highlighting (highlight)

One or more fields can be selected for highlighting. When a selected field matches the query, the matching part is wrapped in an em tag by default.

GET goods/_search
{
  "query": {
    "match": { "title": "华为P40" }
  },
  "highlight": {
    "fields": { "title": {} }
  }
}

Result:

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 2.7309713,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 2.7309713,
        "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼", "price" : "4999", "od" : 1 },
        "highlight" : { "title" : [ "华为P40" ] }
      },
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 1.5241971,
        "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998", "od" : 2 },
        "highlight" : { "title" : [ "华为Mate30" ] }
      }
    ]
  }
}

The default tag is em. We can change the prefix and suffix through pre_tags and post_tags, for example to an HTML tag styled with front-end CSS.

GET goods/_search
{
  "query": {
    "match": { "title": "华为P40" }
  },
  "highlight": {
    "pre_tags": "",
    "post_tags": "",
    "fields": { "title": {} }
  }
}

Result:

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 2.7309713,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 2.7309713,
        "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼", "price" : "4999", "od" : 1 },
        "highlight" : { "title" : [ "华为P40" ] }
      },
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 1.5241971,
        "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998", "od" : 2 },
        "highlight" : { "title" : [ "华为Mate30" ] }
      }
    ]
  }
}
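To make the role of pre_tags and post_tags concrete, here is a rough plain-Java imitation of what ends up in a highlight fragment. It is only a sketch: Elasticsearch highlights analyzed tokens rather than raw substrings, and the class name HighlightSketch and the span tag used in main are hypothetical choices, not values taken from these notes.

```java
// Sketch: imitate a highlight fragment by wrapping literal matches in custom tags.
// Real es highlighting works on analyzed tokens; this only illustrates the tag options.
final class HighlightSketch {

    /** Wraps every literal occurrence of term in text with preTag and postTag. */
    static String wrap(String text, String term, String preTag, String postTag) {
        if (term.isEmpty()) {
            return text; // nothing to highlight
        }
        return text.replace(term, preTag + term + postTag);
    }

    public static void main(String[] args) {
        // The span tag and its class name are assumptions made for this example.
        System.out.println(wrap("华为P40 8+256G", "华为", "<span class=\"hl\">", "</span>"));
    }
}
```

A front-end page can then style the chosen tag, which is why custom tags are often more convenient than the default em.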

Imitating Baidu search highlighting

When you search Baidu for 华为P40, for example, not only the title but also the content is highlighted. We can get the same effect by combining multi_match with highlight.

GET goods/_search
{
  "query": {
    "multi_match": {
      "query": "华为P40",
      "fields": ["title", "content"]
    }
  },
  "highlight": {
    "pre_tags": "",
    "post_tags": "",
    "fields": {
      "title": {},
      "content": {}
    }
  }
}

Result:

{
  "took" : 8,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 2.8157697,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 2.8157697,
        "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼", "price" : "4999", "od" : 1 },
        "highlight" : {
          "title" : [ "华为P40" ],
          "content" : [ "华为P40 8+256G，麒麟990Soc，贼牛逼" ]
        }
      },
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 1.8023796,
        "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998", "od" : 2 },
        "highlight" : {
          "title" : [ "华为Mate30" ],
          "content" : [ "华为Mate30 8+128G，麒麟990Soc" ]
        }
      }
    ]
  }
}

bool query (for multi-condition queries)

This is similar to AND and OR in MySQL.

Key point: must means AND, should means OR.

Using must (AND):

Below we give two conditions inside must. Because the clause is must, a document has to satisfy both conditions to be returned.

GET goods/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "华为" } },
        { "match": { "content": "MATE30" } }
      ]
    }
  }
}

Result:

{
  "took" : 10,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 2.9512205,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 2.9512205,
        "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998", "od" : 2 }
      }
    ]
  }
}

Using should (OR):

should likewise holds two conditions, but a document only needs to satisfy one of them. (When a bool query has no must or filter clause, at least one should clause has to match.)

GET goods/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "title": "华为" } },
        { "match": { "content": "MATE30" } }
      ]
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 2, "relation" : "eq" },
    "max_score" : 2.9512205,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "2", "_score" : 2.9512205,
        "_source" : { "title" : "华为Mate30", "content" : "华为Mate30 8+128G，麒麟990Soc", "price" : "3998", "od" : 2 }
      },
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "1", "_score" : 1.5241971,
        "_source" : { "title" : "华为P40", "content" : "华为P40 8+256G，麒麟990Soc，贼牛逼", "price" : "4999", "od" : 1 }
      }
    ]
  }
}
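The AND/OR reading of must and should can be mimicked in plain Java with predicates. This is only an in-memory sketch under simplifying assumptions: matching is a case-insensitive substring test instead of analyzed full-text matching, there is no relevance scoring (so hits come back in document order), and the class BoolSketch with its trimmed copies of the four goods documents is invented for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Sketch: must behaves like allMatch (AND), should like anyMatch (OR).
final class BoolSketch {

    static final class Doc {
        final String id, title, content;
        Doc(String id, String title, String content) {
            this.id = id; this.title = title; this.content = content;
        }
    }

    static Predicate<Doc> must(List<Predicate<Doc>> clauses) {
        return doc -> clauses.stream().allMatch(c -> c.test(doc));
    }

    static Predicate<Doc> should(List<Predicate<Doc>> clauses) {
        return doc -> clauses.stream().anyMatch(c -> c.test(doc));
    }

    // Crude stand-in for a match query: case-insensitive substring test.
    static boolean contains(String text, String term) {
        return text.toLowerCase(Locale.ROOT).contains(term.toLowerCase(Locale.ROOT));
    }

    static List<Doc> sampleDocs() {
        return Arrays.asList(
            new Doc("1", "华为P40", "华为P40 8+256G"),
            new Doc("2", "华为Mate30", "华为Mate30 8+128G"),
            new Doc("3", "小米11", "小米11 高通骁龙888Soc"),
            new Doc("4", "IQOONEO5", "IQOONEO5 高通骁龙870Soc"));
    }

    static List<Predicate<Doc>> clauses() {
        List<Predicate<Doc>> clauses = new ArrayList<>();
        clauses.add(d -> contains(d.title, "华为"));
        clauses.add(d -> contains(d.content, "MATE30"));
        return clauses;
    }

    static List<String> search(Predicate<Doc> query) {
        List<String> ids = new ArrayList<>();
        for (Doc d : sampleDocs()) {
            if (query.test(d)) {
                ids.add(d.id);
            }
        }
        return ids;
    }

    static List<String> mustExample() { return search(must(clauses())); }

    static List<String> shouldExample() { return search(should(clauses())); }

    public static void main(String[] args) {
        System.out.println(mustExample());   // prints [2]
        System.out.println(shouldExample()); // prints [1, 2]
    }
}
```

The matched ids agree with the two results above; only the ordering differs, because Elasticsearch ranks hits by score.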

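Filter with a range condition (filter range)

Suppose we search with title=xx and also want price > 4000 as an additional condition. A range clause inside filter does this; filter clauses restrict the result set without contributing to the score. Note that price is stored as a string in these sample documents, so unless the field is mapped as a numeric type the comparison is likely lexicographic rather than numeric.

GET goods/_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "小米" } }
      ],
      "filter": {
        "range": {
          "price": { "gt": 4000 }
        }
      }
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : { "total" : 1, "successful" : 1, "skipped" : 0, "failed" : 0 },
  "hits" : {
    "total" : { "value" : 1, "relation" : "eq" },
    "max_score" : 2.4135482,
    "hits" : [
      {
        "_index" : "goods", "_type" : "_doc", "_id" : "3", "_score" : 2.4135482,
        "_source" : { "title" : "小米11", "content" : "小米11 高通骁龙888Soc ,1亿像素", "price" : "4500", "od" : 3 }
      }
    ]
  }
}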
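Because the sample documents store price as a string such as "4500", it is worth seeing how string ordering and numeric ordering can disagree. The sketch below is plain Java, not Elasticsearch code; the class name PriceCompare and the price "999" are hypothetical and chosen only to expose the difference.

```java
import java.math.BigDecimal;

// Sketch: "greater than" on strings compares character by character,
// which is not the same as comparing the numbers they spell.
final class PriceCompare {

    static boolean lexicographicGt(String price, String bound) {
        return price.compareTo(bound) > 0;
    }

    static boolean numericGt(String price, String bound) {
        return new BigDecimal(price).compareTo(new BigDecimal(bound)) > 0;
    }

    public static void main(String[] args) {
        // Both orderings agree for the 小米11 price from the result above.
        System.out.println(lexicographicGt("4500", "4000") + " " + numericGt("4500", "4000")); // true true
        // They disagree for a hypothetical cheaper item: '9' sorts after '4'.
        System.out.println(lexicographicGt("999", "4000") + " " + numericGt("999", "4000"));   // true false
    }
}
```

If range filters on price matter, mapping the field as a numeric type when the index is created is probably the safer design.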

Viewing index information for the whole es cluster

GET _cat/indices?v

The elasticsearch Java API

Preparation

1. Add the elasticsearch high-level REST client and elasticsearch dependencies (the versions should match the es server you run; here that is 7.6.1), plus fastjson for JSON serialization.

<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>org.elasticsearch</groupId>
    <artifactId>elasticsearch</artifactId>
    <version>7.6.1</version>
</dependency>
<dependency>
    <groupId>com.alibaba</groupId>
    <artifactId>fastjson</artifactId>
    <version>1.2.75</version>
</dependency>

2. Open the constructor of RestHighLevelClient:

public RestHighLevelClient(RestClientBuilder restClientBuilder) {
    this(restClientBuilder, Collections.emptyList());
}

The constructor takes a RestClientBuilder. We do not construct that object directly; we obtain it from RestClient.

3. Open RestClient:

public static RestClientBuilder builder(HttpHost... hosts) {
    if (hosts == null || hosts.length == 0) {
        throw new IllegalArgumentException("hosts must not be null nor empty");
    }
    List<Node> nodes = Arrays.stream(hosts).map(Node::new).collect(Collectors.toList());
    return new RestClientBuilder(nodes);
}

RestClient.builder returns the RestClientBuilder we need. Next, look at HttpHost:

public HttpHost(String hostname, int port, String scheme) {
    // Host name of the es server, es port, and protocol (defaults to http)
    this.hostname = (String) Args.containsNoBlanks(hostname, "Host name");
    this.lcHostname = hostname.toLowerCase(Locale.ROOT);
    if (scheme != null) {
        this.schemeName = scheme.toLowerCase(Locale.ROOT);
    } else {
        this.schemeName = "http";
    }
    this.port = port;
    this.address = null;
}

4. The client can then be configured as follows:

HttpHost httpHost = new HttpHost("localhost", 9200, "http");
RestClientBuilder restClientBuilder = RestClient.builder(httpHost);
RestHighLevelClient restHighLevelClient = new RestHighLevelClient(restClientBuilder);

5. For convenience, we can let the Spring IoC container manage the RestHighLevelClient and simply autowire it later.

@Configuration
public class esConfig {

    @Bean
    public RestHighLevelClient restHighLevelClient() {
        HttpHost httpHost = new HttpHost("localhost", 9200, "http");
        RestClientBuilder builder = RestClient.builder(httpHost);
        return new RestHighLevelClient(builder);
    }
}

Index operations

With the Java elasticsearch API, index operations all take the form restHighLevelClient.indices().xxxxx().

Creating an index

// Create an index
@Test
public void createIndex() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    // Build a create-index request with the name of the index to create
    // (CreateIndexRequest from org.elasticsearch.client.indices)
    CreateIndexRequest createIndexRequest = new CreateIndexRequest("java01");
    // Send the create-index request to es
    CreateIndexResponse createIndexResponse =
            restHighLevelClient.indices().create(createIndexRequest, RequestOptions.DEFAULT);
    System.out.println(createIndexResponse.isAcknowledged());
    restHighLevelClient.close();
}

Deleting an index

// Delete an index
@Test
public void deleteIndex() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    // Build a delete-index request with the name of the index to delete
    DeleteIndexRequest deleteIndexRequest = new DeleteIndexRequest("java01");
    // Send the delete-index request through restHighLevelClient
    restHighLevelClient.indices().delete(deleteIndexRequest, RequestOptions.DEFAULT);
    restHighLevelClient.close();
}

Checking whether an index exists

// Check whether an index exists
@Test
public void indexExists() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    // GetIndexRequest from org.elasticsearch.client.indices
    GetIndexRequest getIndexRequest = new GetIndexRequest("goods");
    boolean exists = restHighLevelClient.indices().exists(getIndexRequest, RequestOptions.DEFAULT);
    System.out.println(exists);
    restHighLevelClient.close();
}

Document operations

Creating a document with a given id

// Create a document
@Test
public void createIndexDoc() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    IndexRequest indexRequest = new IndexRequest("hello");
    // Set the document id
    indexRequest.id("1");
    /*
     * public IndexRequest source(Map<String, ?> source, XContentType contentType) throws ElasticsearchGenerationException {
     *     try {
     *         XContentBuilder builder = XContentFactory.contentBuilder(contentType);
     *         builder.map(source);
     *         return this.source(builder);
     *     } catch (IOException var4) {
     *         throw new ElasticsearchGenerationException("Failed to generate [" + source + "]", var4);
     *     }
     * }
     * source has many overloads and any of them works. Here the key/value pairs are
     * collected in a Map and then serialized to a JSON string.
     */
    Map<String, Object> source = new HashMap<>();
    source.put("a_age", "50");
    source.put("a_address", "广州");
    // Everything in es is JSON: convert the Map to a JSON string with fastjson
    // and set XContentType to JSON
    indexRequest.source(JSON.toJSONString(source), XContentType.JSON);
    IndexResponse response = restHighLevelClient.index(indexRequest, RequestOptions.DEFAULT);
    System.out.println("response:" + response);
    System.out.println("status:" + response.status());
    restHighLevelClient.close();
}

Deleting a document with a given id

// Delete a document
@Test
public void deleteDoc() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    DeleteRequest deleteRequest = new DeleteRequest("hello");
    deleteRequest.id("1");
    DeleteResponse delete = restHighLevelClient.delete(deleteRequest, RequestOptions.DEFAULT);
    System.out.println(delete.status());
    restHighLevelClient.close();
}

Updating a document with a given id

// Update a document
@Test
public void updateDoc() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    /*
     * The request is built through this constructor:
     * public UpdateRequest(String index, String id) {
     *     super(index);
     *     this.refreshPolicy = RefreshPolicy.NONE;
     *     this.waitForActiveShards = ActiveShardCount.DEFAULT;
     *     this.scriptedUpsert = false;
     *     this.docAsUpsert = false;
     *     this.detectNoop = true;
     *     this.id = id;
     * }
     */
    UpdateRequest updateRequest = new UpdateRequest("hello", "1");
    // Only the fields placed in the partial document are changed
    Map<String, Object> source = new HashMap<>();
    source.put("a_address", "河源");
    updateRequest.doc(JSON.toJSONString(source), XContentType.JSON);
    UpdateResponse response = restHighLevelClient.update(updateRequest, RequestOptions.DEFAULT);
    System.out.println(response.status());
    restHighLevelClient.close();
}

Getting a document with a given id

// Get a document
@Test
public void getDoc() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    GetRequest getRequest = new GetRequest("hello");
    getRequest.id("1");
    GetResponse response = restHighLevelClient.get(getRequest, RequestOptions.DEFAULT);
    String sourceAsString = response.getSourceAsString();
    System.out.println(sourceAsString);
    restHighLevelClient.close();
}

Search (match everything: match_all)

// Search (match everything: match_all)
@Test
public void search_matchAll() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    /*
     * public SearchRequest(String... indices) {
     *     this(indices, new SearchSourceBuilder());
     * }
     */
    SearchRequest searchRequest = new SearchRequest("hello");
    // The SearchSourceBuilder plays the role of the request body
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    MatchAllQueryBuilder matchAllQueryBuilder = QueryBuilders.matchAllQuery();
    // Equivalent to the "query" part of the search request
    searchSourceBuilder.query(matchAllQueryBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
    restHighLevelClient.close();
}

Search (full-text query: match)

match analyzes the query text and matches documents containing the resulting terms, which makes it feel like a loose search. It is not the same as the dedicated fuzzy query.

// Full-text search with match
@Test
public void search_match() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    // No index name is given, so the search runs against all indices
    SearchRequest searchRequest = new SearchRequest();
    // Build the query body
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("a_address", "广州");
    searchSourceBuilder.query(matchQueryBuilder);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
    restHighLevelClient.close();
}

Search (multi-field search: multi_match)

// Search (multi-field search: multi_match)
@Test
public void search_multiMatch() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    SearchRequest searchRequest = new SearchRequest("goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // First argument is the query text; the remaining arguments are the fields to search
    searchSourceBuilder.query(QueryBuilders.multiMatchQuery("华为", "title", "content"));
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
    restHighLevelClient.close();
}

Search (source filtering: fetchSource)

The fetchSource method corresponds to _source in the query DSL.

// Filter returned fields with fetchSource (_source)
@Test
public void search_source() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    SearchRequest searchRequest = new SearchRequest("goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    searchSourceBuilder.query(QueryBuilders.matchAllQuery());
    /*
     * public SearchSourceBuilder fetchSource(@Nullable String[] includes, @Nullable String[] excludes) {
     *     FetchSourceContext fetchSourceContext = this.fetchSourceContext != null ? this.fetchSourceContext : FetchSourceContext.FETCH_SOURCE;
     *     this.fetchSourceContext = new FetchSourceContext(fetchSourceContext.fetchSource(), includes, excludes);
     *     return this;
     * }
     */
    String[] includes = {"title"}; // fields to include
    String[] excludes = {};        // fields to exclude
    searchSourceBuilder.fetchSource(includes, excludes);
    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
    restHighLevelClient.close();
}
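The effect of the includes and excludes arrays can be pictured as a filter over the keys of each hit's _source. The sketch below imitates that on a plain Map. It is a simplification under stated assumptions: only exact field names are handled, and SourceFilterSketch and its sample() method are hypothetical names, not part of the Elasticsearch client.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch: keep a field if it is included (or no includes are given) and not excluded.
final class SourceFilterSketch {

    static Map<String, Object> filter(Map<String, Object> source, String[] includes, String[] excludes) {
        Set<String> inc = new HashSet<>(Arrays.asList(includes));
        Set<String> exc = new HashSet<>(Arrays.asList(excludes));
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : source.entrySet()) {
            boolean included = inc.isEmpty() || inc.contains(e.getKey());
            if (included && !exc.contains(e.getKey())) {
                out.put(e.getKey(), e.getValue());
            }
        }
        return out;
    }

    // A copy of document 1 from the goods index used in these notes.
    static Map<String, Object> sample() {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "华为P40");
        doc.put("content", "华为P40 8+256G");
        doc.put("price", "4999");
        doc.put("od", 1);
        return doc;
    }

    public static void main(String[] args) {
        // Same arrays as the test method above: only title survives.
        System.out.println(filter(sample(), new String[]{"title"}, new String[]{}).keySet()); // prints [title]
    }
}
```

Returning only the fields a page actually needs keeps responses smaller, which is the usual reason to reach for fetchSource.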

Pagination, sorting, and field highlighting

We want to convert the following es console request into Java code.

GET goods/_search
{
  "query": {
    "match": { "title": "华为" }
  },
  "sort": [
    { "od": { "order": "desc" } }
  ],
  "from": 0,
  "size": 1,
  "highlight": {
    "pre_tags": "",
    "post_tags": "",
    "fields": { "title": {} }
  }
}

Java implementation:

// Pagination, sorting, field highlighting
@Test
public void page_sort_HighLight() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    SearchRequest searchRequest = new SearchRequest("goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    MatchQueryBuilder matchQueryBuilder = QueryBuilders.matchQuery("title", "华为");
    searchSourceBuilder.query(matchQueryBuilder);

    // Pagination
    searchSourceBuilder.from(0);
    searchSourceBuilder.size(1);

    // Sorting
    searchSourceBuilder.sort("od", SortOrder.DESC);

    // Field highlighting: start
    HighlightBuilder highlightBuilder = new HighlightBuilder();
    // Set the highlight prefix and suffix tags (pre_tags and post_tags)
    highlightBuilder.preTags("");
    highlightBuilder.postTags("");
    // We use the String overload of highlightBuilder.field():
    /*
     * public HighlightBuilder field(String name) {
     *     return this.field(new HighlightBuilder.Field(name));
     * }
     */
    highlightBuilder.field("title");
    // To highlight more fields, call field() once per field:
    // highlightBuilder.field(); // second highlighted field
    // highlightBuilder.field(); // third highlighted field, and so on
    searchSourceBuilder.highlighter(highlightBuilder);
    // Field highlighting: end

    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    // hits holds all documents that matched
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        Map<String, HighlightField> highlightFields = hit.getHighlightFields();
        System.out.println("highlightMap:" + highlightFields);
        // Look up the fragments by the key "title".
        // The fragments hold the field content after highlighting (important: they can
        // replace the original, unhighlighted field content), e.g. 华为Mate30
        System.out.println("fragments:" + Arrays.toString(highlightFields.get("title").getFragments()));
    }
    restHighLevelClient.close();
}
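The comment in the loop above notes that the highlighted fragments can replace the original field content before the hit is shown to the user. A minimal sketch of that merge step follows. It assumes the fragments have already been converted to plain strings (in the real client they come from HighlightField.getFragments()), and the class name HighlightMerge is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: overwrite source fields with their highlighted versions, leaving other fields alone.
final class HighlightMerge {

    static Map<String, Object> merge(Map<String, Object> source, Map<String, List<String>> highlight) {
        Map<String, Object> merged = new LinkedHashMap<>(source); // copy, so the input stays untouched
        for (Map.Entry<String, List<String>> e : highlight.entrySet()) {
            if (merged.containsKey(e.getKey()) && !e.getValue().isEmpty()) {
                // Short fields such as a title normally yield one fragment; join them if there are several.
                merged.put(e.getKey(), String.join("", e.getValue()));
            }
        }
        return merged;
    }
}
```

For a hit whose source has title and price and whose highlight map has an entry for title, the merged map carries the tagged title and the unchanged price, ready to be rendered by the page.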

Boolean search (bool)

We implement something like the following es request:

GET goods/_search
{
  "query": {
    "bool": {
      "should": [
        { "term": { "title": { "value": "华" } } },
        { "term": { "title": { "value": "米" } } }
      ]
    }
  }
}

Java implementation:

// Boolean search (bool)
@Test
public void search_bool() throws IOException {
    RestClientBuilder builder = RestClient.builder(new HttpHost("localhost", 9200, "http"));
    RestHighLevelClient restHighLevelClient = new RestHighLevelClient(builder);
    SearchRequest searchRequest = new SearchRequest("goods");
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Build the bool query object with QueryBuilders
    BoolQueryBuilder boolQueryBuilder = QueryBuilders.boolQuery();
    // Each should() call adds exactly one clause. If the should array holds
    // several conditions, call should() once per condition.
    /*
     * "should": [
     *   { "term": { "title": { "value": "华" } } },
     *   { "term": { "title": { "value": "米" } } }
     * ]
     */
    // The should array above has two conditions, so we call should() twice
    boolQueryBuilder.should(QueryBuilders.termQuery("title", "华"));
    boolQueryBuilder.should(QueryBuilders.termQuery("title", "米"));
    searchSourceBuilder.query(boolQueryBuilder);

    searchRequest.source(searchSourceBuilder);
    SearchResponse search = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
    SearchHit[] hits = search.getHits().getHits();
    for (SearchHit hit : hits) {
        System.out.println(hit.getSourceAsString());
    }
    restHighLevelClient.close();
}

es in practice (JD product search)

Scraping data from JD

1. Add the dependency:

<dependency>
    <groupId>org.jsoup</groupId>
    <artifactId>jsoup</artifactId>
    <version>1.12.1</version>
</dependency>

2. Create the entity class:

public class goods {
    private String img;   // product image
    private String price; // product price
    private String title; // product title

    public goods() {
    }

    public goods(String img, String price, String title) {
        this.img = img;
        this.price = price;
        this.title = title;
    }

    public String getImg() { return img; }
    public void setImg(String img) { this.img = img; }
    public String getPrice() { return price; }
    public void setPrice(String price) { this.price = price; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    @Override
    public String toString() {
        return "goods{" +
                "img='" + img + '\'' +
                ", price='" + price + '\'' +
                ", title='" + title + '\'' +
                '}';
    }
}

3. Use jsoup to fetch and parse the JD search page (the core step) in a utility class:

@Component
public class jsoupUtils {

    private static RestHighLevelClient restHighLevelClient;

    // Setter injection lets Spring fill the static field used by the static method below
    @Autowired
    public void setRestHighLevelClient(RestHighLevelClient restHighLevelClient) {
        jsoupUtils.restHighLevelClient = restHighLevelClient;
    }

    /**
     * Wraps a JD search: scrapes the result page for the keyword and adds the data to es.
     */
    public static void searchData_JD(String keyword) {
        BulkRequest bulkRequest = new BulkRequest();
        try {
            URL url = new URL("https://search.jd.com/Search?keyword=" + keyword);
            // Let jsoup fetch and parse the URL (30 second timeout)
            Document document = Jsoup.parse(url, 30000);
            Element e1 = document.getElementById("J_goodsList");
            Elements e_lis = e1.getElementsByTag("li");
            for (Element e_li : e_lis) {
                // Several prices may be present because some items have bundle prices; take the first
                Elements e_price = e_li.getElementsByClass("p-price");
                String text = e_price.get(0).text();
                // The text may hold both the regular price and the JD PLUS member price, so split it.
                // The first character is assumed to be ￥; copy from index 1 until the next ￥.
                StringBuilder realPrice = new StringBuilder("￥");
                for (int i = 1; i < text.length(); i++) {
                    if (text.charAt(i) == '￥') {
                        break;
                    }
                    realPrice.append(text.charAt(i));
                }
                // Product image
                Elements e_img = e_li.getElementsByClass("p-img");
                Elements img = e_img.get(0).getElementsByTag("img");
                // JD keeps the image URL in the lazy-loading attribute data-lazy-img, not in src
                String src = img.get(0).attr("data-lazy-img");
                System.out.println("http:" + src);
                // Price
                System.out.println(realPrice);
                // Product title
                Elements e_title = e_li.getElementsByClass("p-name");
                String title = e_title.get(0).getElementsByTag("em").text();
                System.out.println(title);
                // Collect the fields for this product
                Map<String, Object> good = new HashMap<>();
                good.put("img", "http:" + src);
                good.put("price", realPrice.toString());
                good.put("title", title);
                IndexRequest indexRequest = new IndexRequest("jd_goods");
                indexRequest.source(JSON.toJSONString(good), XContentType.JSON);
                bulkRequest.add(indexRequest);
            }
            // Bulk operation: fewer round trips to the es server
            restHighLevelClient.bulk(bulkRequest, RequestOptions.DEFAULT);
        } catch (Exception e) {
            System.out.println(e.getMessage());
        }
    }
}
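Two small pieces of the crawler are easy to isolate and check on their own: building the search URL and cutting the first price out of the price text. The sketch below is one possible way to do that in plain Java. The class name JdSearchHelpers is hypothetical, URL-encoding the keyword is an addition that the original code does not perform, and the price strings used to exercise it are made-up examples rather than scraped data.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Sketch: helpers for the JD crawler, kept free of jsoup and es so they can be tested alone.
final class JdSearchHelpers {

    // Fullwidth yen sign used in the JD price text, written as an escape to stay encoding-safe.
    static final String YEN = "\uFFE5";

    /** Builds the search URL, percent-encoding the keyword so non-ASCII keywords are safe. */
    static String buildSearchUrl(String keyword) {
        try {
            return "https://search.jd.com/Search?keyword=" + URLEncoder.encode(keyword, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Returns the text from the first yen sign up to (not including) the next one. */
    static String firstPrice(String text) {
        int start = text.indexOf(YEN);
        if (start < 0) {
            return text.trim(); // no currency sign found: return the text unchanged
        }
        int next = text.indexOf(YEN, start + 1);
        String price = next < 0 ? text.substring(start) : text.substring(start, next);
        return price.trim();
    }

    public static void main(String[] args) {
        System.out.println(buildSearchUrl("vivo"));
        System.out.println(firstPrice(YEN + "2499.00" + YEN + "2399.00"));
    }
}
```

Keeping these steps in pure functions makes it possible to test the parsing without hitting the JD site, whose markup may change at any time.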

4. Use the utility class:

public static void main(String[] args) {
    SpringApplication.run(DemoApplication.class, args);
    jsoupUtils.searchData_JD("vivo");
}

With the data indexed, we can now display it on a page.
