Distributed Search: Elasticsearch (CFANZ Programming Community)

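I. Getting Started with Elasticsearch

1. About ES

(2) How Lucene differs from Elasticsearch

Lucene is a search-engine library written in Java.

Advantages of Lucene:

Disadvantages of Lucene:

Compared with Lucene, Elasticsearch has the following advantages:

2. Inverted Index

Elasticsearch uses an inverted index.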

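An inverted index consists of two parts:

① Term dictionary: records every term and links each term to its posting list

② Posting list: for each term, records the ids of the documents that contain it, along with details such as term frequency and the term's positions in each document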
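To make the two parts concrete, here is a minimal in-memory sketch in plain Java (no Elasticsearch involved): the map keys play the role of the term dictionary and each value is a posting list of document ids. It assumes documents have already been split into terms; the documents are hypothetical, and real Lucene posting lists also store frequencies and positions.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy inverted index: term dictionary (map keys) -> posting list (sorted document ids). */
public class InvertedIndexDemo {
    private final Map<String, SortedSet<Integer>> postings = new TreeMap<>();

    /** Records that every term of an already-tokenized document occurs in docId. */
    public void add(int docId, List<String> terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    /** Returns the ids of documents containing the term (empty if the term is unknown). */
    public SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term, new TreeSet<>());
    }

    public static void main(String[] args) {
        InvertedIndexDemo index = new InvertedIndexDemo();
        // Hypothetical documents, already split into terms
        index.add(1, Arrays.asList("xiaomi", "phone"));
        index.add(2, Arrays.asList("huawei", "phone"));
        index.add(3, Arrays.asList("xiaomi", "band"));
        System.out.println(index.search("phone"));  // [1, 2]
        System.out.println(index.search("xiaomi")); // [1, 3]
    }
}
```

A search runs in the opposite direction of a forward index: the query is split into terms, each term is looked up in the dictionary, and the resulting posting lists give the matching document ids directly.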

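3. Some ES Concepts

(1) ES compared with MySQL

(2) Architecture

4. Installing ES and Kibana

(1) Deploying a single-node ES

(2) Deploying Kibana

(3) Installing the IK analyzer

1) What an analyzer does

2) Trying the default analyzer

Run the following in Kibana's Dev Tools: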

POST /_analyze
{
  "analyzer": "standard",
  "text": "床前明月光，疑是地上霜！"
}
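The standard analyzer does not understand Chinese words: for the text above it emits every Chinese character as a separate token, which is why a Chinese-aware analyzer such as IK is needed. The plain-Java sketch below imitates only that behaviour (each ideograph becomes a token, runs of letters/digits become one lowercased token, punctuation is dropped). It is a rough approximation for illustration, not Lucene's actual StandardTokenizer.

```java
import java.util.ArrayList;
import java.util.List;

/** Rough approximation of how the standard analyzer splits mixed Chinese/Latin text. */
public class StandardLikeTokenizer {
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            if (Character.isIdeographic(cp)) {
                flush(word, tokens);                            // finish any pending Latin word
                tokens.add(new String(Character.toChars(cp)));  // one token per Chinese character
            } else if (Character.isLetterOrDigit(cp)) {
                word.appendCodePoint(Character.toLowerCase(cp)); // build a lowercased word
            } else {
                flush(word, tokens);                            // punctuation/space ends a word
            }
            i += Character.charCount(cp);
        }
        flush(word, tokens);
        return tokens;
    }

    private static void flush(StringBuilder word, List<String> tokens) {
        if (word.length() > 0) {
            tokens.add(word.toString());
            word.setLength(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(tokenize("床前明月光，疑是地上霜！"));
        // [床, 前, 明, 月, 光, 疑, 是, 地, 上, 霜]
        System.out.println(tokenize("Java程序员"));
        // [java, 程, 序, 员]
    }
}
```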

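3) The IK analyzer has two modes:

① ik_smart: coarsest-grained segmentation

② ik_max_word: finest-grained segmentation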
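IK's two modes are ik_smart (coarsest split) and ik_max_word (finest split). The toy sketch below contrasts the idea over a small hypothetical dictionary: the "smart" variant uses greedy forward maximum matching (no overlapping words), while the "max word" variant emits every dictionary word it can find. This is only an illustration of coarse vs. fine segmentation, not IK's actual algorithm, and the real output depends on IK's own dictionary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy contrast of coarse vs. fine dictionary segmentation (NOT IK's real algorithm). */
public class IkModesSketch {
    // Hypothetical dictionary
    static final Set<String> DICT = new HashSet<>(Arrays.asList("黑马", "程序员", "程序"));

    /** ik_max_word-like: every dictionary word at every start position; uncovered chars are skipped. */
    public static List<String> maxWord(String text) {
        List<String> words = new ArrayList<>();
        for (int start = 0; start < text.length(); start++) {
            for (int end = text.length(); end > start; end--) { // longer candidates first
                String candidate = text.substring(start, end);
                if (DICT.contains(candidate)) {
                    words.add(candidate);
                }
            }
        }
        return words;
    }

    /** ik_smart-like: greedy forward maximum matching, no overlapping words. */
    public static List<String> smart(String text) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = start + 1; // fallback: a single character that matched nothing
            for (int candidateEnd = text.length(); candidateEnd > start + 1; candidateEnd--) {
                if (DICT.contains(text.substring(start, candidateEnd))) {
                    end = candidateEnd;
                    break;
                }
            }
            words.add(text.substring(start, end));
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(smart("黑马程序员"));   // [黑马, 程序员]
        System.out.println(maxWord("黑马程序员")); // [黑马, 程序员, 程序]
    }
}
```

Finer segmentation produces more terms, so more queries can match (at the cost of a larger index); coarser segmentation produces fewer, more specific terms.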

4) Extending the IK dictionary

To extend the IK analyzer's dictionary, edit the IKAnalyzer.cfg.xml file in the config directory of the IK analyzer's installation directory:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here (add the extension dictionary) -->
        <entry key="ext_dict">ext.dic</entry>
</properties>

Then add the words you want to the file named ext.dic, one word per line.
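The config file above uses the standard Java properties-XML format (note the properties.dtd DOCTYPE), so plain java.util.Properties can read it. The sketch below, which is not IK's own loading code, reads the ext_dict entry from such a config and parses dictionary text, assuming one word per line with blank lines skipped. The dictionary words are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashSet;
import java.util.Properties;
import java.util.Set;

public class IkExtDictSketch {
    /** Reads a properties-XML config such as IKAnalyzer.cfg.xml and returns its ext_dict entry. */
    public static String extDictFile(String configXml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(new ByteArrayInputStream(configXml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("ext_dict"); // null when the entry is missing
    }

    /** Parses dictionary text, assuming one word per line; blank lines are skipped. */
    public static Set<String> parseDict(String dicText) {
        Set<String> words = new LinkedHashSet<>();
        for (String line : dicText.split("\\R")) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    public static void main(String[] args) {
        String config = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <comment>IK Analyzer extension configuration</comment>\n"
                + "  <entry key=\"ext_dict\">ext.dic</entry>\n"
                + "</properties>\n";
        System.out.println(extDictFile(config));              // ext.dic
        System.out.println(parseDict("传智播客\n\n云原生\n")); // [传智播客, 云原生]
    }
}
```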

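5) Stopwords

Register a stopword dictionary in IKAnalyzer.cfg.xml, then add the words that should be dropped during analysis to the stopword.dic file: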

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE properties SYSTEM "http://java.sun.com/dtd/properties.dtd">
<properties>
        <comment>IK Analyzer extension configuration</comment>
        <!-- Configure your own extension dictionary here -->
        <entry key="ext_dict">ext.dic</entry>
        <!-- Configure your own extension stopword dictionary here (add the stopword dictionary) -->
        <entry key="ext_stopwords">stopword.dic</entry>
</properties>
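Conceptually, stopwords are removed after segmentation, so they never reach the inverted index and cannot be searched. The plain-Java sketch below shows only that filtering step; the token list and stopword set are hypothetical, and this is not IK's internal code.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

/** Toy stopword filter: drops every token that appears in the stopword set. */
public class StopwordFilterSketch {
    public static List<String> removeStopwords(List<String> tokens, Set<String> stopwords) {
        return tokens.stream()
                .filter(token -> !stopwords.contains(token))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical analyzer output and stopword.dic contents
        List<String> tokens = Arrays.asList("今天", "的", "天气", "很", "好");
        Set<String> stopwords = new HashSet<>(Arrays.asList("的", "很"));
        System.out.println(removeStopwords(tokens, stopwords)); // [今天, 天气, 好]
    }
}
```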

(4) Deploying an ES cluster

This can be done directly with docker-compose.

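II. Index Operations

1. Mapping Attributes

(1) A mapping is a set of constraints on the documents in an index. Common mapping attributes include:

① type: the field's data type. Common simple types are string (text for analyzable text, keyword for exact values such as brand names or IP addresses), numeric (long, integer, short, byte, double, float), boolean, date, and object

② index: whether to index the field (i.e. make it searchable); defaults to true

③ analyzer: which analyzer to use

④ properties: the sub-fields of this field

2. Index CRUD

(1) Creating an index

The DSL syntax for creating an index and its mapping is: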

PUT /index_name
{
  "mappings": {
    "properties": {
      "field_name": {
        "type": "text",
        "analyzer": "ik_smart"
      },
      "field_name2": {
        "type": "keyword",
        "index": false
      },
      "field_name3": {
        "properties": {
          "sub_field": {
            "type": "keyword"
          }
        }
      }
    }
  }
}

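(2) Viewing an index

GET /index_name

(3) Modifying an index

Once an index and its mapping have been created, existing fields cannot be changed, but new fields can be added with the following syntax: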

PUT /index_name/_mapping
{
  "properties": {
    "new_field_name": {
      "type": "integer"
    }
  }
}

(4) Deleting an index

DELETE /index_name

III. Document Operations

1. Adding a document

POST /index_name/_doc/doc_id
{
    "field1": "value1",
    "field2": "value2",
    "field3": {
        "sub_property1": "value3",
        "sub_property2": "value4"
    }
}

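2. Getting a document

GET /index_name/_doc/doc_id

3. Deleting a document

DELETE /index_name/_doc/doc_id

4. Updating a document

(1) Full update

Replaces the whole document: in essence, the document with the given id is deleted and a new document with the same id is added. If no document with that id exists, the request simply creates one.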

PUT /index_name/_doc/doc_id
{
    "field1": "value1",
    "field2": "value2"
}

(2) Partial update

Modifies only the specified fields; all other fields keep their values:

POST /index_name/_update/doc_id
{
    "doc": {
        "field_name": "new_value"
    }
}
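The difference between the two update styles can be modelled with ordinary maps. The sketch below is a toy in-memory model, not how ES stores documents: a full update replaces the stored map (creating it if absent), while a partial update merges the given fields into an existing one. It merges top-level fields only (a simplification), and the document data is hypothetical. In ES itself, _update on a missing id fails with a document-missing error unless an upsert is supplied; the sketch mimics that by throwing.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

/** Toy in-memory model of full vs. partial document updates. */
public class UpdateSemanticsSketch {
    private final Map<String, Map<String, Object>> docs = new HashMap<>();

    /** Like PUT /index/_doc/id: replaces the whole document, creating it if absent. */
    public void fullUpdate(String id, Map<String, Object> newDoc) {
        docs.put(id, new LinkedHashMap<>(newDoc));
    }

    /** Like POST /index/_update/id: merges the given top-level fields into an existing document. */
    public void partialUpdate(String id, Map<String, Object> changes) {
        Map<String, Object> existing = docs.get(id);
        if (existing == null) {
            throw new NoSuchElementException("document " + id + " does not exist");
        }
        existing.putAll(changes);
    }

    public Map<String, Object> get(String id) {
        return docs.get(id);
    }

    public static void main(String[] args) {
        UpdateSemanticsSketch store = new UpdateSemanticsSketch();
        Map<String, Object> jack = new LinkedHashMap<>();
        jack.put("name", "Jack");
        jack.put("age", 21);
        store.fullUpdate("1", jack);

        Map<String, Object> ageOnly = new LinkedHashMap<>();
        ageOnly.put("age", 18);
        store.partialUpdate("1", ageOnly);
        System.out.println(store.get("1")); // {name=Jack, age=18}

        store.fullUpdate("1", ageOnly);
        System.out.println(store.get("1")); // {age=18}  (name is gone)
    }
}
```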

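5. Dynamic Mapping

With the default settings, when a document contains a field that is not in the mapping, ES adds the field to the mapping automatically, choosing its type from the field's JSON value:

JSON type	Elasticsearch type
String	① Date-formatted string: mapped as date ② Ordinary string: mapped as text, with a keyword sub-field added
Boolean	boolean
Floating-point number	float
Integer	long
Nested object	object, with properties added
Array	determined by the type of the first non-null element
null	ignored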
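The table can be read as a small type-inference function. The plain-Java sketch below applies it to already-parsed JSON values (Java Boolean, Double, Integer/Long, String, Map, List, null). Its date check is a simplified, hypothetical regex; real ES date detection is controlled by the index's date_detection and dynamic_date_formats settings.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class DynamicMappingSketch {
    // Simplified, hypothetical date check: yyyy-MM-dd or yyyy/MM/dd, optionally followed by a time
    private static final Pattern DATE = Pattern.compile("\\d{4}[-/]\\d{2}[-/]\\d{2}([ T].*)?");

    /** Returns the ES type dynamic mapping would pick for a parsed JSON value, or null if ignored. */
    public static String esTypeOf(Object value) {
        if (value == null) {
            return null;                                   // null: ignored
        }
        if (value instanceof Boolean) {
            return "boolean";
        }
        if (value instanceof Double || value instanceof Float) {
            return "float";                                // floating-point number
        }
        if (value instanceof Integer || value instanceof Long
                || value instanceof Short || value instanceof Byte) {
            return "long";                                 // integer
        }
        if (value instanceof String) {
            // ordinary strings become text (ES also adds a keyword sub-field)
            return DATE.matcher((String) value).matches() ? "date" : "text";
        }
        if (value instanceof Map) {
            return "object";                               // nested object
        }
        if (value instanceof List) {
            for (Object element : (List<?>) value) {       // first non-null element decides
                if (element != null) {
                    return esTypeOf(element);
                }
            }
            return null;                                   // empty or all-null array: no type yet
        }
        throw new IllegalArgumentException("not a JSON value: " + value.getClass());
    }

    public static void main(String[] args) {
        Map<String, Object> address = new HashMap<>();
        address.put("city", "Shanghai");
        System.out.println(esTypeOf("2021-06-01"));              // date
        System.out.println(esTypeOf("Shanghai"));                // text
        System.out.println(esTypeOf(88.5));                      // float
        System.out.println(esTypeOf(21));                        // long
        System.out.println(esTypeOf(address));                   // object
        System.out.println(esTypeOf(Arrays.asList(null, "a")));  // text
    }
}
```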

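IV. Managing Indices with the RestClient

1. Creating an index

(1) Import the database

(2) Analyze the data structure

Questions to consider when designing the mapping:

(3) Initializing the Java RestClient

① Add the dependency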

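<dependency>
    <groupId>org.elasticsearch.client</groupId>
    <artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>

<!-- The dependency declares no version, so it comes from the parent POM's dependency management
     (e.g. Spring Boot); elasticsearch.version overrides it to match the ES server (7.12.1 here) -->
<properties>
    <java.version>1.8</java.version>
    <elasticsearch.version>7.12.1</elasticsearch.version>
</properties>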

② Initialize the client

RestHighLevelClient client = new RestHighLevelClient(RestClient.builder(
        // address of the ES node; replace with your own host
        HttpHost.create("http://192.168.150.101:9200")
));

(4) Code for creating an index

@Test
void testCreateHotelIndex() throws IOException {
    // 1. Create the request object (org.elasticsearch.client.indices.CreateIndexRequest)
    CreateIndexRequest request = new CreateIndexRequest("hotel");
    // 2. Request body: MAPPING_TEMPLATE is a static String constant holding the index-creation DSL
    request.source(MAPPING_TEMPLATE, XContentType.JSON);
    // 3. Send the request; indices() returns an object with all index-management methods
    client.indices().create(request, RequestOptions.DEFAULT);
}
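The test above relies on a MAPPING_TEMPLATE constant. One way to define it is a hypothetical HotelConstants class like the one below; the field names and types (id, name, address, price) are illustrative assumptions, not the author's actual hotel mapping. The string is simply the request body of PUT /hotel, i.e. the JSON that follows the request line in the DSL section. Plain string concatenation is used because the project targets Java 1.8 (no text blocks).

```java
/** Hypothetical constants class; the fields below are illustrative assumptions. */
public class HotelConstants {
    public static final String MAPPING_TEMPLATE = "{\n"
            + "  \"mappings\": {\n"
            + "    \"properties\": {\n"
            + "      \"id\":      { \"type\": \"keyword\" },\n"
            + "      \"name\":    { \"type\": \"text\", \"analyzer\": \"ik_max_word\" },\n"
            + "      \"address\": { \"type\": \"keyword\", \"index\": false },\n"
            + "      \"price\":   { \"type\": \"integer\" }\n"
            + "    }\n"
            + "  }\n"
            + "}";

    public static void main(String[] args) {
        System.out.println(MAPPING_TEMPLATE);
    }
}
```

In the test class it can then be brought in with a static import of HotelConstants.MAPPING_TEMPLATE (the package name depends on your project).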

2. Code for deleting an index

@Test
void testDeleteHotelIndex() throws IOException {
    // 1. Create the request object
    DeleteIndexRequest request = new DeleteIndexRequest("hotel");
    // 2. Send the request
    client.indices().delete(request, RequestOptions.DEFAULT);
}

3. Checking whether an index exists

@Test
void testExistsHotelIndex() throws IOException {
    // 1. Create the request object (org.elasticsearch.client.indices.GetIndexRequest)
    GetIndexRequest request = new GetIndexRequest("hotel");
    // 2. Send the request
    boolean exists = client.indices().exists(request, RequestOptions.DEFAULT);
    // 3. Print the result
    System.out.println(exists);
}

V. Managing Documents with the RestClient

1. Initialization

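import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;

import java.io.IOException;

public class ElasticsearchDocumentTest {
    // client shared by the tests in this class
    private RestHighLevelClient client;

    @BeforeEach
    void setUp() {
        // create a fresh client before each test
        client = new RestHighLevelClient(RestClient.builder(
            HttpHost.create("http://192.168.150.101:9200")
        ));
    }

    @AfterEach
    void tearDown() throws IOException {
        // release the client's connections after each test
        client.close();
    }
}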

2. Adding a document

@Test
void testIndexDocument() throws IOException {
    // 1. Create the request object: target index and document id
    IndexRequest request = new IndexRequest("indexName").id("1");
    // 2. Prepare the JSON document
    request.source("{\"name\": \"Jack\", \"age\": 21}", XContentType.JSON);
    // 3. Send the request
    client.index(request, RequestOptions.DEFAULT);
}

3. Getting a document

@Test
void testGetDocumentById() throws IOException {
    // 1. Create the request object: index name and document id
    GetRequest request = new GetRequest("indexName", "1");
    // 2. Send the request and get the response
    GetResponse response = client.get(request, RequestOptions.DEFAULT);
    // 3. Parse the result: the document's _source as a JSON string
    String json = response.getSourceAsString();

    System.out.println(json);
}

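4. Updating a document

@Test
void testUpdateDocumentById() throws IOException {
    // 1. Create the request object
    UpdateRequest request = new UpdateRequest("indexName", "1");
    // 2. Fields to change; arguments are read in pairs: key, value, key, value, ...
    request.doc(
        "age", 18,
        "name", "Rose"
    );
    // 3. Update the document (partial update: other fields keep their values)
    client.update(request, RequestOptions.DEFAULT);
}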
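The varargs form of request.doc(...) reads its arguments two at a time, key then value. The plain-Java helper below is hypothetical (not part of the client); it performs the same pairing to show what the call above expresses, namely the partial document {age=18, name=Rose}. An odd number of arguments cannot form pairs, so the helper rejects it. UpdateRequest also has a doc(Map) overload, so passing such a map is an alternative to the varargs form.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class KeyValuePairsSketch {
    /** Hypothetical helper: turns (key1, value1, key2, value2, ...) into an ordered map. */
    public static Map<String, Object> pairsToMap(Object... keysAndValues) {
        if (keysAndValues.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "expected key/value pairs, got " + keysAndValues.length + " arguments");
        }
        Map<String, Object> fields = new LinkedHashMap<>();
        for (int i = 0; i < keysAndValues.length; i += 2) {
            fields.put(String.valueOf(keysAndValues[i]), keysAndValues[i + 1]);
        }
        return fields;
    }

    public static void main(String[] args) {
        System.out.println(pairsToMap("age", 18, "name", "Rose")); // {age=18, name=Rose}
    }
}
```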

5. Deleting a document

@Test
void testDeleteDocumentById() throws IOException {
    // 1. Create the request object
    DeleteRequest request = new DeleteRequest("indexName", "1");
    // 2. Delete the document
    client.delete(request, RequestOptions.DEFAULT);
}

6. Bulk-importing documents

@Test
void testBulk() throws IOException {
    // 1. Create the bulk request
    BulkRequest request = new BulkRequest();
    // 2. Add the requests to submit in one batch: here, two add-document (index) requests;
    //    "json source" and "json source2" are placeholders for each document's JSON string
    request.add(new IndexRequest("hotel")
        .id("101").source("json source", XContentType.JSON));
    request.add(new IndexRequest("hotel")
        .id("102").source("json source2", XContentType.JSON));
    // 3. Send the bulk request
    client.bulk(request, RequestOptions.DEFAULT);
}
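On the wire, a bulk request goes to the _bulk endpoint as newline-delimited JSON: for each index operation, one action/metadata line followed by one source line, with the body ending in a newline. The sketch below builds such a body by hand in plain Java purely to make the format visible; it is not how the client serializes requests internally, it assumes index names and ids need no JSON escaping, and the documents are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class BulkBodySketch {
    /** Builds an NDJSON _bulk body with one "index" action per (id, source JSON) entry. */
    public static String bulkIndexBody(String index, Map<String, String> idToSource) {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> entry : idToSource.entrySet()) {
            // action/metadata line (assumes index and id contain no characters needing escaping)
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_id\":\"").append(entry.getKey()).append("\"}}\n");
            // source line: the document itself, which must fit on a single line
            body.append(entry.getValue()).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("101", "{\"name\":\"Jack\",\"age\":21}");
        docs.put("102", "{\"name\":\"Rose\",\"age\":18}");
        System.out.print(bulkIndexBody("hotel", docs));
    }
}
```

The same text can be pasted after POST /_bulk in Kibana's Dev Tools to run the batch by hand.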