Getting Started with Elasticsearch (ES)

Java架构领域 2021-09-21

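ES is suited to searching large data sets in near real time. It supports clustering: the number of primary shards is fixed when an index is created and cannot be changed in place afterwards, while there can be multiple replicas, which share the read load.
ES is built on Lucene and uses an inverted index: the content of each document is tokenized, each term maps to the ids of the documents that contain it, and a later search on a term finds the associated document ids.
You first need to create an index. Since version 6, an index may contain only one type, conventionally _doc. Fields can be added to the mapping dynamically, but dynamic additions are usually disabled.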
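
The inverted index described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how Lucene stores its index: splitting on whitespace stands in for a real analyzer, and the function name and sample documents are made up for the example.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each term to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        # Whitespace splitting stands in for a real analyzer (assumption).
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


# Hypothetical sample documents, keyed by document id.
docs = {1: "red socks", 2: "blue socks", 3: "red scarf"}
index = build_inverted_index(docs)
print(index["socks"])  # [1, 2]
print(index["red"])    # [1, 3]
```

Searching for a term is then a dictionary lookup that returns document ids, which is why term lookups stay fast even over many documents.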

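Creating an index

When creating an index you can set the number of primary shards (number_of_shards) and replicas (number_of_replicas), and define analyzers, tokenizers, and filters. In the example below the number of primary shards is not set, so the default applies: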

{
    "settings": {
        "index": {
            "max_ngram_diff": 7,
            "max_result_window": 1000000
        },
        "analysis": {
            "analyzer": {
                "ngram_analyzer": {
                    "tokenizer": "ngram_tokenizer",
                    "filter":["lowercase", "cjk_width"]
                }
            },
            "tokenizer": {
                "ngram_tokenizer": {
                    "type": "ngram",
                    "min_gram": 1,
                    "max_gram": 8
                }
            },
            "normalizer":{
                "lowercase":{
                    "type":"custom",
                    "filter":["lowercase", "cjk_width"]
                }
            }
        },
        "number_of_replicas": 1
    },
    "mappings": {
        "dynamic": "strict",
        "properties": {
            "clientClassify": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256,
                        "normalizer":"lowercase"
                    }
                },
                "analyzer": "ngram_analyzer"
            },
            "clientClassifyPinYin": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256,
                        "normalizer":"lowercase"
                    }
                },
                "analyzer": "ngram_analyzer"
            },
            "clientClassifyPY": {
                "type": "text",
                "fields": {
                    "keyword": {
                        "type": "keyword",
                        "ignore_above": 256,
                        "normalizer":"lowercase"
                    }
                },
                "analyzer": "ngram_analyzer"
            },
            "clientClassifyId": {
                "type": "keyword"
            }
        }
    }
}
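
To get a feel for what the ngram_analyzer above produces, here is a rough Python approximation of an n-gram tokenizer with min_gram 1 and max_gram 8 followed by lowercasing. It is a sketch only: the cjk_width filter is left out, and the function name is invented for the example.

```python
def ngrams(text, min_gram=1, max_gram=8):
    """Return all substrings of length min_gram..max_gram, ordered by start position."""
    text = text.lower()  # stands in for the lowercase token filter
    grams = []
    for start in range(len(text)):
        for length in range(min_gram, max_gram + 1):
            if start + length > len(text):
                break
            grams.append(text[start:start + length])
    return grams


print(ngrams("Sock"))
# ['s', 'so', 'soc', 'sock', 'o', 'oc', 'ock', 'c', 'ck', 'k']
```

Because every substring is indexed, a search for "ock" can match "Sock" without a wildcard query. The price is a much larger index, which is why the gap between min_gram and max_gram is capped by max_ngram_diff.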

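Common ES data types

  • text: a string that is normally analyzed (tokenized)
  • keyword: a string that is not analyzed, e.g. an enum value in Java
  • date: a date
  • double: a floating-point number
  • integer: an integer
    and so on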

Importing data

The ETL job imports data through the ES REST API, with documents in JSON. Details omitted.
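
Since the import step is skipped here, the following is only a guess at what it could look like: a Python sketch that builds a request body for the _bulk endpoint, which expects newline-delimited JSON (one action line followed by one document line, with a trailing newline). The index name client_classify_demo and the sample documents are hypothetical; the field names are taken from the mapping above.

```python
import json


def build_bulk_body(index_name, docs):
    """Build an NDJSON body for POST /_bulk that indexes each document."""
    lines = []
    for doc in docs:
        # Action line: index the next document into index_name.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Source line: the document itself, on a single line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "client_classify_demo",  # hypothetical index name
    [
        {"clientClassifyId": "1", "clientClassify": "VIP"},
        {"clientClassifyId": "2", "clientClassify": "Retail"},
    ],
)
print(body)
```

The body would be sent with Content-Type: application/x-ndjson. With "dynamic": "strict" in the mapping, a document containing an unmapped field is rejected, so the ETL output has to match the mapping exactly.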

ES search

Searches go through the _search REST endpoint, and the request body is usually built with the query DSL, for example:

{
    "from": 0, // paging offset
    "size": 1000,
    "timeout": "60s", // search timeout
    "query": {
        "bool": {
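            "filter": [ // filter context: documents are filtered without scoring. Other bool clauses are must, should, and must_not
                {
                    "term": { // a term query does not analyze the search text, while a match query does; other query types include match_phrase, prefix, and range
                        "ownerId": {
                            "value": "8", // filter by ownerId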
                            "boost": 1.0
                        }
                    }
                },
                {
                    "bool": {
                        "should": [
                            { // then filter by orderNumber or productName; this is ANDed with the ownerId filter
                                "term": {
                                    "orderNumber": {
                                        "value": "zengl",
                                        "boost": 1.0
                                    }
                                }
                            },
                            {
                                "term": {
                                    "productName": {
                                        "value": "袜子",
                                        "boost": 1.0
                                    }
                                }
                            }
                        ],
                        "adjust_pure_negative": true,
                        "minimum_should_match": "1", // matching either orderNumber or productName is enough
                        "boost": 1.0
                    }
                }
            ],
            "adjust_pure_negative": true,
            "boost": 1.0
        }
    },
    "_source": {
        "includes": [ // fields to return
            "*"
        ],
        "excludes": []
    },
    "sort": [ // sort order
        {
            "orderDate": {
                "order": "desc"
            }
        },
        {
            "id": {
                "order": "desc"
            }
        }
    ],
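    "track_scores": true, // compute scores even though results are sorted by field
    "track_total_hits": 2147483647 // upper bound for exact hit counting, not the maximum number of rows returned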
}
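
In application code a request like the one above is usually assembled from parameters. A minimal Python sketch, assuming the same field names as the example; the function name and the page and page_size parameters are invented here, and the default values ES fills in (boost, adjust_pure_negative) are left out.

```python
def build_search(owner_id, order_number, product_name, page=0, page_size=1000):
    """Build a _search body: ownerId AND (orderNumber OR productName)."""
    return {
        "from": page * page_size,
        "size": page_size,
        "query": {
            "bool": {
                "filter": [
                    {"term": {"ownerId": owner_id}},
                    {
                        "bool": {
                            "should": [
                                {"term": {"orderNumber": order_number}},
                                {"term": {"productName": product_name}},
                            ],
                            # At least one of the should clauses must match.
                            "minimum_should_match": 1,
                        }
                    },
                ]
            }
        },
        "sort": [{"orderDate": {"order": "desc"}}, {"id": {"order": "desc"}}],
    }


body = build_search("8", "zengl", "袜子", page=2)
print(body["from"], body["size"])  # 2000 1000
```

Note that from + size may not exceed index.max_result_window, which defaults to 10000. That is presumably why the index settings earlier raise it.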

The response is JSON in the following format:

{
    "took": 89, // elapsed time in ms
    "timed_out": false,
    "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 65569, // number of matching documents
            "relation": "eq"
        },
        "max_score": 0.0,
        "hits": [ // the actual results
            {
                "_index": "sales_order_prod_0007",
                "_type": "_doc",
                "_id": "QFqrG3UBTeK-WjKzYaLy",
                "_score": 0.0,
                "_source": {
                    "colorName": [
                        "增量颜色"
                    ]
                }
            }
        ]
    }
}
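
Reading such a response comes down to picking out hits.total and the _source of each hit. A small Python sketch, assuming the response has already been parsed into a dict; the function name is made up for the example.

```python
def extract_hits(response):
    """Return (total count, whether the count is exact, list of _source dicts)."""
    total = response["hits"]["total"]
    if isinstance(total, dict):
        # Object form, as in the response above: {"value": ..., "relation": ...}
        count = total["value"]
        exact = total["relation"] == "eq"
    else:
        # Older versions return a plain integer.
        count = total
        exact = True
    sources = [hit["_source"] for hit in response["hits"]["hits"]]
    return count, exact, sources


response = {
    "hits": {
        "total": {"value": 65569, "relation": "eq"},
        "hits": [{"_id": "QFqrG3UBTeK-WjKzYaLy", "_source": {"colorName": ["增量颜色"]}}],
    }
}
print(extract_hits(response))
```

When relation is "gte" instead of "eq", the value is only a lower bound on the real number of matches.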

ES aggregations

Not used so far; omitted.

ES data updates

_update_by_query updates every document that matches a query. The example below uses a Painless script to do the update:

{
    "query": {
        "bool": {
            "filter": [
                {
                    "term": {
                        "ownerId": "2366"
                    }
                },
                {
                    "term":{
                        "productId":"82327"
                    }
                }
            ]
        }
    },
    "script": {
        "lang": "painless",
        // "source": "Debug.explain(ctx._source.productId[0])",
        "source": "for(int i=0;i<ctx._source.productId.size();i++){if(ctx._source.productId[i] == params.productId){ctx._source.productName[i] = params.productName; ctx._source.productNamePY[i] = params.productNamePY; ctx._source.productNamePinYin[i] = params.productNamePinYin;}}",
        "params": {
            "ownerId": "2366",
            "productId": "82327",
            "productName": "围巾3",
            "productNamePY": "wj3",
            "productNamePinYin": "weijin3"
        }
    }
}
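
The one-line Painless script is hard to read, so here is the same loop in Python for illustration only. It assumes that productName, productNamePY, and productNamePinYin are arrays parallel to productId; the function name and the sample document are made up.

```python
def apply_update(source, params):
    """Mirror of the Painless loop: update the entries whose productId matches."""
    for i, product_id in enumerate(source["productId"]):
        if product_id == params["productId"]:
            source["productName"][i] = params["productName"]
            source["productNamePY"][i] = params["productNamePY"]
            source["productNamePinYin"][i] = params["productNamePinYin"]
    return source


# Hypothetical document with two products stored as parallel arrays.
doc = {
    "productId": ["82327", "90001"],
    "productName": ["围巾", "袜子"],
    "productNamePY": ["wj", "wz"],
    "productNamePinYin": ["weijin", "wazi"],
}
params = {
    "productId": "82327",
    "productName": "围巾3",
    "productNamePY": "wj3",
    "productNamePinYin": "weijin3",
}
print(apply_update(doc, params)["productName"])  # ['围巾3', '袜子']
```

Only the entry at the matching position changes; the other product in the same document is left untouched.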

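By default, updated data becomes searchable only after about 1 s (the default refresh interval). You can add refresh=true to force a refresh so that the change is visible immediately, but this lowers performance.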

ES data deletion

_delete_by_query deletes every document that matches a query.
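
The request takes the same kind of query as a search. A Python sketch that builds the path and body, reusing the ownerId and productId filters from the update example; the function name is invented, and the index name is the one that appears in the search response above.

```python
import json


def build_delete_by_query(index_name, owner_id, product_id):
    """Return the (path, JSON body) for a POST that deletes matching documents."""
    path = f"/{index_name}/_delete_by_query"
    body = {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"ownerId": owner_id}},
                    {"term": {"productId": product_id}},
                ]
            }
        }
    }
    return path, json.dumps(body)


path, body = build_delete_by_query("sales_order_prod_0007", "2366", "82327")
print(path)  # /sales_order_prod_0007/_delete_by_query
print(body)
```

It is worth running the same query through _search first to check what would be deleted.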

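ES update conflicts

You can retry based on the result returned by the update. In our system updates currently run serially, so this problem does not arise.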
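
If updates did run concurrently, a retry loop could look roughly like the Python sketch below. It assumes the request is sent with conflicts=proceed, so that conflicts are reported in the version_conflicts field of the response instead of aborting the request, and that the script is safe to run twice. send_update is a hypothetical callable that performs the HTTP call and returns the parsed response.

```python
import time


def update_with_retry(send_update, body, max_attempts=3, delay_seconds=0.0):
    """Re-run an _update_by_query request until it reports no conflicts."""
    response = {}
    for attempt in range(1, max_attempts + 1):
        response = send_update(body)
        no_conflicts = response.get("version_conflicts", 0) == 0
        if no_conflicts and not response.get("failures"):
            return response
        if attempt < max_attempts:
            time.sleep(delay_seconds)
    raise RuntimeError(
        f"update still conflicting after {max_attempts} attempts: {response}"
    )


# Fake sender for demonstration: conflicts once, then succeeds.
replies = iter([
    {"updated": 1, "version_conflicts": 1, "failures": []},
    {"updated": 1, "version_conflicts": 0, "failures": []},
])
print(update_with_retry(lambda body: next(replies), {"query": {"match_all": {}}}))
```

Re-running the whole query is the simplest approach, because documents that were already updated are just rewritten with the same values.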
