0
点赞
收藏
分享

微信扫一扫

Elasticsearch 分页与遍历 From, Size,Search After & Scorll API

yeamy 2022-05-31 阅读 88


文章目录

  • ​​1. From / Size​​
  • ​​2. 分布式系统中深度分页的问题​​
  • ​​3. Search After 避免深度分页的问题​​
  • ​​4. Scoll API​​
  • ​​5. 不同的搜索类型和使用场景​​

1. From / Size

默认情况下,查询按照相关度算分排序,返回前 10 条记录
容易理解的分页方案

  • From : 开始位置
  • Size:期望获取文档的总数
    Elasticsearch 分页与遍历 From, Size,Search After & Scorll API_数据
    from与size设置超过10000会报错

2. 分布式系统中深度分页的问题

ES 天生就是分布式,查询信息,但是数据分别保存在多个分片,多台机器,ES 天生就需要满足排序的需要(按照相关性算分)
当一个查询:From = 990 ,Size =10

  • 会在每个分片上先获取 1000 个文档。然后,通过​​Coordinating Node​​ 聚合所有结果。最后在通过排序选取前 1000个文档
  • 页数越深,占用内容越多。为了避免深度分页带来的内存开销。ES 有个设定,默认限定到 10000 个文档
    Elasticsearch 分页与遍历 From, Size,Search After & Scorll API_elasticsearch_02

POST tmdb/_search
{
"from": 10000,
"size": 1,
"query": {
"match_all": {}
}
}
//

3. Search After 避免深度分页的问题

避免深度分页的性能问题,可以实时获取下一页文档信息

  • 不支持指定页数(From)
  • 不能往下翻

第一步搜索需要指定 ​​sort​​​,并且保证值是唯一的(可以通过加入_id 保证唯一性)
然后使用上一次,最后一个文档的 sort 值进行查询

POST users/_doc
{"name":"user1","age":10}
POST users/_doc
{"name":"user2","age":11}
POST users/_doc
{"name":"user2","age":12}
POST users/_doc
{"name":"user2","age":13}
POST users/_count
POST users/_search
{
"size": 1,
"query": {
"match_all": {}
},
"sort": [
{"age": "desc"} ,
{"_id": "asc"}
]
}
//返回
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "QlE9AHgBkFqZmZhjeUx_",
"_score" : null,
"_source" : {
"name" : "user2",
"age" : 13
},
"sort" : [
13,
"QlE9AHgBkFqZmZhjeUx_"
]
}
]
}
}
POST users/_search
{
"size": 1,
"query": {
"match_all": {}
},
"search_after":
[
13,
"QlE9AHgBkFqZmZhjeUx_"
],
"sort": [
{"age": "desc"} ,
{"_id": "asc"}
]
}

返回输出
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 6,
"relation" : "eq"
},
"max_score" : null,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "RFE_AHgBkFqZmZhj30zN",
"_score" : null,
"_source" : {
"name" : "user2",
"age" : 13
},
"sort" : [
13,
"RFE_AHgBkFqZmZhj30zN"
]
}
]
}
}

4. Scoll API

创建一个快照,有新的数据写入以后,无法被查找
每次查询后,输入上一次的 ​​​Sroll Id​

DELETE users
POST users/_doc
{"name":"user1","age":10}
POST users/_doc
{"name":"user2","age":20}
POST users/_doc
{"name":"user3","age":30}
POST users/_doc
{"name":"user4","age":40}

#查询文档个数
POST users/_count
{
"count" : 4,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
}
}



POST /users/_search?scroll=5m
{
"size": 1,
"query": {
"match_all" : {
}
}
}
返回输出:保存_scroll_id
{
"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAXIWdFBMMS1DMklUNi1lVmJ2NUZmRVdsZw==",
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "RlFDAHgBkFqZmZhj3UwF",
"_score" : 1.0,
"_source" : {
"name" : "user1",
"age" : 10
}
}
]
}
}


POST /_search/scroll
{
"scroll" : "1m",
"scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAXIWdFBMMS1DMklUNi1lVmJ2NUZmRVdsZw=="
}

返回输出:
{
"_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAAXIWdFBMMS1DMklUNi1lVmJ2NUZmRVdsZw==",
"took" : 12,
"timed_out" : false,
"terminated_early" : true,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 4,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "users",
"_type" : "_doc",
"_id" : "R1FDAHgBkFqZmZhj40zs",
"_score" : 1.0,
"_source" : {
"name" : "user2",
"age" : 20
}
}
]
}
}

5. 不同的搜索类型和使用场景

  • Regular :需要实时获取顶部的部分文档。例如查询最新的订单
  • Scorll :需要全部文档,例如导出全部数据
  • Pagination :From 和 Size 如何需要深度分页,则选用 Search After

举报

相关推荐

0 条评论