Setting ignore_above on keyword fields in Elasticsearch

Concept

The official Elasticsearch documentation explains the ignore_above mapping parameter as follows:

Strings longer than the ignore_above setting will not be indexed or stored. For arrays of strings, ignore_above will be applied for each array element separately and string elements longer than ignore_above will not be indexed or stored.

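When creating a mapping, you can set ignore_above on a string field (keyword only) to limit the length of the values that get indexed. A value longer than ignore_above is not indexed, and it is not stored as a field value either, even with "store": true. The original value is still kept in _source, so the document itself is saved and returned in full.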

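Note that the limit is a character count, not a byte count: an English letter is one character, and a Chinese character is also one character.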
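
To make the difference between characters and bytes concrete, here is a small Python sketch. It assumes that Python's len() is a fair stand-in for the character count that Elasticsearch compares against ignore_above. That holds for ordinary letters and common Chinese characters, though characters outside the Basic Multilingual Plane (some emoji, for example) may be counted differently by Elasticsearch. The helper name exceeds_ignore_above is hypothetical and exists only for this illustration.

```python
def exceeds_ignore_above(value: str, ignore_above: int) -> bool:
    """Return True if the value has more characters than the limit (hypothetical helper)."""
    return len(value) > ignore_above


english = "a" * 20
chinese = "中" * 20

# Both strings are 20 characters long, so neither exceeds a limit of 20.
print(len(english), exceeds_ignore_above(english, 20))
print(len(chinese), exceeds_ignore_above(chinese, 20))

# The UTF-8 byte lengths differ, but bytes are not what ignore_above counts.
print(len(english.encode("utf-8")), len(chinese.encode("utf-8")))
```

With a limit of 20, both strings pass the check even though the Chinese one takes three times as many bytes in UTF-8.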

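In a dynamically generated mapping, a string is mapped as a text field with a keyword sub-field, and that keyword sub-field is given ignore_above: 256.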
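
As a rough illustration, the mapping that dynamic mapping typically produces for a new string field can be written out as a Python dictionary and printed as JSON. This assumes the default settings with no custom dynamic templates; the variable name dynamic_string_mapping is only for this sketch.

```python
import json

# Mapping that dynamic mapping typically generates for a new string field:
# a text field with a keyword sub-field limited to 256 characters.
dynamic_string_mapping = {
    "type": "text",
    "fields": {
        "keyword": {
            "type": "keyword",
            "ignore_above": 256,
        }
    },
}

print(json.dumps(dynamic_string_mapping, indent=2))
```

The limit applies only to the keyword sub-field; the text field itself is analyzed and indexed regardless of length.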

Example

To make this concrete: suppose a keyword field has ignore_above set to 20. What happens when a document with a value longer than 20 characters is indexed?

PUT my_index
{
  "mappings": {
    "properties": {
      "message": {
        "type": "keyword",
        "store": true,
        "ignore_above": 20
      }
    }
  }
}

Add test data

PUT _bulk
{"index":{"_index":"my_index","_id":"1"}}
{"message":"123456789"}
{"index":{"_index":"my_index","_id":"2"}}
{"message":"12345678901234567890"}
{"index":{"_index":"my_index","_id":"3"}}
{"message":"12345678901234567890123"}

Verify

GET my_index/_search

Result

{
  "took" : 0,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "message" : "123456789"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "message" : "12345678901234567890"
        }
      },
      {
        "_index" : "my_index",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "message" : "12345678901234567890123"
        }
      }
    ]
  }
}

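The response shows that all three documents were saved, including the one whose message is longer than 20 characters: its full value appears in _source.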

Next, check whether the values can be found by searching.

GET my_index/_search
{
  "query": {
    "match": {
      "message": "123456789"
    }
  }
}

GET my_index/_search
{
  "query": {
    "match": {
      "message": "12345678901234567890123"
    }
  }
}

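The first query returns document 1, while the second returns no hits. Values of up to 20 characters are indexed and searchable; values longer than 20 characters still appear in _source but cannot be found by searching on this field.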
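
The behavior observed here can be mimicked with a toy model in plain Python. This is only a sketch of the idea, not how Elasticsearch is implemented: the dictionary source plays the role of _source, and the hypothetical functions build_index and search stand in for the inverted index and an exact-match query on a keyword field.

```python
from typing import Dict, List

IGNORE_ABOVE = 20  # same limit as the mapping in the example


def build_index(docs: Dict[str, str], ignore_above: int) -> Dict[str, List[str]]:
    """Map each exact value to the ids of the documents containing it,
    skipping values longer than ignore_above (toy model)."""
    index: Dict[str, List[str]] = {}
    for doc_id, message in docs.items():
        if len(message) > ignore_above:
            continue  # too long: kept in the source, but not indexed
        index.setdefault(message, []).append(doc_id)
    return index


# The "_source" of the three test documents; nothing is dropped here.
source = {
    "1": "123456789",
    "2": "12345678901234567890",
    "3": "12345678901234567890123",
}
index = build_index(source, IGNORE_ABOVE)


def search(term: str) -> List[str]:
    """Exact-match lookup, similar in spirit to a match query on a keyword field."""
    return index.get(term, [])


print(search("123456789"))                # the 9-character value
print(search("12345678901234567890123"))  # the 23-character value
print(len(source))                        # all documents are still in the source
```

The 20-character value of document 2 sits exactly at the limit, so it is still indexed; only values strictly longer than ignore_above are skipped.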

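ignore_above can be changed on the existing index through the update mapping API. Sending PUT my_index again would fail because the index already exists, so the request goes to the _mapping endpoint, and the other parameters of the field are repeated unchanged:

PUT my_index/_mapping
{
  "properties": {
    "message": {
      "type": "keyword",
      "store": true,
      "ignore_above": 10
    }
  }
}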

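The value can be raised or lowered, but the change only affects documents indexed afterwards; documents that are already indexed are not re-evaluated.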
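
Why a changed limit only affects new data can be sketched with another toy model. It assumes the limit is checked once, at the moment a document is indexed, and never revisited afterwards. The class ToyKeywordField is hypothetical and is not part of any Elasticsearch client.

```python
class ToyKeywordField:
    """Toy model of a keyword field whose ignore_above can change over time."""

    def __init__(self, ignore_above: int) -> None:
        self.ignore_above = ignore_above
        self.indexed = {}  # value -> list of document ids

    def index_doc(self, doc_id: str, value: str) -> None:
        # The limit is checked only at the moment a document is indexed.
        if len(value) <= self.ignore_above:
            self.indexed.setdefault(value, []).append(doc_id)

    def search(self, value: str) -> list:
        return self.indexed.get(value, [])


field = ToyKeywordField(ignore_above=20)
field.index_doc("a", "x" * 15)  # 15 characters: indexed under the limit of 20

field.ignore_above = 10         # lower the limit, as in the mapping update
field.index_doc("b", "y" * 15)  # 15 characters: now ignored

print(field.search("x" * 15))   # the older document is still searchable
print(field.search("y" * 15))   # the newer document is not
```

In this model, old documents pick up a new limit only when they are indexed again.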

Note: the text type does not support ignore_above.