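Review of yesterday

#1  Installed ES (needs a JDK environment)
#2  Installed Kibana (official tool; configure which ES instance it connects to). Postman also works for sending requests
#3  Installed elasticsearch-head (third-party; it hits cross-origin restrictions, so CORS has to be enabled in the ES config)   npm install   npm run start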

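#4 Index create/delete/read/update: number of shards, number of replicas (the replica count can be changed later)
#5 Mapping management: creating the "table", its fields and field properties (string: keyword, text). keyword is indexed as-is without analysis; text is split into tokens first and then indexed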

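#6 Inverted index: each document is tokenized first, then every token is indexed so that it points back to the documents containing it. A forward index goes the other way: it is keyed by the document (for example the article title) and leads to its content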
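
A minimal sketch of the idea, in plain Python rather than ES. The `tokenize` helper is a toy stand-in for a real analyzer, and the three sample documents are made up for illustration.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Toy analyzer: lowercase the text and split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


def build_inverted_index(docs):
    """Map every token to the sorted ids of the documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


# Hypothetical documents: id -> text
docs = {1: "Beautiful girl!", 2: "A beautiful day", 3: "Sexy girl"}
inverted = build_inverted_index(docs)

print(inverted["beautiful"])  # ids of the documents containing "beautiful"
print(inverted["girl"])       # ids of the documents containing "girl"
```

Looking up a word is now a single dictionary access instead of a scan over every document.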

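#7 Document create/delete/read/update (update comes in two forms: full overwrite and partial update)

#8 Queries: structured queries
    -match
    -match_all
    -match_phrase
    -match_phrase---slop: how many positions apart the terms may be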

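The GIL in Python

GIL: Global Interpreter Lock. It belongs to the CPython interpreter. Among other interpreters, PyPy also has a GIL, while Jython does not
If CPython has this problem, why do we still use it so heavily?
    -A huge number of third-party modules and the built-in modules are written against CPython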
    
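In CPython a thread must grab the GIL before it can run (the GIL is really one big mutex that turns what should be parallel execution into serial execution)
A thread is the smallest unit of CPU scheduling. Say one process starts 3 threads:
within that process only one thread is executing at any given moment, so the threads cannot take advantage of multiple cores
Start as many threads as there are CPU cores: because of the GIL only one thread runs at a time, so the CPU will never be 100% busy across all cores
Start as many processes as there are CPU cores: the GIL only locks the threads inside its own interpreter process, so threads in different processes are scheduled on different cores and the CPU can be fully occupied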

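The GIL is a property of the interpreter implementation (CPython here), not of the Python language
CPU-bound work (heavy CPU use): use multiple processes
IO-bound work (little CPU use): use multiple threads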
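
A small sketch of the IO-bound case. `time.sleep` stands in for a blocking network or disk call (it releases the GIL while waiting); the function names and the 0.2 s delay are made up for illustration.

```python
import threading
import time


def fake_io(delay):
    # time.sleep releases the GIL while waiting, much like a blocking read would
    time.sleep(delay)


def run_in_threads(n, delay):
    """Start n threads that each wait `delay` seconds; return the wall-clock time."""
    threads = [threading.Thread(target=fake_io, args=(delay,)) for _ in range(n)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


elapsed = run_in_threads(4, 0.2)
print(f"4 threads x 0.2s of waiting took {elapsed:.2f}s")
```

The four waits overlap, so the total stays close to a single wait instead of four of them. A CPU-bound function in place of `fake_io` would not overlap like this under the GIL, which is why that case goes to multiple processes.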

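Python 2: the GIL is released on blocking IO, or after a fixed number of bytecode instructions has run (100 ticks by default; it counts instructions, not source lines)
Python 3 (since 3.2): the GIL is released on blocking IO, or when the running thread's time slice (the switch interval) is up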
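
The time slice can be inspected from the standard library. A short sketch, assuming CPython 3.2 or later; the 0.01 value is only an example.

```python
import sys

# How long a thread may keep the GIL before being asked to release it, in seconds
default_interval = sys.getswitchinterval()
print(default_interval)

# The interval can be tuned; restore the original value afterwards
sys.setswitchinterval(0.01)
changed = sys.getswitchinterval()
sys.setswitchinterval(default_interval)
```

On a stock CPython the default is normally 0.005, i.e. 5 ms.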

I. Document query operations

1 match and term queries

# AND and OR conditions
# AND
GET t3/doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        }
      ]
    }
  }
}

# OR
GET t3/doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": "beautiful"
          }
        },
        {
          "match": {
            "desc": "beautiful"
          }
        }
      ]
    }
  }
}



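# Differences between match, term and terms
    -match: the query text is analyzed (tokenized) before searching
    GET w10/_doc/_search
    {
      "query": {
        "match": {
          "t1": "Beautiful girl!"
        }
      }
    }
    -term: the query text is not analyzed; it is looked up as one exact token
    GET w10/_doc/_search
    {
      "query": {
        "term": {
          "t1": "girl"
        }
      }
    }
    -terms: since term takes a single unanalyzed value, use terms to match any of several exact tokens
    GET w10/_doc/_search
    {
      "query": {
        "terms": {
          "t1": ["beautiful", "sexy"]
        }
      }
    }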
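
The difference can be imitated in plain Python. This is only a toy model: `analyze` is a made-up stand-in for an analyzer that lowercases and splits on punctuation, and the single indexed document is hypothetical.

```python
import re


def analyze(text):
    """Toy analyzer: lowercase and split on non-word characters."""
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]


# Tokens stored in the index for one document whose t1 field is "Beautiful girl!"
indexed_tokens = set(analyze("Beautiful girl!"))


def match(query_text):
    # match: the query text is analyzed too; any shared token is a hit
    return any(tok in indexed_tokens for tok in analyze(query_text))


def term(value):
    # term: the value is compared to the indexed tokens exactly as typed
    return value in indexed_tokens


def terms(values):
    # terms: a hit if any one of the exact values is an indexed token
    return any(value in indexed_tokens for value in values)


print(match("Beautiful girl!"))      # analyzed into "beautiful", "girl"
print(term("girl"))                  # exact token exists
print(term("Beautiful"))             # capital B is not an indexed token
print(terms(["beautiful", "sexy"]))  # "beautiful" is enough
```

This is why a term query with capitals or punctuation often finds nothing on a text field: the index only holds the analyzed tokens.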
        
        
        
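# pymysql   raw access; rows come back as dicts
# orm       the ORM turns rows directly into objects

2 Sorted queries

##### Not every field can be sorted on. Numeric, date and keyword fields can; analyzed text fields cannot be sorted by default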

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  }
}

# Descending
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

## Ascending
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}


GET lqz/_doc/_search
{
  "query": {
    "match_all": {
    }
  },
  "sort": [
    {
      "age": {
        "order": "asc"
      }
    }
  ]
}

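3 Paginated queries


# Pagination uses from (zero-based offset of the first hit) and size (number of hits to return). The first request below has neither and returns the default first page; the second skips 2 hits and returns the next 2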
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ]
}

GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  },
  "sort": [
    {
      "age": {
        "order": "desc"
      }
    }
  ], 
  "from": 2,
  "size": 2
}




### Note: in `elasticsearch` all of these clauses are pluggable; they sit side by side in the request body, separated by `,`
GET lqz/_doc/_search
{
  "query": {
    "match_all": {}
  }, 
  "from": 2,
  "size": 2
}
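
Turning a page number into from/size is simple arithmetic. A minimal sketch; `page_query` is a hypothetical helper, not part of any ES client.

```python
def page_query(page, per_page, query=None):
    """Build a search body for a 1-based page number."""
    if page < 1 or per_page < 1:
        raise ValueError("page and per_page must be >= 1")
    return {
        "query": query or {"match_all": {}},
        "from": (page - 1) * per_page,  # zero-based offset of the first hit
        "size": per_page,               # number of hits on this page
    }


body = page_query(page=2, per_page=2)
print(body)
```

Page 2 with 2 hits per page produces the same from/size pair as the request above.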

4 Boolean queries

- must (and)
- should (or)
- must_not (not)

## bool query with must: AND condition
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "name": "顾老二"
          }
        }
      ]
    }
  }
}


## bool query with should: OR condition
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "name": "龙套偏房"
          }
        }
      ]
    }
  }
}





### must_not: none of the conditions may match
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must_not": [
        {
          "match": {
            "from": "gu"
          }
        },
        {
          "match": {
            "tags": "可爱"
          }
        },
        {
          "match": {
            "age": 18
          }
        }
      ]
    }
  }
}




### filter: greater-than / less-than conditions   gt lt  gte  lte
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "lt": 30
          }
        }
      }
    }
  }
}


### Range query
GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "from": "gu"
          }
        }
      ],
      "filter": {
        "range": {
          "age": {
            "gte": 25,
            "lte": 30
          }
        }
      }
    }
  }
}


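### filter has to sit inside bool. For an AND condition, combine it with must. If you combine it with should instead, the should clauses become optional: a document only has to pass the filter, and should merely affects scoring (unless minimum_should_match is set)

must: AND relation, like and in a relational database.
should: OR relation, like or in a relational database.
must_not: NOT relation, like not in a relational database.
filter: filtering condition (does not affect the score).
range: restricts a field to a range of values.
gt: greater than, like > in a relational database.
gte: greater than or equal, like >= in a relational database.
lt: less than, like < in a relational database.
lte: less than or equal, like <= in a relational database.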
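
When the request is assembled from Python, the same structure is just nested dicts. A minimal sketch; `build_bool_query` is a hypothetical helper, and passing filter as a list is a choice made here (ES accepts a single object or a list).

```python
def build_bool_query(must=(), should=(), must_not=(), filters=()):
    """Assemble a bool query body, leaving out clause types that are empty."""
    clauses = {
        "must": list(must),
        "should": list(should),
        "must_not": list(must_not),
        "filter": list(filters),
    }
    return {"query": {"bool": {name: items for name, items in clauses.items() if items}}}


# from must match "gu" AND age must be within 25..30 (both ends included)
body = build_bool_query(
    must=[{"match": {"from": "gu"}}],
    filters=[{"range": {"age": {"gte": 25, "lte": 30}}}],
)
print(body)
```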

5 Filtering the fields in results


### Basic usage
GET lqz/_doc/_search
{
  "query": {
    "match_all": {
      }
  },
  "_source":["name","age"]
}


#### _source sits at the same level as query

GET lqz/_doc/_search
{
  "query": {
    "bool": {
      "must":{
        "match":{"from":"gu"}
      },
      
      "filter": {
        "range": {
          "age": {
            "lte": 25
          }
        }
      }
    }
  },
  "_source":["name","age"]
}

6 Highlighting

GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "highlight": {
    "pre_tags": "<b class='key' style='color:red'>",
    "post_tags": "</b>",
    "fields": {
      "from": {}
    }
  }
}

7 Aggregations


# sum, avg, max, min

# select avg(age) as my_avg from table where from=gu;
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_avg": {
      "avg": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Maximum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_max": {
      "max": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Minimum age
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_min": {
      "min": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}

# Sum of ages
GET lqz/_doc/_search
{
  "query": {
    "match": {
      "from": "gu"
    }
  },
  "aggs": {
    "my_sum": {
      "sum": {
        "field": "age"
      }
    }
  },
  "_source": ["name", "age"]
}



# Grouping


# Now I want to bucket everyone by age range, `15~20, 20~25, 25~30`, and compute the average age of each bucket.
GET lqz/_doc/_search
{
  "size": 0, 
  "query": {
    "match_all": {}
  },
  "aggs": {
    "age_group": {
      "range": {
        "field": "age",
        "ranges": [
          {
            "from": 15,
            "to": 20
          },
          {
            "from": 20,
            "to": 25
          },
          {
            "from": 25,
            "to": 30
          }
        ]
      },
      "aggs": {
        "my_avg": {
          "avg": {
            "field": "age"
          }
        }
      }
    }
  }
}
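
One detail of the range aggregation worth keeping in mind: each bucket includes its from value and excludes its to value. The sketch below imitates that locally in Python, with a per-bucket average. The ages are made-up sample data and the bucket key format is simplified compared with what ES returns.

```python
def range_buckets(ages, ranges):
    """Group ages into [low, high) buckets and average each bucket."""
    buckets = []
    for low, high in ranges:
        members = [age for age in ages if low <= age < high]  # from inclusive, to exclusive
        average = sum(members) / len(members) if members else None
        buckets.append({"key": f"{low}-{high}", "doc_count": len(members), "avg_age": average})
    return buckets


ages = [18, 20, 22, 25, 29, 30]  # hypothetical data
result = range_buckets(ages, [(15, 20), (20, 25), (25, 30)])
for bucket in result:
    print(bucket)
```

Age 20 lands in the 20-25 bucket, not in 15-20, and age 30 falls outside all three buckets.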

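II. Using the IK analyzer

#1 Download the release that matches your ES version from GitHub
https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v7.5.2
# 2 Unzip it into the plugins directory of ES
# 3 Restart ES



# What is the difference between ik_max_word and ik_smart?

ik_max_word: splits the text at the finest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,中华人民,中华,华人,人民共和国,人民,人,民,共和国,共和,和,国国,国歌”, exhausting every possible combination. Suitable for Term Query;

ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国,国歌”. Suitable for Phrase queries.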
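
What this means for exact-token lookups can be checked with the two token lists quoted above. In this sketch the lists are copied from the text, not produced by running IK, and `term_hits` is a made-up helper.

```python
# Token lists for "中华人民共和国国歌", copied from the description above
ik_max_word_tokens = [
    "中华人民共和国", "中华人民", "中华", "华人", "人民共和国", "人民",
    "人", "民", "共和国", "共和", "和", "国国", "国歌",
]
ik_smart_tokens = ["中华人民共和国", "国歌"]


def term_hits(term, tokens):
    """An exact-token lookup only hits if the analyzer emitted that token."""
    return term in tokens


for word in ["人民", "国歌"]:
    print(word, term_hits(word, ik_max_word_tokens), term_hits(word, ik_smart_tokens))
```

With ik_max_word the sub-word "人民" is searchable on its own; with ik_smart it is not, because only the two coarse tokens were indexed.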


PUT books
{
  "mappings": {
    "properties":{
      "title":{
        "type":"text",
        "analyzer": "ik_max_word"
      },
      "price":{
        "type":"integer"
      },
      "addr":{
        "type":"keyword"
      },
      "company":{
        "properties":{
          "name":{"type":"text"},
          "company_addr":{"type":"text"},
          "employee_count":{"type":"integer"}
        }
      },
      "publish_date":{"type":"date","format":"yyy-MM-dd"}
      
    }
    
  }
}

PUT books/_doc/1
{
  "title":"大头儿子小偷爸爸",
  "price":100,  
  "addr":"北京天安门",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}



PUT books/_doc/2
{
  "title":"白雪公主和十个小矮人",
  "price":"99",
  "addr":"黑暗森里",
  "company":{
    "name":"我的家乡在上海",
    "company_addr":"朋友一生一起走",
    "employee_count":10
  },
  "publish_date":"2018-05-19"
}

GET books/_mapping



GET _analyze
{
  "analyzer": "ik_max_word",
  "text": "白雪公主和十个小矮人"
}
GET books/_search
{
  "query": {
    "match": {
      "title": "十"
    }
  }
}



PUT books2
{
  "mappings": {
    "properties":{
      "title":{
        "type":"text",
        "analyzer": "ik_smart"
      },
      "price":{
        "type":"integer"
      },
      "addr":{
        "type":"keyword"
      },
      "company":{
        "properties":{
          "name":{"type":"text"},
          "company_addr":{"type":"text"},
          "employee_count":{"type":"integer"}
        }
      },
      "publish_date":{"type":"date","format":"yyy-MM-dd"}
      
    }
    
  }
}


PUT books2/_doc/1
{
  "title":"大头儿子小偷爸爸",
  "price":100,  
  "addr":"北京天安门",
  "company":{
    "name":"我爱北京天安门",
    "company_addr":"我的家在东北松花江傻姑娘",
    "employee_count":10
  },
  "publish_date":"2019-08-19"
}



PUT books2/_doc/2
{
  "title":"白雪公主和十个小矮人",
  "price":"99",
  "addr":"黑暗森里",
  "company":{
    "name":"我的家乡在上海",
    "company_addr":"朋友一生一起走",
    "employee_count":10
  },
  "publish_date":"2018-05-19"
}


GET _analyze
{
  "analyzer": "ik_smart",
  "text": "白雪公主和十个小矮人"
}
GET books2/_search
{
  "query": {
    "match": {
      "title": "十个"
    }
  }
}

III. Two ways to integrate ES with Python

1 Native client

# Official low-level client for Elasticsearch

### The counterpart of pymysql
#pip3 install elasticsearch


from elasticsearch import Elasticsearch

obj = Elasticsearch()   # client object; the 7.x client connects to localhost:9200 by default
# Create an index (ignore=400: do not raise if the index already exists)
# result = obj.indices.create(index='user', ignore=400)
# print(result)
# Delete an index
# result = obj.indices.delete(index='user', ignore=[400, 404])
# Insert a document
# data = {'userid': '1', 'username': 'lqz','password':'123'}
# result = obj.create(index='news', doc_type='_doc', id=1, body=data)
# print(result)
# Update a document
'''
Without the 'doc' wrapper the update fails with:
ActionRequestValidationException[Validation Failed: 1: script or doc is missing
'''
# data ={'doc':{'userid': '1', 'username': 'lqz','password':'123ee','test':'test'}}
# result = obj.update(index='news', doc_type='_doc', body=data, id=1)
# print(result)


# Delete a document
# result = obj.delete(index='news', doc_type='_doc', id=1)
# print(result)

# Search
# Match all documents
# query = {'query': {'match_all': {}}}
# Match documents whose title contains '十个'
query = {'query': {'match': {'title': '十个'}}}

# Match documents whose age is greater than 11
# query = {'query': {'range': {'age': {'gt': 11}}}}

allDoc = obj.search(index='books', doc_type='_doc', body=query)
# print(allDoc)
print(allDoc['hits']['hits'][0]['_source'])
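
Indexing `['hits']['hits'][0]` raises an IndexError when nothing matches. A small sketch of a safer way to pull the documents out of a search response; `extract_sources` is a hypothetical helper and `sample_response` is a hand-written, trimmed-down response, not real output.

```python
def extract_sources(response):
    """Return the _source of every hit, or an empty list when nothing matched."""
    hits = response.get("hits", {}).get("hits", [])
    return [hit["_source"] for hit in hits]


# Hand-written stand-in for what obj.search(...) returns (only the parts used here)
sample_response = {
    "hits": {
        "total": {"value": 1, "relation": "eq"},
        "hits": [
            {"_index": "books", "_id": "2", "_source": {"title": "白雪公主和十个小矮人", "price": "99"}},
        ],
    }
}

sources = extract_sources(sample_response)
print(sources)
print(extract_sources({"hits": {"hits": []}}))  # no match: empty list, no exception
```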

2 DSL integration

# Elasticsearch DSL is a high-level

# pip3 install elasticsearch-dsl



from datetime import datetime
from elasticsearch_dsl import Document, Date, Nested, Boolean,analyzer, InnerDoc, Completion, Keyword, Text,Integer

from elasticsearch_dsl.connections import connections

connections.create_connection(hosts=["localhost"])


class Article(Document):
    title = Text(analyzer='ik_max_word')
    author = Text()

    class Index:
        name = 'myindex'

    def save(self, **kwargs):
        return super(Article, self).save(**kwargs)


if __name__ == '__main__':
    # Article.init()  # create the index and its mapping
    # Save a document
    # article = Article()
    # article.title = "测试测试阿斯顿发送到发斯蒂芬啊啊士大夫阿斯蒂芬"
    # article.author = "lqz"
    # article.save()  # the document is now stored

    # Query documents
    # s = Article.search()
    # s = s.filter('match', title="测试")
    #
    # results = s.execute()  # run the search
    # print(results[0].title)

    # Delete documents (delete-by-query on everything the search matches)
    s = Article.search()
    s = s.filter('match', title="李清照").delete()

    # Update a document
    # s = Article.search()
    # s = s.filter('match', title="测试")
    # results = s.execute()
    # print(results[0])
    # results[0].title="李清照阿斯顿发送到发送阿斯蒂"
    # results[0].save()

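IV. Cluster setup (split brain)

# 1 Broadcast discovery (rarely used)
    -Any ES node that is reachable (answers the ping) automatically joins the cluster

# 2 Unicast discovery



#1 Node elasticsearch1: cluster name my_es1, transport (cluster) port 9300; node name node1, listening on local port 9200; eligible to become master and to store data (these are the defaults when the roles are not written out).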

cluster.name: my_es1
node.name: node1
network.host: 127.0.0.1
http.port: 9200
transport.tcp.port: 9300
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# 2 Node elasticsearch2: cluster name my_es1, transport (cluster) port 9302; node name node2, listening on local port 9202; eligible to become master and to store data.

cluster.name: my_es1
node.name: node2
network.host: 127.0.0.1
http.port: 9202
transport.tcp.port: 9302
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# 3 Node elasticsearch3: cluster name my_es1, transport (cluster) port 9303; node name node3, listening on local port 9203; eligible to become master and to store data.

cluster.name: my_es1
node.name: node3
network.host: 127.0.0.1
http.port: 9203
transport.tcp.port: 9303
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

# 4 Node elasticsearch4: cluster name my_es1, transport (cluster) port 9304; node name node4, listening on local port 9204; stores data only and cannot be elected master.

cluster.name: my_es1
node.name: node4
network.host: 127.0.0.1
http.port: 9204
transport.tcp.port: 9304
node.master: false
node.data: true
discovery.zen.ping.unicast.hosts: ["127.0.0.1:9300", "127.0.0.1:9302", "127.0.0.1:9303", "127.0.0.1:9304"]

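As the configuration above shows, all nodes share the cluster name my_es1. Because they all run on the same local machine, every node needs its own node name and its own ports. We start them one by one; they introduce themselves to each other through the unicast host list, discover each other, and form the my_es1 cluster. Which node becomes master depends on which master-eligible node starts first!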



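#3 Suppose there are 7 nodes
    -Because of a network problem they split into a group of 3 and a group of 4, forming two clusters
    -Preventing split brain
        To prevent split brain, set the minimum number of master-eligible nodes required to elect a master: (number of master-eligible nodes / 2) + 1, with integer division
        discovery.zen.minimum_master_nodes: 4   # 4 = 7/2 + 1, assuming all 7 nodes are master-eligible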
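
The quorum arithmetic can be checked with a few lines of Python. This is only a sketch of the formula; both function names are made up, and it assumes every node counted is master-eligible. The setting belongs to Zen discovery (ES 6.x and earlier); ES 7.x ignores it and manages the quorum itself.

```python
def minimum_master_nodes(master_eligible):
    """Quorum: more than half of the master-eligible nodes (integer division)."""
    return master_eligible // 2 + 1


def can_elect_master(partition_size, quorum):
    """A partition may elect a master only if it still sees a quorum of nodes."""
    return partition_size >= quorum


quorum = minimum_master_nodes(7)
print(quorum)                       # quorum for 7 master-eligible nodes
print(can_elect_master(4, quorum))  # the group of 4
print(can_elect_master(3, quorum))  # the group of 3
print(minimum_master_nodes(5))      # quorum for 5 master-eligible nodes
```

With a quorum of 4, only the group of 4 can elect a master after a 3/4 split. With a threshold of 3, both groups could elect one, which is exactly the split-brain situation.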