MySQL UPDATE with a WHERE subquery: a case study - CFANZ Programming Community

Today a developer asked me to modify a batch of data with the following SQL:

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select b.id from settle_studio_income a,finance_income_flow b  where 
 a.source_code = b.business_no 
 and a.studio_code = b.merchant_id
 and a.income_time = b.income_time
 and b.re_item_code= 'SZ2024'
 and a.`source` = 4 and a.income_time = '2022-04-30 23:59:59' and b.income_time  = '2022-04-30 23:59:59'  and a.`status` =400
 and b.settle_status = 200);

Running it produced an error:

You can't specify target table 'finance_income_flow' for update in FROM clause

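The error means that the table being updated cannot also be selected from in a subquery of the same statement's WHERE clause.
So I changed the approach: create a temporary table, insert all the ids returned by the subquery into it, and then run the update against that table: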

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
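
The post does not show how t_bid_tmp was built. A minimal sketch of that step might look like the following. The names t_bid_tmp, idd and idx_idd come from the post; the column type is an assumption (a nullable INT would be consistent with the key_len of 5 that EXPLAIN reports for idx_idd further down), and a plain table is used rather than CREATE TEMPORARY TABLE.

```sql
-- Assumed definition: only the names t_bid_tmp, idd and idx_idd appear in the post.
create table t_bid_tmp (
  idd int default null,
  key idx_idd (idd)
);

-- Collect the target ids first; this SELECT is the subquery from the failing statement.
insert into t_bid_tmp (idd)
select b.id from settle_studio_income a, finance_income_flow b
where a.source_code = b.business_no
 and a.studio_code = b.merchant_id
 and a.income_time = b.income_time
 and b.re_item_code = 'SZ2024'
 and a.`source` = 4 and a.income_time = '2022-04-30 23:59:59' and b.income_time = '2022-04-30 23:59:59' and a.`status` = 400
 and b.settle_status = 200;
```

Because the INSERT writes to a different table than the one it reads from, it does not hit the "can't specify target table" restriction.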

This statement only needs to update 49,310 rows, yet it ran for two minutes without finishing.
show processlist reported the thread state as preparing.
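
For reference, the thread state can be checked from a second session roughly like this. This is a sketch: the 60-second threshold is an arbitrary illustrative value, and information_schema.processlist is assumed to be available, as it is in MySQL 5.7 and 8.0.

```sql
-- State and full statement text of every session
show full processlist;

-- Or narrow it down to statements that have been running for a while
select id, time, state, info
from information_schema.processlist
where command = 'Query' and time > 60;
```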

update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
Rows affected: [49310], elapsed: 201896 ms

It eventually succeeded, but it took 201,896 ms (about 3 minutes 22 seconds), which is clearly abnormal. Let's look for the cause.

First, look at the execution plan of this statement.

explain update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| id           | select_type           | table               | partitions           | type           | possible_keys           | key           | key_len           | ref           | rows           | filtered           | Extra           |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| 1            | UPDATE                | finance_income_flow |                      | index          |                         | PRIMARY       | 4                 |               | 31431906       |                100 | Using where     |
| 2            | DEPENDENT SUBQUERY    | t_bid_tmp           |                      | index_subquery | idx_idd                 | idx_idd       | 5                 | func          | 1              |                100 | Using index     |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+

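The execution plan shows three key points:
1. finance_income_flow is read with a full index scan (type index on PRIMARY), with an estimated 31,431,906 rows, that is, every row in the table;
2. t_bid_tmp has the larger id (2), which by the usual rule of thumb suggests it should run before finance_income_flow;
3. however, the select_type of t_bid_tmp is DEPENDENT SUBQUERY, meaning the subquery depends on the outer query: it is evaluated once per outer row as an index lookup (type index_subquery) on idx_idd.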

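This is awkward: the subquery depends on the outer query, but the outer query is a full index scan over 31,431,906 rows, so the statement is bound to be slow.
The plan itself leaves no doubt about what happens, yet it differs from what one would intuitively expect. Shouldn't the subquery run first, with the outer statement then updating only the rows it returned?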

With that question in mind, let's look at another execution plan:

explain select count(*) from finance_income_flow where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| id           | select_type           | table               | partitions           | type           | possible_keys           | key           | key_len           | ref               | rows           | filtered           | Extra                               |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| 1            | SIMPLE                | t_bid_tmp           |                      | index          | idx_idd                 | idx_idd       | 5                 |                   | 49664          |               98.8 | Using where; Using index; LooseScan |
| 1            | SIMPLE                | finance_income_flow |                      | eq_ref         | PRIMARY                 | PRIMARY       | 4                 | hlj.t_bid_tmp.idd | 1              |                100 | Using index                         |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+

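This execution plan shows three things:
1. t_bid_tmp is listed first, so it is the driving table and is read before finance_income_flow;
2. both tables have select_type SIMPLE, so the subquery has been turned into a join (LooseScan in Extra is a semijoin strategy);
3. only an estimated 49,664 rows of t_bid_tmp are scanned, and finance_income_flow is accessed with type eq_ref, one primary-key lookup per id, which is very efficient.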
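
One way to confirm how the optimizer rewrote the SELECT is to run SHOW WARNINGS straight after the EXPLAIN. This is a sketch of the technique only: the exact text of the rewritten statement depends on the server version, so no output is shown here.

```sql
explain select count(*) from finance_income_flow where id in(select idd from t_bid_tmp);

-- Run immediately afterwards in the same session. The Message column of the
-- resulting Note holds the statement as the optimizer rewrote it; for a
-- semijoin it would be expected to mention "semi join".
show warnings;
```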

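Comparing the two plans, MySQL handles the same IN subquery completely differently in a SELECT and in an UPDATE, and the difference in efficiency is large.
From what I could find: when MySQL executes an UPDATE whose WHERE clause contains a subquery, it reads every row of the outer table, checks each row against the subquery, and updates the ones that match. As a result, if the outer table is very large, the whole statement takes a very long time.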
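
The post does not say which MySQL version the server runs. As far as I recall from the reference manual, semijoin transformations are applied to single-table UPDATE and DELETE statements with an IN subquery only from MySQL 8.0.21 onward (and only without ORDER BY or LIMIT), so the behaviour above may be version-dependent; treat that as something to verify rather than a given. The version and optimizer settings can be checked like this:

```sql
-- Server version
select version();

-- Optimizer flags; look for semijoin=on, loosescan=on, materialization=on
select @@optimizer_switch;
```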

Is there a good way to handle this kind of statement? There is: rewrite it as a multi-table UPDATE.

update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
Rows affected: [49310], elapsed: 1377 ms

explain update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| id           | select_type           | table           | partitions           | type           | possible_keys           | key           | key_len           | ref           | rows           | filtered           | Extra                    |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| 1            | SIMPLE                | b               |                      | index          | idx_idd                 | idx_idd       | 5                 |               | 49664          |                100 | Using where; Using index |
| 1            | UPDATE                | a               |                      | eq_ref         | PRIMARY                 | PRIMARY       | 4                 | hlj.b.idd     | 1              |                100 |                          |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+

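The execution plan shows why this is fast: t_bid_tmp (b) drives the join and each row of finance_income_flow (a) is located by a primary-key lookup (eq_ref), so an estimated 49,664 rows are examined instead of 31,431,906. The update finished in 1,377 ms instead of 201,896 ms, roughly 150 times faster. Of the approaches tried, this one is the most efficient and behaves the way one would expect.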
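
The same join form can in principle be applied to the original statement directly, without the intermediate table, because a multi-table UPDATE joins the target table instead of selecting from it in a subquery. The following is an untested sketch built from the conditions of the first statement; how well it performs depends on the indexes on settle_studio_income and finance_income_flow, which the post does not show, so check it with EXPLAIN before running it.

```sql
-- Sketch only: not run against the author's data.
-- In a multi-table UPDATE each matching row of b is updated once,
-- even if it matches more than one row of a.
update finance_income_flow b, settle_studio_income a
set b.settle_batch_id = 2096, b.settle_status = 400, b.settle_time = '2022-07-20 20:00:00', b.update_time = now()
where a.source_code = b.business_no
 and a.studio_code = b.merchant_id
 and a.income_time = b.income_time
 and b.re_item_code = 'SZ2024'
 and a.`source` = 4 and a.income_time = '2022-04-30 23:59:59' and b.income_time = '2022-04-30 23:59:59' and a.`status` = 400
 and b.settle_status = 200;
```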