Today a developer asked me to fix up a batch of data, using the following SQL:
update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select b.id from settle_studio_income a,finance_income_flow b where
a.source_code = b.business_no
and a.studio_code = b.merchant_id
and a.income_time = b.income_time
and b.re_item_code= 'SZ2024'
and a.`source` = 4 and a.income_time = '2022-04-30 23:59:59' and b.income_time = '2022-04-30 23:59:59' and a.`status` =400
and b.settle_status = 200);
Running it fails with an error:
You can't specify target table 'finance_income_flow' for update in FROM clause
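This error (MySQL error 1093) means that the table being updated cannot also be read in a subquery of the same statement.
So I took a different approach: create a temporary table, insert all the ids returned by the subquery into it, and then run the update against that table: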
update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
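The statements used to build t_bid_tmp are not shown in the original. A minimal sketch, assuming a single nullable INT column idd indexed by idx_idd (only the names t_bid_tmp, idd and idx_idd come from the article; the column type is a guess based on key_len 5 in the execution plans), and reusing the subquery from the failing statement:

```sql
-- Assumed definition: the column type and the absence of other columns are guesses.
CREATE TABLE t_bid_tmp (
  idd INT,
  KEY idx_idd (idd)
);

-- Fill it with the ids that the original subquery would have selected.
INSERT INTO t_bid_tmp (idd)
SELECT b.id
FROM settle_studio_income a, finance_income_flow b
WHERE a.source_code = b.business_no
  AND a.studio_code = b.merchant_id
  AND a.income_time = b.income_time
  AND b.re_item_code = 'SZ2024'
  AND a.`source` = 4
  AND a.income_time = '2022-04-30 23:59:59'
  AND b.income_time = '2022-04-30 23:59:59'
  AND a.`status` = 400
  AND b.settle_status = 200;
```

Because this is an INSERT ... SELECT into a different table, it does not hit the "can't specify target table" restriction.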
The UPDATE only needs to modify 49310 rows, but after two minutes it still had not finished.
show processlist showed the thread in the preparing state.
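For reference, a sketch of how the thread state could be inspected. The filter on the statement text is an assumption for illustration; the original only mentions show processlist.

```sql
-- Show all sessions together with the full statement text.
SHOW FULL PROCESSLIST;

-- Or narrow the list down through information_schema (MySQL 5.7 and 8.0).
SELECT id, user, time, state, info
FROM information_schema.processlist
WHERE info LIKE 'update finance_income_flow%';
```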
update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
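Rows affected: [49310], elapsed: 201896 ms
It finally succeeded after a few minutes, but roughly 3 minutes 22 seconds is clearly abnormal for an update of this size, so let's track down the cause.
First, look at the execution plan for this statement.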
explain update finance_income_flow set settle_batch_id = 2096,settle_status = 400,settle_time='2022-07-20 20:00:00',update_time= now()
where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
| 1 | UPDATE | finance_income_flow | | index | | PRIMARY | 4 | | 31431906 | 100 | Using where |
| 2 | DEPENDENT SUBQUERY | t_bid_tmp | | index_subquery | idx_idd | idx_idd | 5 | func | 1 | 100 | Using index |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+-----------------+
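The plan shows three key things:
1. finance_income_flow is read with a full index scan (type index on PRIMARY), with an estimated 31431906 rows;
2. t_bid_tmp has a higher id than finance_income_flow, which normally indicates higher execution priority;
3. the select_type of t_bid_tmp is DEPENDENT SUBQUERY, meaning the subquery depends on the outer query and is evaluated again for each outer row.
This is awkward: the subquery depends on the outer query, yet the outer query is a full index scan over an estimated 31431906 rows, so every one of those rows triggers an index lookup in t_bid_tmp and the statement is bound to be slow.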
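To make the dependent subquery concrete: the plan behaves roughly like the correlated form below, where the lookup into t_bid_tmp is repeated for every row of finance_income_flow. This is an illustrative rewrite as a SELECT, not something taken from the original, and the aliases f and t are made up for the example.

```sql
-- For each row f of finance_income_flow (about 31 million rows scanned),
-- probe idx_idd on t_bid_tmp once to see whether f.id is present.
SELECT COUNT(*)
FROM finance_income_flow f
WHERE EXISTS (
  SELECT 1
  FROM t_bid_tmp t
  WHERE t.idd = f.id
);
```

When actually run as a SELECT, the optimizer may still pick a different plan; the block is only meant to show the shape of the per-row evaluation.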
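The plan leaves no doubt about how the statement runs, but it does not match the usual intuition: shouldn't the subquery run first, with the outer query then updating based on its result?
With that question in mind, let's look at another execution plan: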
explain select count(*) from finance_income_flow where id in(select idd from t_bid_tmp);
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
| 1 | SIMPLE | t_bid_tmp | | index | idx_idd | idx_idd | 5 | | 49664 | 98.8 | Using where; Using index; LooseScan |
| 1 | SIMPLE | finance_income_flow | | eq_ref | PRIMARY | PRIMARY | 4 | hlj.t_bid_tmp.idd | 1 | 100 | Using index |
+--------------+-----------------------+---------------------+----------------------+----------------+-------------------------+---------------+-------------------+-------------------+----------------+--------------------+-------------------------------------+
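This plan tells us three things:
1. t_bid_tmp is listed first, so it is read before finance_income_flow and drives the join;
2. both tables have select_type SIMPLE, so the subquery has been converted into a join (a semijoin, as LooseScan in the Extra column indicates);
3. only an estimated 49664 entries of idx_idd on t_bid_tmp are scanned, and finance_income_flow is accessed with type eq_ref on PRIMARY (one row per lookup), which is very efficient.
Comparing the two plans, MySQL handles the same subquery completely differently in a SELECT and in an UPDATE, and the difference in efficiency is huge.
After some research, the explanation is this: for a single-table UPDATE whose WHERE clause contains a subquery, MySQL walks every row of the outer table and evaluates the subquery for each one, because the semijoin and materialization optimizations available to SELECT are not applied (the MySQL 5.7 manual documents this limitation and suggests a multiple-table UPDATE as the workaround). If the outer table is very large, the whole statement therefore takes a very long time.
Is there a good fix for statements like this? Of course there is.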
update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
Rows affected: [49310], elapsed: 1377 ms.
explain update finance_income_flow a, t_bid_tmp b
set a.settle_batch_id = 2096,a.settle_status = 400,a.settle_time='2022-07-20 20:00:00',a.update_time= now()
where a.id=b.idd;
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
| 1 | SIMPLE | b | | index | idx_idd | idx_idd | 5 | | 49664 | 100 | Using where; Using index |
| 1 | UPDATE | a | | eq_ref | PRIMARY | PRIMARY | 4 | hlj.b.idd | 1 | 100 | |
+--------------+-----------------------+-----------------+----------------------+----------------+-------------------------+---------------+-------------------+---------------+----------------+--------------------+--------------------------+
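The plan shows that this form of update drives the join from t_bid_tmp and reaches finance_income_flow by primary key, so it is by far the most efficient of the approaches tried and behaves as expected.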
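The same idea can in principle be applied to the original statement without the temporary table, since a multiple-table UPDATE has no subquery on the target table and so should not hit the "can't specify target table" error. A sketch, assuming the join conditions from the first statement are what identify the rows; it has not been run against the author's data, so check the plan with explain before executing it:

```sql
-- Join the target table directly to settle_studio_income instead of using IN (subquery).
UPDATE finance_income_flow b
JOIN settle_studio_income a
  ON a.source_code = b.business_no
 AND a.studio_code = b.merchant_id
 AND a.income_time = b.income_time
SET b.settle_batch_id = 2096,
    b.settle_status = 400,
    b.settle_time = '2022-07-20 20:00:00',
    b.update_time = NOW()
WHERE b.re_item_code = 'SZ2024'
  AND a.`source` = 4
  AND a.income_time = '2022-04-30 23:59:59'
  AND b.income_time = '2022-04-30 23:59:59'
  AND a.`status` = 400
  AND b.settle_status = 200;
```

Whether this is fast depends on the indexes available on both tables, which the original does not describe; the temporary-table version has the advantage that the join column is known to be indexed.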