Performance problem: MERGE INTO combined with a sequence causes enq: SQ - contention - CFANZ Programming Community

Scope

Oracle 11g, Linux

Problem Overview

A customer had a production incident when the following statement produced a large number of enq: SQ - contention wait events:

merge into t_tar using (select count(*) cnt from t_tar where name = :1 ) G on (G.cnt > 0) 
when not matched then insert (id,name,type) values (tseq.nextval,:1,'S');

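The DBA's analysis went as follows:
1) The statement's execution history showed that the sequence contention had already started several days earlier;
2) On the day of the incident, the statement ran 20-100 times per 10 s, i.e. 2-10 times per second, which is not high concurrency;
3) Related statistics: the tseq sequence originally had CACHE 20, and after it was raised to 500 the wait event returned to normal; t_tar held about 134,000 rows;
4) The execution plan showed a full table scan on t_tar;
5) Execution history from AWR snapshots: during the problem period the statement ran more than 1,000 times per half hour, whereas normally it ran no more than 600 times per half hour.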

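Preliminary analysis:
The statement's execution frequency did rise on the day of the incident, but even at its peak it was only a little over 1,000 executions per half hour. About 2,000 calls per hour causing this much sequence contention does not add up, so the higher rate was a contributing cause but not the root cause.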

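Going back to the SQL itself then revealed a major problem.
The statement's intent is simple: check whether the new name already exists in the target table; if it does, insert nothing; if it does not, insert it with a new primary key generated from the sequence.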

merge into t_tar using (select count(*) cnt from t_tar where name = :1 ) G on (G.cnt > 0) 
when not matched then insert (id,name,type) values (tseq.nextval,:1,'S');

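Testing revealed that when the value to be inserted already exists, the sequence skips a large number of values, equal to the number of rows in t_tar.
With about 130,000 rows in the table at the time, a single failed insert (the new name already exists) makes the sequence generate 130,000 new values. At the incident's rate of 10 executions per second, if all 10 fail, the sequence has to generate 1.3 million values per second.
This is clearly the root cause. The sections below analyze why it happens and what causes it.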
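The arithmetic above, together with the CACHE change from 20 to 500 noted by the DBA, can be checked with a small Python sketch. The constants are the approximate figures quoted in this article. As a simplification (not a statement of Oracle internals), it assumes a session must refill the sequence cache, which is what the SQ enqueue serializes, once every CACHE values; the helper names are hypothetical.

```python
# Back-of-envelope model of the incident (approximate figures from the article).
ROWS_IN_T_TAR = 130_000        # ~130,000 rows in t_tar at the time
FAILED_CALLS_PER_SECOND = 10   # peak rate, assuming every call hits an existing name


def values_per_second(rows: int, failed_calls: int) -> int:
    """Sequence values burned per second: one per t_tar row for each failed call."""
    return rows * failed_calls


def cache_refills_per_second(values: int, cache_size: int) -> float:
    """Cache refills per second if each refill hands out `cache_size` values (simplified)."""
    return values / cache_size


burned = values_per_second(ROWS_IN_T_TAR, FAILED_CALLS_PER_SECOND)
print(burned)  # 1300000
for cache in (20, 500):
    print(cache, cache_refills_per_second(burned, cache))  # 20 65000.0, then 500 2600.0
```

Raising CACHE from 20 to 500 cuts the assumed refill rate by a factor of 25, which fits the observation that the wait subsided, but it treats the symptom only: the values are still being wasted.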

Root Cause

The following questions, each checked by testing, explain the real cause of the problem:

Question 1: When do sequence values get skipped, and how many?

Conclusion: values are skipped whenever rows are MATCHED; the number skipped equals the number of matched rows produced by the join.

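The sequence advances by: (rows produced by the join with the target table that are MATCHED) + (source rows that are NOT MATCHED). The number of rows inserted by WHEN NOT MATCHED equals the number of NOT MATCHED source rows.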
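This rule can be written as a toy Python model. It is a sketch inferred from the tests in this article, not Oracle's implementation: each source row is joined to the target with the ON predicate, every matched join row and every unmatched source row is assumed to evaluate NEXTVAL once, and only unmatched rows are inserted. `simulate_merge` and its parameters are hypothetical names.

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

S = TypeVar("S")  # source row type
T = TypeVar("T")  # target row type


def simulate_merge(source: Iterable[S], target: List[T],
                   on: Callable[[S, T], bool], start: int) -> Tuple[int, int, int]:
    """Return (final sequence value, matched join rows, inserted rows).

    Assumption: NEXTVAL is evaluated once per row of the source-to-target
    outer join, whether or not that row ends up being inserted.
    """
    seq, matched, inserted = start, 0, 0
    for s in source:
        hits = sum(1 for t in target if on(s, t))  # target rows this source row joins to
        if hits:
            matched += hits
            seq += hits        # values generated for matched rows are simply discarded
        else:
            inserted += 1
            seq += 1           # the one value that actually lands in the INSERT
    return seq, matched, inserted


# Hypothetical data: source ids 1..5, target ids 1..3, ON s = t.
print(simulate_merge([1, 2, 3, 4, 5], [1, 2, 3], lambda s, t: s == t, 0))  # (5, 3, 2)
```

Because values are consumed in join-row order, the inserted ids in such a model are scattered rather than consecutive.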

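Analysis:
Against the test tables, run the following statement and compute the sequence increase, the current sequence value, the number of rows merged, the number of rows inserted, and the inserted values.
(Below, id<100 is part of the ON match condition, and id<=50 is the condition under which the update is performed.)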
merge into t_tar using t_org on (t_org.id<100 and t_org.id=t_tar.id) 
when matched then update set name='Matched' where id<=50
when not matched then insert (id,name,type) values(tseq.nextval,'TTT'||tseq.currval,'V');

t_tar (target table): rows in t_tar that satisfy the match condition
Matched rows: 71
HR@orcl>select count(*) from t_tar where id<100 and exists (select 1 from t_org where t_tar.id=t_org.id);

COUNT(*)
----------
        71

t_org (source table): source rows that are not matched
Unmatched rows = total rows minus matched rows: 87-71=16

Total rows:
HR@orcl>select count(*) from t_org;

COUNT(*)
----------
        87

Matched rows:

HR@orcl>select count(*) from t_org where t_org.id<100 and  exists (select 1 from t_tar where t_tar.id=t_org.id);

COUNT(*)
----------
        71

Current sequence value: 101
HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       101

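Prediction:
Sequence value = starting value + rows produced and matched from the target table t_tar (these may be updated, so a sequence value is consumed) + unmatched source rows (these must be inserted and need new sequence values).
This matches what MERGE is meant to do: check whether a row exists in the target table and, if it does, allow an update of the target;
if it does not exist, insert data from the source.
So the consumed sequence values can also be read as: matched rows produced from the target (because they may need an update) + unmatched rows in the source result set (because they may need an insert).
101+71+16=188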
Rows merged:
Target rows that match and are updated: 37
HR@orcl>select count(*) from t_tar where exists (select 1 from t_org where t_tar.id=t_org.id) and id<=50;

COUNT(*)
----------
37
Unmatched source rows inserted: 16
Total: 37+16 = 53 rows
Summary:
The sequence value changes to: 101+71+16=188
The inserted ids follow some pattern, but exactly what pattern still needs to be verified.
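Plugging the measured counts into the rule from Question 1 gives the predicted values for this run and for the repeat run after rollback. A Python sketch of the arithmetic (`predict_currval` is a hypothetical helper):

```python
def predict_currval(start: int, matched_join_rows: int, unmatched_source_rows: int) -> int:
    """Sequence value after the MERGE, per the rule from Question 1."""
    return start + matched_join_rows + unmatched_source_rows


MATCHED, UNMATCHED, UPDATED = 71, 16, 37   # counts measured above
assert MATCHED + UNMATCHED == 87            # every t_org row is either matched or not

first_run = predict_currval(101, MATCHED, UNMATCHED)
second_run = predict_currval(first_run, MATCHED, UNMATCHED)  # after rolling back the data
rows_merged = UPDATED + UNMATCHED                            # updated + inserted
print(first_run, second_run, rows_merged)  # 188 275 53
```

The second prediction starts from 188 because ROLLBACK undoes the table changes but does not return consumed sequence values.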
Running the test:

merge into t_tar using t_org on (t_org.id<100 and t_org.id=t_tar.id)
when matched then update set name='Matched' where id<=50
when not matched then insert (id,name,type) values(tseq.nextval,'TTT'||tseq.currval,'V');

53 rows merged.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
188

HR@orcl>select * from t_tar where name like 'TTT%';

ID NAME TYPE
------- ------------------------------ ------------------------------
102 TTT102 V
111 TTT111 V
120 TTT120 V
121 TTT121 V
129 TTT129 V
135 TTT135 V
137 TTT137 V
146 TTT146 V
150 TTT150 V
154 TTT154 V
163 TTT163 V
165 TTT165 V
172 TTT172 V
179 TTT179 V
180 TTT180 V
188 TTT188 V

16 rows selected.

-- Roll back the data and run the same operation again:
Expected sequence value: 188+71+16=275
Running the test:

HR@orcl>merge into t_tar using t_org on (t_org.id<100 and t_org.id=t_tar.id)
when matched then update set name='Matched' where id<=50
when not matched then insert (id,name,type) values(tseq.nextval,'TTT'||tseq.currval,'V');

53 rows merged.

HR@orcl>select * from t_tar where name like 'TTT%';

ID NAME TYPE
------- ------------------------------ ------------------------------
189 TTT189 V
198 TTT198 V
207 TTT207 V
208 TTT208 V
216 TTT216 V
222 TTT222 V
224 TTT224 V
233 TTT233 V
237 TTT237 V
241 TTT241 V
250 TTT250 V
252 TTT252 V
259 TTT259 V
266 TTT266 V
267 TTT267 V
275 TTT275 V

16 rows selected.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
275

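Question 2: In the problem statement, why does every failed call skip as many values as t_tar has rows, while a call that inserts skips none?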

merge into t_tar using (select count(*) cnt from t_tar where name = :1 ) G on (G.cnt > 0)
when not matched then insert (id,name,type) values (tseq.nextval,:1,'S');

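To answer this, start from MERGE itself (see the Oracle documentation for details).
MERGE is designed to compare the data of two tables (target t_tar, source t_org) by matching them through a join, which is why the execution plan contains a join operation:
The part after MERGE INTO (table/view) is the target.
The part after USING (table/view/subquery) is the source.
The ON clause holds the join (match) conditions and filter conditions.
WHEN MATCHED applies the corresponding change to the target for matched rows (UPDATE, optionally with DELETE).
WHEN NOT MATCHED inserts into the target for unmatched rows (INSERT).
Note: you may specify only one of WHEN MATCHED and WHEN NOT MATCHED, but MERGE still evaluates both cases; if only one is used (e.g. NOT MATCHED), the other (MATCHED) effectively runs for nothing.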
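Analyzing the statement again:
The source is a subquery that aggregates the target table, (select count(*) cnt from t_tar where name = :1) g, referred to here as table g.
The ON clause contains only a filter on g (G.cnt > 0) and no join condition, which reveals the first problem: a Cartesian product. Fortunately g always has exactly one row, so the join result has exactly as many rows as the target table t_tar.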
The Cartesian product looks roughly like this:
t_tar columns               g column
id1,name1,type1,g.cnt;
id2,name2,type2,g.cnt;
...
idn,namen,typen,g.cnt;

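Now consider the matched and unmatched cases:
1) When the new name already exists in t_tar, g.cnt > 0 holds, so every row of the Cartesian result above is MATCHED and none is NOT MATCHED.
From the tests above, the sequence advances by the matched join rows (the row count of t_tar) + the unmatched rows of the source g (0) = the row count of t_tar.
2) When the new name does not exist in t_tar, g.cnt = 0 and ON (g.cnt > 0) is false, so no t_tar row matches and the single row of g is NOT MATCHED.
From the tests above, the sequence advances by the matched join rows (0) + the unmatched rows of the source g (1) = 1, and that value is inserted into t_tar, which is what the business requires.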
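The two cases can be reproduced in a Python toy model that actually builds the Cartesian product. The t_tar contents (one I_COL2 row plus 83 placeholder names) are hypothetical, chosen only to match the 84-row count used in the verification; `merge_single_row_g` is a hypothetical name, and the one-NEXTVAL-per-join-row behavior is the assumption taken from Question 1.

```python
from typing import List, Tuple


def merge_single_row_g(t_tar: List[str], name: str) -> Tuple[int, int]:
    """Return (sequence values consumed, rows inserted) for one execution."""
    cnt = sum(1 for n in t_tar if n == name)   # select count(*) cnt ... where name = :1
    g = [cnt]                                   # an aggregate without GROUP BY: always one row
    consumed = inserted = 0
    for g_cnt in g:
        # ON (G.cnt > 0) never references t_tar, so it is true for all rows or for none
        hits = sum(1 for _ in t_tar if g_cnt > 0)
        if hits:
            consumed += hits   # matched: one wasted value per t_tar row
        else:
            consumed += 1      # not matched: one value, used by the INSERT
            inserted += 1
    return consumed, inserted


t_tar = ["I_COL2"] + [f"OBJ_{i}" for i in range(83)]   # 84 hypothetical rows
print(merge_single_row_g(t_tar, "I_COL2"))     # (84, 0): 275 + 84 = 359
print(merge_single_row_g(t_tar, "XHY_TEST1"))  # (1, 1)
```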
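Verification:
t_tar has 84 rows and the sequence is currently at 275. First pass an existing value, I_COL2. Per the analysis above,
no row should be inserted, but the sequence should advance by the row count of t_tar (84), to 359.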
HR@orcl>select tseq.currval from dual;

CURRVAL
----------
275
HR@orcl>select count(*) from t_tar;

COUNT(*)
----------
84
HR@orcl>select count(*) cnt from t_tar where name = 'I_COL2';

CNT
----------
1

merge into t_tar using (select count(*) cnt from t_tar where name = 'I_COL2' ) G on (G.cnt > 0)
when not matched then insert (id,name,type) values (tseq.nextval,'I_COL2','S');

0 rows merged.   -- 0 rows

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
359 = 275+84

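Second test: t_tar has 84 rows and the sequence is at 359. Pass a value that does not exist, XHY_TEST1. Per the analysis above,
one row should be inserted and the sequence should advance by 1, to 360.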
HR@orcl>merge into t_tar using (select count(*) cnt from t_tar where name = 'XHY_TEST1' ) G on (G.cnt > 0)
when not matched then insert (id,name,type) values (tseq.nextval,'XHY_TEST1','S');

1 row merged.   -- 1 row

HR@orcl>select count(*) from t_tar where name='XHY_TEST1';

COUNT(*)
----------
1
HR@orcl>select tseq.currval from dual;

CURRVAL
----------
360 =359+1

Question 3: This is still not the worst case; the worst case is a Cartesian product between two large result sets

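Now consider t_tar continuing to grow: at millions or tens of millions of rows, every failed MERGE would burn that many sequence values, and a sequence defined with a limited MAXVALUE could be exhausted quickly. A frightening prospect.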
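How fast the waste grows can be sketched in Python. For the original statement the source g has one row, so a failed call costs one value per t_tar row; if the source returned k rows that all satisfy ON, the cost would be k times the t_tar row count. The table sizes and k = 1,000 below are hypothetical, and `wasted_values` is an illustrative helper built on the counting rule from Question 1.

```python
def wasted_values(t_tar_rows: int, matching_source_rows: int = 1) -> int:
    """Sequence values consumed by matched rows when ON has no join column.

    Each source row that satisfies ON pairs with every t_tar row.
    """
    return t_tar_rows * matching_source_rows


for rows in (130_000, 1_000_000, 10_000_000):
    # single-row source (the original statement) vs. a hypothetical 1,000-row source
    print(rows, wasted_values(rows), wasted_values(rows, 1_000))
```

The cost is linear in the table size for a one-row source, but it is a product of the two sizes once both sides of the Cartesian product are large.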

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
360
HR@orcl>select count(*) cnt from t_tar;

CNT
----------
84
HR@orcl>select type,count(*) cnt from t_tar group by type;

TYPE CNT
------------------------------ ----------
EDITION 1
SEQUENCE 1
INDEX 48
TABLE 31
CLUSTER 3
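Suppose the requirement were to match on per-type row counts (admittedly not a sensible business rule). How bad could the result be?
Build the following test statement: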
merge into t_tar using (select type,count(*) cnt from t_tar group by type ) G on (G.cnt > 1)
when not matched then insert (id,name,type) values (tseq.nextval,'NEWNAME'||tseq.currval,'NEWTYPE'||tseq.currval);
Prediction:
The sequence advances by 3 matched types (INDEX, TABLE, CLUSTER) × 84 rows in t_tar = 252, plus 2 unmatched rows (EDITION, SEQUENCE),
so it should rise to 360+252+2=614,
with 2 rows inserted.
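The same prediction as a Python calculation, using the per-type counts from the GROUP BY output above and applying the ON filter G.cnt > 1 to each group:

```python
type_counts = {"EDITION": 1, "SEQUENCE": 1, "INDEX": 48, "TABLE": 31, "CLUSTER": 3}

t_tar_rows = sum(type_counts.values())                            # 84
matched_groups = [t for t, c in type_counts.items() if c > 1]       # INDEX, TABLE, CLUSTER
unmatched_groups = [t for t, c in type_counts.items() if c <= 1]    # EDITION, SEQUENCE

# each matched group pairs with every t_tar row; each unmatched group inserts once
consumed = len(matched_groups) * t_tar_rows + len(unmatched_groups)
print(t_tar_rows, consumed, 360 + consumed, len(unmatched_groups))  # 84 254 614 2
```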

merge into t_tar using (select type,count(*) cnt from t_tar group by type ) G on (G.cnt > 1)
when not matched then insert (id,name,type) values (tseq.nextval,'NEWNAME'||tseq.currval,'NEWTYPE'||tseq.currval);

2 rows merged.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
614

Solution

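The analysis above explains why the event occurred, and also why it took so long to surface:

As t_tar keeps growing, every call whose value already exists consumes more and more sequence values (one per row of t_tar). With little data this is not a problem, but once the data volume grows the problem becomes obvious.
As for the fix, start from the business logic. The statement simply checks whether the name to be inserted already exists in t_tar: insert it if it does not, do nothing if it does.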

merge into t_tar using (select count(*) cnt from t_tar where name = :1 ) G on (G.cnt > 0) 
when not matched then insert (id,name,type) values (tseq.nextval,:2,'S');

With business logic this simple, MERGE is not needed at all; a plain INSERT statement does the same job:

insert into t_tar select tseq.nextval,:1,'VIEW' from dual where not exists (select 1 from t_tar where name=:2);
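To compare the two statements over a stream of calls, here is a Python toy model. It assumes, as the test transcript that follows indicates, that INSERT ... SELECT ... FROM dual WHERE NOT EXISTS does not advance the sequence when the NOT EXISTS filter removes the dual row, and it reuses the one-value-per-t_tar-row cost of a failed MERGE. `run_workload` and the 84 placeholder names are hypothetical.

```python
from typing import Iterable, List, Tuple


def run_workload(names: Iterable[str], existing: List[str], use_merge: bool) -> Tuple[int, int]:
    """Return (sequence values consumed, rows inserted) for a series of single-row calls."""
    table = list(existing)   # t_tar, reduced to its name column
    consumed = inserted = 0
    for name in names:
        exists = name in table
        if use_merge:
            # original MERGE: a hit wastes one value per t_tar row, a miss uses one
            consumed += len(table) if exists else 1
        elif not exists:
            # INSERT ... WHERE NOT EXISTS: a hit returns no row, so NEXTVAL is never needed
            consumed += 1
        if not exists:
            table.append(name)
            inserted += 1
    return consumed, inserted


base = [f"OBJ_{i}" for i in range(84)]       # 84 hypothetical existing names
calls = ["XHY_TEST1"] * 3                     # one new name, then two repeats
print(run_workload(calls, base, use_merge=True))   # (171, 1)
print(run_workload(calls, base, use_merge=False))  # (1, 1)
```

Both variants insert the same single row; only the MERGE pays for the repeats, and its cost per repeat grows with the table.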

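Test: inserting a value that does not yet exist advances the sequence by 1; inserting the same value again reports 0 rows created, and the sequence value does not change.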

HR@orcl>select * from t_tar where name='XHY_TEST1';

no rows selected
HR@orcl>var x varchar2(32);
HR@orcl>exec :x:='XHY_TEST1';

PL/SQL procedure successfully completed.

HR@orcl>select tseq.nextval from dual;

NEXTVAL
----------
       615

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       615
-- insert one row
HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

1 row created.
-- the sequence advanced by 1
HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       616

HR@orcl>commit;

Commit complete.
-- insert again: the value already exists, so nothing is inserted
HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

0 rows created.
-- sequence value unchanged
HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       616

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

0 rows created.

HR@orcl>commit;

Commit complete.

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

0 rows created.

HR@orcl>commit;

Commit complete.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       616

HR@orcl>exec :x:='XHY_TEST2';

PL/SQL procedure successfully completed.

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

1 row created.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       617

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

0 rows created.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       617

HR@orcl>rollback;

Rollback complete.

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

1 row created.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       618

HR@orcl>commit;

Commit complete.

HR@orcl>insert into t_tar select tseq.nextval,:x,'VIEW' from dual where not exists (select 1 from t_tar where name=:x);

0 rows created.

HR@orcl>commit;

Commit complete.

HR@orcl>select tseq.currval from dual;

CURRVAL
----------
       618

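The direct cause of the sequence contention was the misuse of MERGE together with a sequence: the MERGE had no join condition and produced a Cartesian product, so the number of rows satisfying the match condition was very large, which consumed a huge number of sequence values and triggered the enq: SQ - contention wait event.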