Preface
As we know, converting an ordinary table into a partitioned table is fairly troublesome. pg_pathman does provide functions for online, non-blocking data migration, but pg_pathman has quite a few problems and bugs, and with the release of PostgreSQL 14 it will no longer be updated. Fortunately, CYBERTEC recently released a new extension, pg_rewrite, which can efficiently convert an ordinary table into a partitioned table.
Installation
The repository is at https://github.com/cybertec-postgresql/pg_rewrite
Installation is straightforward:
[postgres@xiongcc ~]$ git clone https://github.com/cybertec-postgresql/pg_rewrite.git
Cloning into 'pg_rewrite'...
remote: Enumerating objects: 22, done.
remote: Counting objects: 100% (22/22), done.
remote: Compressing objects: 100% (17/17), done.
remote: Total 22 (delta 3), reused 22 (delta 3), pack-reused 0
Unpacking objects: 100% (22/22), done.
[postgres@xiongcc ~]$ cd pg_rewrite/
[postgres@xiongcc pg_rewrite]$ make
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O0 -ggdb -g3 -fPIC -I. -I./ -I/usr/pgsql-14/include/server -I/usr/pgsql-14/include/internal -D_GNU_SOURCE -c -o pg_rewrite.o pg_rewrite.c
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O0 -ggdb -g3 -fPIC -I. -I./ -I/usr/pgsql-14/include/server -I/usr/pgsql-14/include/internal -D_GNU_SOURCE -c -o concurrent.o concurrent.c
gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -g -O0 -ggdb -g3 -fPIC -shared -o pg_rewrite.so pg_rewrite.o concurrent.o -L/usr/pgsql-14/lib -Wl,--as-needed -Wl,-rpath,'/usr/pgsql-14/lib',--enable-new-dtags
[postgres@xiongcc pg_rewrite]$ make install
/bin/mkdir -p '/usr/pgsql-14/lib'
/bin/mkdir -p '/usr/pgsql-14/share/extension'
/bin/mkdir -p '/usr/pgsql-14/share/extension'
/bin/install -c -m 755 pg_rewrite.so '/usr/pgsql-14/lib/pg_rewrite.so'
/bin/install -c -m 644 .//pg_rewrite.control '/usr/pgsql-14/share/extension/'
/bin/install -c -m 644 .//pg_rewrite--1.0.sql '/usr/pgsql-14/share/extension/'
pg_rewrite provides a single function for converting a table into a partitioned table:
postgres=# create extension pg_rewrite ;
CREATE EXTENSION
postgres=# \df *partition_table*
List of functions
Schema | Name | Result data type | Argument data types | Type
--------+-----------------+------------------+----------------------------------------------------+------
public | partition_table | void | src_table text, dst_table text, src_table_new text | func
(1 row)
There are three configuration parameters in total:
postgres=# select setting,name from pg_settings where name like '%rewrite%';
setting | name
---------+---------------------------
on | rewrite.check_constraints
0 | rewrite.max_xlock_time
0 | rewrite.wait_after_load
(3 rows)
• rewrite.check_constraints: Before copying any data, pg_rewrite checks whether the destination table has the same constraints as the source table, and raises an ERROR if any difference is found. The problem is that if a constraint is (accidentally) missing from the destination table, data violating the source table's constraints would be allowed into the destination table once processing is finished. Even an extra constraint on the destination table is a problem, because pg_rewrite only assumes that all the data it copies satisfies the source table's constraints; it does not validate the data against the destination table's additional constraints. By setting rewrite.check_constraints to false, users can turn the constraint check off. Be very careful before doing so.
• rewrite.max_xlock_time: Although the table being processed stays available for reads and writes by other transactions most of the time, an exclusive lock is needed to finalize the processing (i.e. to process the remaining concurrent changes and to rename the tables). If the function seems to block access to the table too much, consider setting the rewrite.max_xlock_time GUC parameter. For example,
SET rewrite.max_xlock_time TO 100;
tells the function that the exclusive lock should not be held for more than 0.1 seconds (100 milliseconds). If the final stage needs more time, the function releases the exclusive lock, processes the changes committed by other transactions in the meantime, and retries the final stage. If the lock duration is exceeded a few more times, an error is reported. If that happens, you should either increase the setting or try to process the table at a time of lower write activity. The default value is 0, meaning the final stage may take as much time as it needs.
There is one more parameter that is not mentioned in the official documentation; you have to dig into the source code. Since the data migration is non-blocking and DML is allowed while it runs, this parameter controls how long to wait after the initial load has completed before starting to decode data changes made by other transactions.
/*
* Time (in seconds) to wait after the initial load has completed and before
* we start decoding of data changes introduced by other transactions. This
* helps to ensure defined order of steps when we test processing of the
* concurrent changes.
*/
int rewrite_wait_after_load = 0;
/*
* During regression tests, wait until the other transactions performed
* their data changes so that we can process them.
*
* Since this should only be used for tests, don't bother using
* WaitLatch().
*/
if (rewrite_wait_after_load > 0)
{
    LOCKTAG     tag;
    Oid         extension_id;
    LockAcquireResult lock_res PG_USED_FOR_ASSERTS_ONLY;

    extension_id = get_extension_oid("pg_rewrite", false);
    SET_LOCKTAG_OBJECT(tag, MyDatabaseId, ExtensionRelationId,
                       extension_id, 0);
    lock_res = LockAcquire(&tag, ExclusiveLock, false, false);
    Assert(lock_res == LOCKACQUIRE_OK);

    /*
     * Misuse lock on our extension to let the concurrent backend(s) check
     * that we're exactly here.
     */
    pg_usleep(rewrite_wait_after_load * 1000000L);
    LockRelease(&tag, ExclusiveLock, false);
}
Hands-on
First, create an ordinary table:
postgres=# CREATE TABLE t1 (
id int not null,
tm timestamptz not null
);
CREATE TABLE
postgres=# insert into t1 select extract(epoch from seq), seq from generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;
INSERT 0 1313280
Next, create the partitioned table that will be the target of the conversion:
postgres=# CREATE TABLE ptab01 (
id int not null,
tm timestamptz not null
) PARTITION BY RANGE (tm);
CREATE TABLE
postgres=# create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
CREATE TABLE
postgres=# create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
CREATE TABLE
postgres=# create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
CREATE TABLE
postgres=# create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
CREATE TABLE
postgres=# create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');
CREATE TABLE
postgres=# \d
List of relations
Schema | Name | Type | Owner
--------+---------------+-------------------+----------
public | ptab01 | partitioned table | postgres
public | ptab01_202001 | table | postgres
public | ptab01_202002 | table | postgres
public | ptab01_202003 | table | postgres
public | ptab01_202004 | table | postgres
public | ptab01_202005 | table | postgres
public | t1 | table | postgres
(7 rows)
Then use the function provided by pg_rewrite to perform the conversion:
postgres=# select * from partition_table('t1','ptab01','t1_old');
ERROR: Table "t1" has no identity index
1. The first argument is the source table, i.e. the table to be converted into a partitioned table; here it is t1.
2. The second argument is the destination table, i.e. the partitioned table after the conversion; here it is ptab01.
3. The third argument is the new name for the source table; here it is t1_old. Note that the operation therefore needs up to twice the disk space at its peak.
The call fails: the table needs a replica identity. Here we simply add a primary key. There are four replica identity strategies in total:
• default: the default mode for non-system tables; if there is a primary key, its columns are used as the replica identity
• index: the columns of a qualifying unique index are used as the replica identity
• full: all columns of the whole row are used as the replica identity; only suitable as a last resort
• nothing: no replica identity is recorded, which means UPDATE and DELETE operations cannot be replicated to subscribers
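For reference, the replica identity of a table can be inspected and changed with plain DDL; a minimal sketch (the index name t1_tm_idx is illustrative, not from the original article):

```sql
-- Inspect the current replica identity:
-- 'd' = default, 'i' = index, 'f' = full, 'n' = nothing
SELECT relname, relreplident FROM pg_class WHERE relname = 't1';

-- Use a specific index as the replica identity
-- (it must be unique, non-partial, non-deferrable, and on NOT NULL columns):
CREATE UNIQUE INDEX t1_tm_idx ON t1 (tm);
ALTER TABLE t1 REPLICA IDENTITY USING INDEX t1_tm_idx;

-- Or, as a last resort, log the whole row:
ALTER TABLE t1 REPLICA IDENTITY FULL;
```

In this article we take the simplest route and just add a primary key, which satisfies the default mode.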
postgres=# alter table t1 add primary key(tm);
ALTER TABLE
postgres=# select * from partition_table('t1','ptab01','t1_old');
ERROR: logical decoding requires wal_level >= logical
As you can see, wal_level = logical is also required (changing it needs a server restart). After adjusting it, let's try again:
postgres=# select * from partition_table('t1','ptab01','t1_old');
ERROR: the source and destination relations have different primary key
Now it complains that the primary keys differ. This is the check_constraints parameter at work: when a constraint difference is found, an ERROR is raised.
postgres=# alter table ptab01 add primary key(tm);
ALTER TABLE
It took about 6 seconds for roughly 1.31 million rows; the conversion speed is quite acceptable:
postgres=# select * from partition_table('t1','ptab01','t1_old');
partition_table
-----------------
(1 row)
2021-12-11 21:22:39.697 CST [18011] LOG: logical decoding found consistent point at 0/F7A57AF8
2021-12-11 21:22:39.697 CST [18011] DETAIL: There are no running transactions.
2021-12-11 21:22:39.697 CST [18011] STATEMENT: select * from partition_table('t1','ptab01','t1_old');
2021-12-11 21:22:46.626 CST [18011] LOG: duration: 6936.349 ms statement: select * from partition_table('t1','ptab01','t1_old');
Let's look at the result of the conversion:
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+---------------+-------------------+----------+-------------+---------------+---------+-------------
public | ptab01_202001 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202002 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202003 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202004 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202005 | table | postgres | permanent | heap | 11 MB |
public | t1 | partitioned table | postgres | permanent | | 0 bytes |
public | t1_old | table | postgres | permanent | heap | 56 MB |
(7 rows)
postgres=# \d+ t1
Partitioned table "public.t1"
Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description
--------+--------------------------+-----------+----------+---------+---------+-------------+--------------+-------------
id | integer | | not null | | plain | | |
tm | timestamp with time zone | | not null | | plain | | |
Partition key: RANGE (tm)
Indexes:
"ptab01_pkey" PRIMARY KEY, btree (tm)
Partitions: ptab01_202001 FOR VALUES FROM ('2020-01-01 00:00:00+08') TO ('2020-02-01 00:00:00+08'),
ptab01_202002 FOR VALUES FROM ('2020-02-01 00:00:00+08') TO ('2020-03-01 00:00:00+08'),
ptab01_202003 FOR VALUES FROM ('2020-03-01 00:00:00+08') TO ('2020-04-01 00:00:00+08'),
ptab01_202004 FOR VALUES FROM ('2020-04-01 00:00:00+08') TO ('2020-05-01 00:00:00+08'),
ptab01_202005 FOR VALUES FROM ('2020-05-01 00:00:00+08') TO ('2020-06-01 00:00:00+08')
As you can see, the former partition children are now partitions of t1. Just as we expected, t1 has become the partitioned table, and the original t1 has been renamed to t1_old.
Caveat 1
Because the tables must be renamed at the end, a very brief AccessExclusive lock is involved; the rewrite.max_xlock_time parameter determines how long it may be held.
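To bound that final lock, the GUC can be set at session level right before calling the function; a sketch using the tables from this article:

```sql
-- Release and retry if the final exclusive lock would be held
-- for more than 100 ms; error out after a few failed attempts.
SET rewrite.max_xlock_time TO 100;
SELECT partition_table('t1', 'ptab01', 't1_old');
```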
Let's redo the whole operation, this time first opening a transaction that queries t1:
postgres=# begin;
BEGIN
postgres=*# select id from t1 limit 1;
id
------------
1577808000
(1 row)
postgres=*# select pg_backend_pid();
pg_backend_pid
----------------
18011
(1 row)
--- transaction left uncommitted
In another session, run the conversion:
postgres=# select * from partition_table('t1','ptab01','t1_old');
--- blocked
Checking the locks shows that the conversion is blocked by the open query:
postgres=# SELECT
blocked_locks.pid AS blocked_pid,
blocked_activity.usename AS blocked_user,
now() - blocked_activity.query_start AS blocked_duration,
blocking_locks.pid AS blocking_pid,
blocking_activity.usename AS blocking_user,
now() - blocking_activity.query_start AS blocking_duration,
blocked_activity.query AS blocked_statement,
blocking_activity.query AS blocking_statement
FROM
pg_catalog.pg_locks AS blocked_locks
JOIN pg_catalog.pg_stat_activity AS blocked_activity ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks AS blocking_locks ON blocking_locks.locktype = blocked_locks.locktype
AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
AND blocking_locks.pid != blocked_locks.pid
JOIN pg_catalog.pg_stat_activity AS blocking_activity ON blocking_activity.pid = blocking_locks.pid
WHERE
NOT blocked_locks.granted;
-[ RECORD 1 ]------+-------------------------------------------------------
blocked_pid | 18009
blocked_user | postgres
blocked_duration | 00:00:29.163332
blocking_pid | 18011
blocking_user | postgres
blocking_duration | 00:00:36.395631
blocked_statement | select * from partition_table('t1','ptab01','t1_old');
blocking_statement | select id from t1 limit 1;
postgres=# select * from pg_locks where pid = 18009 and granted = 'f';
-[ RECORD 1 ]------+-----------------------------
locktype | relation
database | 13890
relation | 16847
page |
tuple |
virtualxid |
transactionid |
classid |
objid |
objsubid |
virtualtransaction | 3/24
pid | 18009
mode | AccessExclusiveLock
granted | f
fastpath | f
waitstart | 2021-12-11 21:39:17.37459+08
Caveat 2
This is really just a limitation of logical replication: sequence data is not replicated. Since pg_rewrite is built on logical replication, it also needs the corresponding configuration:
wal_level = logical
max_replication_slots = 1
# ... or add 1 to the current value.
shared_preload_libraries = 'pg_rewrite'
# ... or add the library to the existing ones.
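After editing postgresql.conf (the wal_level and shared_preload_libraries changes require a server restart), the effective settings can be double-checked; a minimal sketch:

```sql
SHOW wal_level;                  -- expect: logical
SHOW shared_preload_libraries;   -- expect the list to include pg_rewrite
SHOW max_replication_slots;      -- must leave at least one slot free
```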
The data in identity and SERIAL columns backed by a sequence is of course replicated as part of the table, but the sequence itself remains at its initial value on the subscriber side.
postgres=# CREATE TABLE t1 (
id serial not null,
tm timestamptz not null primary key
);
CREATE TABLE
postgres=# insert into t1(tm) select generate_series('2020-01-01'::timestamptz, '2020-05-31 23:59:59'::timestamptz, interval '10 seconds') as seq;
INSERT 0 1313280
postgres=# CREATE TABLE ptab01 (
id serial not null,
tm timestamptz not null
) PARTITION BY RANGE (tm);
CREATE TABLE
postgres=# create table ptab01_202001 partition of ptab01 for values from ('2020-01-01') to ('2020-02-01');
CREATE TABLE
postgres=# create table ptab01_202002 partition of ptab01 for values from ('2020-02-01') to ('2020-03-01');
CREATE TABLE
postgres=# create table ptab01_202003 partition of ptab01 for values from ('2020-03-01') to ('2020-04-01');
CREATE TABLE
postgres=# create table ptab01_202004 partition of ptab01 for values from ('2020-04-01') to ('2020-05-01');
CREATE TABLE
postgres=# create table ptab01_202005 partition of ptab01 for values from ('2020-05-01') to ('2020-06-01');
CREATE TABLE
postgres=# alter table ptab01 add primary key(tm);
ALTER TABLE
postgres=# select max(id) from t1;
max
---------
1313280
(1 row)
Now run the conversion again:
postgres=# select * from partition_table('t1','ptab01','t1_old');
partition_table
-----------------
(1 row)
postgres=# \d+
List of relations
Schema | Name | Type | Owner | Persistence | Access method | Size | Description
--------+---------------+-------------------+----------+-------------+---------------+------------+-------------
public | ptab01_202001 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202002 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202003 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202004 | table | postgres | permanent | heap | 11 MB |
public | ptab01_202005 | table | postgres | permanent | heap | 11 MB |
public | ptab01_id_seq | sequence | postgres | permanent | | 8192 bytes |
public | t1 | partitioned table | postgres | permanent | | 0 bytes |
public | t1_id_seq | sequence | postgres | permanent | | 8192 bytes |
public | t1_old | table | postgres | permanent | heap | 56 MB |
(9 rows)
Insert a few rows:
postgres=# insert into t1(tm) values ('2020-03-01 23:11:32');
INSERT 0 1
postgres=# select * from t1 where tm = '2020-03-01 23:11:32';
id | tm
----+------------------------
4 | 2020-03-01 23:11:32+08
(1 row)
postgres=# insert into t1(tm) values ('2020-03-01 23:12:32');
INSERT 0 1
postgres=# select * from t1 where tm = '2020-03-01 23:12:32';
id | tm
----+------------------------
5 | 2020-03-01 23:12:32+08
(1 row)
postgres=# insert into t1_old(tm) values ('2020-03-01 23:12:32');
INSERT 0 1
postgres=# select * from t1_old where tm = '2020-03-01 23:12:32';
id | tm
---------+------------------------
1313281 | 2020-03-01 23:12:32+08
(1 row)
As you can see, the sequence's current value differs from the original table's. If the column is a primary key, the application may start failing with duplicate key errors, so pay close attention to this.
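One way to fix this up after the conversion is to advance the sequence behind the new table's id column to the current maximum; a minimal sketch (assumes the layout shown in the \d+ output above):

```sql
-- Move the sequence past the largest existing id so that
-- future inserts do not produce duplicate key values.
SELECT setval(pg_get_serial_sequence('t1', 'id'),
              (SELECT max(id) FROM t1));
```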
Summary
pg_rewrite supports converting an ordinary table into a partitioned table online; it only needs an exclusive lock at the very end, when the tables are renamed: https://github.com/cybertec-postgresql/pg_rewrite
1. It solves the long-lock problem of changing a non-partitioned table into a partitioned one.
2. It uses logical replication to incrementally copy data from the non-partitioned table to the partitioned table.
3. Only a brief exclusive lock is needed, to swap the table names after the data is in sync; the same principle as pg_repack.
4. The non-partitioned table must have a primary key, or more precisely a replica identity.
5. The constraints on the partitioned table should match those of the non-partitioned table (CHECK, NOT NULL, primary key, default values, etc.), otherwise the data may become inconsistent.
6. Remember to set up SERIAL columns properly as well: the sequence itself remains at its initial value on the subscriber side.
Finally, here is a nice trick I discovered recently. When we want to read source code, we usually clone it locally and import it into an IDE. If we want to read the pg_rewrite code at https://github.com/cybertec-postgresql/pg_rewrite, we can instead open https://github1s.com/cybertec-postgresql/pg_rewrite, i.e. just add `1s` to the URL. This is essentially a web version of VS Code, and I find it more convenient than Octotree. As the name suggests, it lets you open any GitHub repository in an online VS Code within one second, simply by appending 1s to the project URL.
References
https://github.com/cybertec-postgresql/pg_rewrite
https://www.cybertec-postgresql.com/en/pg_rewrite-postgresql-table-partitioning/
https://github.com/digoal/blog/blob/master/202112/20211209_01.md