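Publish/subscribe in the postgres database
The publisher and the subscriber run on the same host on different ports. The publisher listens on 5432.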
[postgres@xiongcc ~]$ psql
psql (13.2)
Type "help" for help.
postgres=# show port;
port
------
5432
(1 row)
postgres=# create table t1(id int);
CREATE TABLE
The subscriber runs on the same host on a different port, 5439 here.
[postgres@xiongcc ~]$ psql -p 5439
psql (13.2)
Type "help" for help.
postgres=# show port;
port
------
5439
(1 row)
postgres=# create table t1(id int);
CREATE TABLE
Create the publication (on the publisher)
postgres=# create publication pub1 for table t1;
CREATE PUBLICATION
postgres=# select * from pg_publication;
oid | pubname | pubowner | puballtables | pubinsert | pubupdate | pubdelete | pubtruncate | pubviaroot
-------+---------+----------+--------------+-----------+-----------+-----------+-------------+------------
16432 | pub1 | 10 | f | t | t | t | t | f
(1 row)
postgres=# select * from pg_publication_tables;
pubname | schemaname | tablename
---------+------------+-----------
pub1 | public | t1
(1 row)
postgres=# select * from pg_replication_origin;
roident | roname
---------+--------
(0 rows)
Create the subscription (on the subscriber)
postgres=# create subscription sub1 connection 'host=localhost port=5432' publication pub1;
NOTICE: created replication slot "sub1" on publisher
CREATE SUBSCRIPTION
postgres=# select * from pg_subscription;
oid | subdbid | subname | subowner | subenabled | subconninfo | subslotname | subsynccommit | subpublications
-------+---------+---------+----------+------------+--------------------------+-------------+---------------+-----------------
16424 | 13578 | sub1 | 10 | t | host=localhost port=5432 | sub1 | off | {pub1}
(1 row)
On the publisher
postgres=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
wal_status | safe_wal_size
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+-
-----------+---------------
sub1 | pgoutput | logical | 13578 | postgres | f | t | 8685 | | 537 | 0/40D83A0 | 0/40D83D8 |
reserved |
(1 row)
postgres=# insert into t1 values(1);
INSERT 0 1
Check the data on the subscriber
postgres=# select * from t1;
id
----
1
(1 row)
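As shown, publish/subscribe works as expected.
Publish/subscribe in a new database
Create a new database on the publisher and set up a publication in the same way.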
postgres=# show port;
port
------
5432
(1 row)
postgres=# create database mydb;
CREATE DATABASE
postgres=# \c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# create table test(id int);
CREATE TABLE
mydb=# create publication pub2 for table test;
CREATE PUBLICATION
mydb=# select * from pg_replication_slots where slot_name = 'sub2'; --- (run after sub2 was created on the subscriber) the slot is active, which made me believe replication was working! Read on.
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
wal_status | safe_wal_size
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+-
-----------+---------------
sub2 | pgoutput | logical | 13578 | postgres | f | t | 8705 | | 541 | 0/40ED5B0 | 0/40ED5E8 |
reserved |
(1 row)
mydb=# insert into test values(1);
INSERT 0 1
On the subscriber, create a database with the same name. The data is not replicated; even the initial data does not arrive.
postgres=# show port;
port
------
5439
(1 row)
postgres=# create database mydb;
CREATE DATABASE
postgres=# \c myd
FATAL: database "myd" does not exist
Previous connection kept
postgres=# \c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# create table test(id int);
CREATE TABLE
mydb=# create subscription sub2 connection 'host=localhost port=5432' publication pub2;
NOTICE: created replication slot "sub2" on publisher
CREATE SUBSCRIPTION
mydb=# select * from test;
id
----
(0 rows)
Switch back to the postgres database and publish again
mydb=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# insert into t1 values(2);
INSERT 0 1
postgres=# select * from t1;
id
----
1
2
(2 rows)
On the subscriber
mydb=# \c postgres postgres
You are now connected to database "postgres" as user "postgres".
postgres=# select * from t1;
id
----
1
2
(2 rows)
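On the subscriber, the log now fills with errors reporting that publication pub2, the one created in the new database, does not exist: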
CONTEXT: slot ""sub2"", output plugin ""pgoutput"", in the change callback, associated LSN 0/40ED690",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:34.533 CST,,,17027,,6134bc72.4283,9260,,2021-09-05 20:47:46 CST,,0,LOG,00000,"background worker ""logical replication worker"" (PID 8735) exited with exit code 1",,,,,,,,,"","postmaster"
2021-09-06 10:23:39.547 CST,,,8738,,61357bab.2222,1,,2021-09-06 10:23:39 CST,3/801,0,LOG,00000,"logical replication apply worker for subscription ""sub2"" has started",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:39.552 CST,,,8738,,61357bab.2222,2,,2021-09-06 10:23:39 CST,3/0,0,ERROR,XX000,"could not receive data from WAL stream: ERROR: publication ""pub2"" does not exist
CONTEXT: slot ""sub2"", output plugin ""pgoutput"", in the change callback, associated LSN 0/40ED690",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:39.552 CST,,,17027,,6134bc72.4283,9261,,2021-09-05 20:47:46 CST,,0,LOG,00000,"background worker ""logical replication worker"" (PID 8738) exited with exit code 1",,,,,,,,,"","postmaster"
2021-09-06 10:23:44.560 CST,,,8741,,61357bb0.2225,1,,2021-09-06 10:23:44 CST,3/805,0,LOG,00000,"logical replication apply worker for subscription ""sub2"" has started",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:44.565 CST,,,8741,,61357bb0.2225,2,,2021-09-06 10:23:44 CST,3/0,0,ERROR,XX000,"could not receive data from WAL stream: ERROR: publication ""pub2"" does not exist
CONTEXT: slot ""sub2"", output plugin ""pgoutput"", in the change callback, associated LSN 0/40ED690",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:44.566 CST,,,17027,,6134bc72.4283,9262,,2021-09-05 20:47:46 CST,,0,LOG,00000,"background worker ""logical replication worker"" (PID 8741) exited with exit code 1",,,,,,,,,"","postmaster"
2021-09-06 10:23:49.572 CST,,,8743,,61357bb5.2227,1,,2021-09-06 10:23:49 CST,3/809,0,LOG,00000,"logical replication apply worker for subscription ""sub2"" has started",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:49.577 CST,,,8743,,61357bb5.2227,2,,2021-09-06 10:23:49 CST,3/0,0,ERROR,XX000,"could not receive data from WAL stream: ERROR: publication ""pub2"" does not exist
CONTEXT: slot ""sub2"", output plugin ""pgoutput"", in the change callback, associated LSN 0/40ED690",,,,,,,,,"","logical replication worker"
2021-09-06 10:23:49.578 CST,,,17027,,6134bc72.4283,9263,,2021-09-05 20:47:46 CST,,0,LOG,00000,"background worker ""logical replication worker"" (PID 8743) exited with exit code 1",,,,,,,,,"","postmaster"
The publisher logs the same error, reporting that publication pub2 from the new database does not exist:
2021-09-06 10:24:14.631 CST [8756] ERROR: publication "pub2" does not exist
2021-09-06 10:24:14.631 CST [8756] CONTEXT: slot "sub2", output plugin "pgoutput", in the change callback, associated LSN 0/40ED690
2021-09-06 10:24:19.640 CST [8758] LOG: duration: 0.689 ms statement: SELECT pg_catalog.set_config('search_path', '', false);
2021-09-06 10:24:19.640 CST [8758] LOG: starting logical decoding for slot "sub2"
2021-09-06 10:24:19.640 CST [8758] DETAIL: Streaming transactions committing after 0/40ED690, reading WAL from 0/40ED658.
2021-09-06 10:24:19.641 CST [8758] LOG: logical decoding found consistent point at 0/40ED658
2021-09-06 10:24:19.641 CST [8758] DETAIL: Logical decoding will begin using saved snapshot.
2021-09-06 10:24:19.642 CST [8758] ERROR: publication "pub2" does not exist
2021-09-06 10:24:19.642 CST [8758] CONTEXT: slot "sub2", output plugin "pgoutput", in the change callback, associated LSN 0/40ED690
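The retry loop above repeats the same error every five seconds, so it is easy to spot mechanically. The following is a minimal sketch, not part of the original investigation, that counts "publication ... does not exist" errors per publication name in a list of log lines. The function name count_missing_publications and the regular expression are assumptions made for illustration; the pattern accepts both the plain log format and the csvlog format, where quotes are doubled.

```python
import re
from collections import Counter

# Matches both  publication "pub2" does not exist  (plain log)
# and           publication ""pub2"" does not exist  (csvlog, doubled quotes).
MISSING_PUB = re.compile(r'ERROR:\s+publication "+([^"]+)"+ does not exist')


def count_missing_publications(log_lines):
    """Count 'publication does not exist' errors per publication name."""
    counts = Counter()
    for line in log_lines:
        match = MISSING_PUB.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Sample lines copied from the publisher and subscriber logs above.
sample = [
    '2021-09-06 10:24:14.631 CST [8756] ERROR: publication "pub2" does not exist',
    '2021-09-06 10:24:19.641 CST [8758] LOG: logical decoding found consistent point at 0/40ED658',
    '2021-09-06 10:24:19.642 CST [8758] ERROR: publication "pub2" does not exist',
    '"could not receive data from WAL stream: ERROR: publication ""pub2"" does not exist',
]
counts = count_missing_publications(sample)
print(dict(counts))
```

A count that keeps growing for one publication name is a hint that an apply worker is stuck restarting against a slot that cannot find its publication.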
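Investigation
A friend later pointed out that the connection string did not specify a database name, so the subscription connected to the postgres database by default. When dbname is omitted, libpq falls back to the user name, which is postgres here.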
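To make the default concrete, here is a minimal sketch that works out which database a connection string would reach. It assumes the simple keyword=value form used in this post (no spaces around the equals sign) and ignores environment variables such as PGDATABASE and PGUSER as well as service files, all of which libpq also consults. The helper names parse_conninfo and effective_dbname are hypothetical, not part of any PostgreSQL client library.

```python
import shlex


def parse_conninfo(conninfo):
    """Parse a simple keyword=value connection string into a dict."""
    params = {}
    for token in shlex.split(conninfo):
        key, sep, value = token.partition("=")
        if not sep:
            raise ValueError(f"not a keyword=value pair: {token!r}")
        params[key] = value
    return params


def effective_dbname(conninfo, os_user="postgres"):
    """Return the database the connection would use.

    Assumption: with no dbname, the database defaults to the user name,
    and with no user, the user defaults to the OS user (os_user here).
    """
    params = parse_conninfo(conninfo)
    user = params.get("user", os_user)
    return params.get("dbname", user)


# The two connection strings used for sub2 in this post.
without_db = effective_dbname("host=localhost port=5432")
with_db = effective_dbname("host=localhost port=5432 dbname=mydb")
print(without_db, with_db)
```

Running a check like this before create subscription makes the silent fallback visible: the first connection string resolves to postgres, not to the database the subscription was created in.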
So I started over. On the publisher, drop the publication and create it again:
postgres=# \c mydb
You are now connected to database "mydb" as user "postgres".
mydb=# drop publication pub2 ;
DROP PUBLICATION
mydb=# create publication pub2 for table test;
CREATE PUBLICATION
mydb=# \d
List of relations
Schema | Name | Type | Owner
--------+------+-------+----------
public | test | table | postgres
(1 row)
mydb=# insert into test values(2);
INSERT 0 1
On the subscriber, the connection string must name the database explicitly: dbname=mydb!
mydb=# drop subscription sub2 ;
NOTICE: dropped replication slot "sub2" on publisher
DROP SUBSCRIPTION
mydb=# create subscription sub2 connection 'host=localhost port=5432 dbname=mydb' publication pub2;
NOTICE: created replication slot "sub2" on publisher
CREATE SUBSCRIPTION
mydb=# select * from test;
id
----
1
(1 row)
mydb=# select * from test;
id
----
1
2
(2 rows)
mydb=# select * from pg_replication_slots ;
slot_name | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn |
wal_status | safe_wal_size
-----------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+-
-----------+---------------
sub1 | pgoutput | logical | 13578 | postgres | f | t | 8685 | | 546 | 0/40F3DA8 | 0/40F3DE0 |
reserved |
sub2 | pgoutput | logical | 16434 | mydb | f | t | 8858 | | 546 | 0/40F3DA8 | 0/40F3DE0 |
reserved |
(2 rows)
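Summary
I stumbled on this by accident. The problems were:
- When creating the subscription in the new database, I did not specify dbname, so the connection defaulted to the postgres database.
- After the subscription was created in the new database, the replication slot on the publisher showed active = t, which led careless me to believe that replication was working.
- The main problem: the error only surfaced when I switched back to the postgres database and published a change there. Only then did the logs report that the other subscription's publication did not exist (because of the wrong database name, as explained above). Without that switch, the logs contain no hint at all, so the database mismatch between publisher and subscriber would never be noticed. And since the slot shows as active right after creation, it is very easy to be misled.
- Similarly, if you first create a subscription on the subscriber with create subscription sub3 connection 'host=localhost port=5432 dbname=mydb' publication pub3; the publisher also shows the sub3 slot as active, even though publication pub3 does not exist yet!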
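Since the active column alone is misleading, the database column of pg_replication_slots is the more useful thing to check. Below is a small sketch of that check, assuming the rows have already been fetched by some client and that you maintain a mapping from slot name to the database you intended to replicate from. SlotRow and find_mismatched_slots are hypothetical names chosen for illustration; the sample rows mirror the slot output shown earlier in this post.

```python
from typing import NamedTuple


class SlotRow(NamedTuple):
    """The three pg_replication_slots columns this check needs."""
    slot_name: str
    database: str
    active: bool


def find_mismatched_slots(slots, expected_db):
    """Return (slot_name, actual_db, expected_db) for slots in the wrong database."""
    mismatches = []
    for slot in slots:
        want = expected_db.get(slot.slot_name)
        if want is not None and slot.database != want:
            mismatches.append((slot.slot_name, slot.database, want))
    return mismatches


# Rows as seen on the publisher before the fix: both slots are active.
slots = [
    SlotRow("sub1", "postgres", True),
    SlotRow("sub2", "postgres", True),  # active, but attached to the wrong database
]
expected = {"sub1": "postgres", "sub2": "mydb"}
mismatches = find_mismatched_slots(slots, expected)
print(mismatches)  # [('sub2', 'postgres', 'mydb')]
```

The check deliberately ignores active: in the failing setup both slots were active, and only the database column gave the mistake away.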
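This problem cost me a whole evening. Last night I kept checking whether some extension, parameter, or compile option was to blame. The pg_show_plans extension, for example, does conflict with logical replication slots: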
postgres=# SELECT pg_create_logical_replication_slot('myslot', 'pgoutput');
ERROR: cannot create logical replication slot in transaction that has performed writes
postgres=# SELECT pg_create_logical_replication_slot('myslot', 'wal2json');
ERROR: cannot create logical replication slot in transaction that has performed writes
postgres=# SELECT pg_create_logical_replication_slot('myslot', 'wal2mongo');
ERROR: cannot create logical replication slot in transaction that has performed writes
postgres=# set pg_show_plans.enable = off;
SET
postgres=# SELECT pg_create_logical_replication_slot('myslot', 'pgoutput');
pg_create_logical_replication_slot
------------------------------------
(myslot,0/1572F90)
(1 row)
postgres=# SET pg_show_plans.enable = on;
SET
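I hope the project improves these messages. There is already a related patch in the commitfest, https://commitfest.postgresql.org/34/2957/, which detects the fourth case above, whether the publication exists. I have not seen anything that covers the replication slot case, though, and I hope that gets fixed soon as well.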