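Question 1: While a standby is being initialized from a restored backup, the primary keeps generating new WAL. Is that new WAL synchronized to the standby automatically?
- Current state of the primary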
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+-------------------------------------
3/C2FDE550 | 0000000900000003000000C2 | (0000000900000003000000C2,16639312)
The current WAL file is C2. The primary's pg_wal directory holds WAL segments B0~C2, and the archive directory holds all segments B0~C1.
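The mapping from an LSN to a WAL file name and offset shown above can be reproduced by hand. The following is a minimal sketch, assuming the default 16 MB WAL segment size; the helper names parse_lsn and wal_file_name_offset are made up for illustration and are not part of PostgreSQL or pg_probackup.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default 16 MB segments


def parse_lsn(lsn: str) -> int:
    """Convert an LSN such as '3/C2FDE550' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def wal_file_name_offset(timeline: int, lsn: str,
                         segment_size: int = WAL_SEGMENT_SIZE):
    """Return (segment file name, byte offset inside that segment)."""
    position = parse_lsn(lsn)
    segment_number = position // segment_size
    # Number of segments covered by one value of the high 32 bits of the LSN.
    segments_per_xlog_id = 0x100000000 // segment_size
    name = "%08X%08X%08X" % (
        timeline,
        segment_number // segments_per_xlog_id,
        segment_number % segments_per_xlog_id,
    )
    return name, position % segment_size


# Timeline 9 and the LSN reported by the primary in this test.
print(wal_file_name_offset(9, "3/C2FDE550"))
# → ('0000000900000003000000C2', 16639312)
```

The result matches the pg_walfile_name_offset output above: the middle and last eight hex digits identify the segment, and the low 24 bits of the LSN are the offset inside it.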
- State of the standby
[postgres@vm001 vicdb_5432]$ pg_probackup show
BACKUP INSTANCE 'vicdb_5432'
=====================================================================================================================================
Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Status
=====================================================================================================================================
vicdb_5432 14 RLHCXV 2022-11-17 15:03:34+08 FULL ARCHIVE 9/0 5s 407MB 16MB 1.00 3/B00000D8 3/B10000F0 OK
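According to pg_probackup show, recovery starts in WAL segment B0, and segments B0~B1 are present in the standby's pg_wal directory.
- Start the database and check the standby's log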
INFO: pg_probackup archive-get copied WAL file 0000000900000003000000B1
INFO: pg_probackup archive-get completed successfully, fetched: 1/1, time elapsed: 8ms
2022-11-18 10:23:21.128 CST [26057] LOG: restored log file "0000000900000003000000B1" from archive
2022-11-18 10:23:21.170 CST [26057] LOG: consistent recovery state reached at 3/B10000F0
2022-11-18 10:23:21.170 CST [26055] LOG: database system is ready to accept read-only connections
INFO: pg_probackup archive-get WAL file: 0000000900000003000000B2, remote: none, threads: 1/1, batch: 1
ERROR: pg_probackup archive-get failed to deliver WAL file: 0000000900000003000000B2, time elapsed: 0ms
2022-11-18 10:23:21.189 CST [26065] LOG: started streaming WAL from primary at 3/B2000000 on timeline 9
2022-11-18 10:23:21.862 CST [26061] LOG: restartpoint starting: wal
2022-11-18 10:23:23.403 CST [26061] LOG: restartpoint complete: wrote 8 buffers (0.8%); 0 WAL file(s) added, 3 removed, 0 recycled; write=1.104 s, sync=0.375 s, total=1.542 s; sync files=34, longest=0.146 s, average=0.012 s; distance=49151 kB, estimate=49151 kB
2022-11-18 10:23:23.403 CST [26061] LOG: recovery restart point at 3/B3000028
2022-11-18 10:23:23.403 CST [26061] DETAIL: Last completed transaction was at log time 2022-11-17 15:46:55.083398+08.
2022-11-18 10:23:23.713 CST [26061] LOG: restartpoint starting: wal
2022-11-18 10:23:32.600 CST [26061] LOG: restartpoint complete: wrote 27 buffers (2.6%); 0 WAL file(s) added, 5 removed, 1 recycled; write=8.820 s, sync=0.036 s, total=8.888 s; sync files=17, longest=0.015 s, average=0.003 s; distance=98307 kB, estimate=98307 kB
2022-11-18 10:23:32.600 CST [26061] LOG: recovery restart point at 3/B9000F00
2022-11-18 10:23:32.600 CST [26061] DETAIL: Last completed transaction was at log time 2022-11-17 16:14:59.44912+08.
2022-11-18 10:23:32.600 CST [26061] LOG: restartpoint starting: wal
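The line "2022-11-18 10:23:21.189 CST [26065] LOG: started streaming WAL from primary at 3/B2000000 on timeline 9" shows what happened. The standby first replayed B0 and B1, which it already had locally (the log shows B1 being delivered by archive-get), and reached a consistent state at 3/B10000F0. B2 was not in the standby's archive, so archive-get failed, but the segment was still in the primary's pg_wal directory. The standby therefore switched to streaming replication to fetch it and kept replaying WAL to catch up with the primary.
Answer: yes. Even if the standby is missing some WAL, it is transferred through streaming replication.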
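The order in which the standby looked for each segment in the log above can be summarized in a small model. This is only a sketch of the behaviour observed in this test, not PostgreSQL's actual code: choose_wal_source and segment_names are hypothetical helpers, and the contents of the three sets are read off the test description (the contents of the standby's archive are an assumption based on the archive-get log lines).

```python
def choose_wal_source(segment, standby_archive, standby_pg_wal, primary_pg_wal):
    """Return where a standby would get `segment` from, in the order seen in the log."""
    if segment in standby_archive:
        return "archive"      # restore_command (archive-get) succeeds
    if segment in standby_pg_wal:
        return "pg_wal"       # already present locally
    if segment in primary_pg_wal:
        return "streaming"    # primary still has it, walreceiver fetches it
    return "unavailable"      # primary has already removed the segment


def segment_names(first, last, prefix="0000000900000003"):
    """Build segment names on timeline 9, log id 3, for a range of low numbers."""
    return {f"{prefix}{n:08X}" for n in range(first, last + 1)}


standby_archive = segment_names(0xB0, 0xB1)  # assumption: restored backup's WAL
standby_pg_wal = segment_names(0xB0, 0xB1)
primary_pg_wal = segment_names(0xB0, 0xC2)

for low in (0xB1, 0xB2):
    name = f"0000000900000003{low:08X}"
    print(name, choose_wal_source(name, standby_archive,
                                  standby_pg_wal, primary_pg_wal))
```

With these inputs B1 resolves to the archive and B2 to streaming, which is the sequence the log records.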
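Question 2: What if the WAL in the primary's pg_wal has already been removed or archived?
We now know that missing WAL gets transferred, but does it come from the primary's pg_wal? What happens if those segments have already been removed from the primary's pg_wal and exist only in the archive? A new test:
- State of the primary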
postgres-# pg_walfile_name_offset(pg_current_wal_lsn());
pg_current_wal_lsn | pg_walfile_name | pg_walfile_name_offset
--------------------+--------------------------+------------------------------------
3/E71A2C88 | 0000000900000003000000E7 | (0000000900000003000000E7,1715336)
(1 row)
The primary's archive path holds segments B3-E6, and the primary's pg_wal directory holds all WAL segments DD-E6.
- State of the standby
[postgres@vm001 backup]$ pg_probackup show
BACKUP INSTANCE 'vicdb_5432'
=====================================================================================================================================
Instance Version ID Recovery Time Mode WAL Mode TLI Time Data WAL Zratio Start LSN Stop LSN Status
=====================================================================================================================================
vicdb_5432 14 RLJ74A 2022-11-18 14:53:05+08 FULL ARCHIVE 9/0 9s 408MB 16MB 1.00 3/D2000028 3/D30000B8 OK
vicdb_5432 14 RLHCXV 2022-11-17 15:03:34+08 FULL ARCHIVE 9/0 5s 407MB 16MB 1.00 3/B00000D8 3/B10000F0 OK
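The standby's recovery starting point is D2, and both D2 and D3 are in the standby's pg_wal directory. The newest segment in the standby's archive path is D3.
- Start the standby
Starting the standby produces the following error: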
2022-11-18 15:23:24.265 CST [2047] FATAL: could not receive data from WAL stream: ERROR: requested WAL segment 0000000900000003000000D4 has already been removed
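Clearly, D4 is no longer in the primary's pg_wal, so streaming fails. This also shows that even though the primary's archive directory contains this segment, the standby does not go there to fetch it. The fix is to copy the primary's archived WAL into the standby's archive path, so that the standby recovers from its own archive at startup. How does the standby restore WAL from the archive? Through the restore_command parameter in the standby's configuration file: restore_command = '"/usr/local/pg14.2/bin/pg_probackup" archive-get -B "/pg_data/backup/db1" --instance "vicdb_5432" --wal-file-path=%p --wal-file-name=%f'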
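To make the role of %f and %p concrete: before running restore_command, PostgreSQL replaces %f with the name of the WAL file it wants and %p with the path the file should be copied to. The sketch below only imitates that substitution; expand_restore_command is a hypothetical helper, and the target path pg_wal/RECOVERYXLOG is an illustrative value rather than something taken from this test.

```python
import re


def expand_restore_command(template: str, wal_file_name: str,
                           wal_file_path: str) -> str:
    """Substitute %f, %p and %% in a restore_command template."""
    substitutions = {"%f": wal_file_name, "%p": wal_file_path, "%%": "%"}
    return re.sub(r"%[fp%]", lambda m: substitutions[m.group(0)], template)


# The restore_command from the standby's configuration file.
template = ('"/usr/local/pg14.2/bin/pg_probackup" archive-get '
            '-B "/pg_data/backup/db1" --instance "vicdb_5432" '
            '--wal-file-path=%p --wal-file-name=%f')

# Illustrative values: the segment the standby was missing and an assumed target path.
command = expand_restore_command(template,
                                 "0000000900000003000000D4",
                                 "pg_wal/RECOVERYXLOG")
print(command)
```

If the command exits with a non-zero status, as archive-get did for B2 in the first test, the standby treats the segment as absent from the archive and moves on to its other WAL sources.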
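This raises a new question: exactly which archived segments need to be copied? All of them? The primary keeps producing new archives while the copy is in progress.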
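Further testing shows that not every archived segment has to be copied. When the standby needs a segment it does not have, it first tries restore_command against its own archive path (as the archive-get attempt for B2 in the first test shows). If that fails, it streams the segment from the primary, which works only while the segment is still in the primary's pg_wal. So we only need to copy the segments that are no longer in the primary's pg_wal into the standby's archive path; the standby fetches the rest from the primary's pg_wal through streaming replication.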
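The set of segments that must be copied can be worked out from two names: the newest segment the standby already has and the oldest segment still in the primary's pg_wal. The following is a minimal sketch, assuming 16 MB segments and a single timeline; segments_to_copy and its helpers are hypothetical names, not pg_probackup functions.

```python
# Segments per value of the middle 8 hex digits, assuming 16 MB segments.
SEGMENTS_PER_XLOG_ID = 0x100000000 // (16 * 1024 * 1024)


def segment_number(name: str) -> int:
    """Turn a 24-character WAL file name into a running segment number."""
    return int(name[8:16], 16) * SEGMENTS_PER_XLOG_ID + int(name[16:24], 16)


def segment_name(timeline: int, number: int) -> str:
    """Inverse of segment_number for a given timeline."""
    return "%08X%08X%08X" % (timeline,
                             number // SEGMENTS_PER_XLOG_ID,
                             number % SEGMENTS_PER_XLOG_ID)


def segments_to_copy(last_on_standby: str, oldest_in_primary_pg_wal: str):
    """List the segments the standby cannot obtain by streaming."""
    timeline = int(last_on_standby[:8], 16)  # assumes no timeline switch
    start = segment_number(last_on_standby) + 1
    stop = segment_number(oldest_in_primary_pg_wal)
    return [segment_name(timeline, n) for n in range(start, stop)]


# Values from the second test: the standby ends at D3, primary pg_wal starts at DD.
for name in segments_to_copy("0000000900000003000000D3",
                             "0000000900000003000000DD"):
    print(name)
```

For this test the list runs from D4 through DC, nine segments in total. Everything from DD onward can be left to streaming, provided the primary does not remove those segments before the standby connects.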