PG Daemon (Postmaster): First-Wave Startup of Second-Class Background Processes with maybe_start_bgworkers

覃榜言 2022-05-01

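PostgreSQL's "second-class citizen" child processes are regular backend processes, walsender processes, autovacuum worker processes, and background worker processes. The postmaster allocates one Backend struct for each of them and links these structs into the doubly linked list BackendList. The call stacks through which each kind of child process joins that list are listed here.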
Regular backend process
BackendStartup --> malloc the Backend memory, then call AssignPostmasterChildSlot to initialize MyPMChildSlot and the child_slot member
--> dlist_push_head(&BackendList, &bn->elem)

walsender process
A walsender process also goes through BackendStartup; its bkend_type is set afterwards by SignalSomeChildren and CountChildren.

Autovacuum process
StartAutovacuumWorker --> dlist_push_head(&BackendList, &bn->elem)
The postmaster starts an autovacuum worker process after it detects the PMSIGNAL_START_AUTOVAC_WORKER signal sent by the autovacuum launcher.

Background worker process
maybe_start_bgworkers --> do_start_bgworker --> assign_backendlist_entry --> bn = malloc(sizeof(Backend)) --> rw->rw_backend = bn
--> dlist_push_head(&BackendList, &bn->elem)
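
All four call stacks end in the same step: a freshly allocated Backend is pushed onto the head of an intrusive doubly linked list. The following is a minimal, self-contained sketch of that pattern, not PostgreSQL's code. ListNode, ListHead, register_backend, backend_from_node, and backend_count are hypothetical names invented for this illustration (the real list primitives live in lib/ilist.h, and the real Backend struct has more members).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an intrusive, circular doubly linked list. */
typedef struct ListNode {
    struct ListNode *prev;
    struct ListNode *next;
} ListNode;

typedef struct ListHead {
    ListNode head;              /* sentinel: an empty list points at itself */
} ListHead;

/* Simplified Backend: only the members needed for this sketch. */
typedef struct Backend {
    int      pid;
    int      child_slot;
    ListNode elem;              /* embedded list link */
} Backend;

static void list_init(ListHead *h)
{
    h->head.prev = &h->head;
    h->head.next = &h->head;
}

/* Link node n directly after the sentinel, i.e. at the head of the list. */
static void list_push_head(ListHead *h, ListNode *n)
{
    n->next = h->head.next;
    n->prev = &h->head;
    n->next->prev = n;
    h->head.next = n;
}

/* Recover the enclosing Backend from its embedded list node. */
static Backend *backend_from_node(ListNode *n)
{
    assert(n != NULL);
    return (Backend *) ((char *) n - offsetof(Backend, elem));
}

/* Allocate a Backend and push it onto the list; returns NULL if malloc fails. */
static Backend *register_backend(ListHead *list, int pid, int child_slot)
{
    Backend *bn = malloc(sizeof(Backend));

    if (bn == NULL)
        return NULL;
    bn->pid = pid;
    bn->child_slot = child_slot;
    list_push_head(list, &bn->elem);
    return bn;
}

/* Count entries by walking from the sentinel back to the sentinel. */
static int backend_count(ListHead *list)
{
    int n = 0;

    for (ListNode *cur = list->head.next; cur != &list->head; cur = cur->next)
        n++;
    return n;
}
```

Because each push goes to the head, the most recently started child is always the first entry found when the list is walked.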

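Keeping second-class background processes alive

maybe_start_bgworkers is the function that both starts background worker processes and keeps them alive; each of its call sites is a keep-alive point:

  • after StartupDataBase
  • in ServerLoop, if StartWorkerNeeded or HaveCrashedWorker is set
  • in reaper, after it detects that the startup process has exited normally
  • in the sigusr1_handler signal handler, if StartWorkerNeeded or HaveCrashedWorker is set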
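
The two flag-guarded call sites share one pattern: call the start routine only when one of the flags is set, and let the routine clear both flags on entry. Here is a small sketch of that guard under stated assumptions. WorkerFlags, start_workers_stub, check_workers, and the start_calls counter are hypothetical and exist only for this example; the postmaster itself uses file-level variables rather than a struct.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the two control flags named in the text. */
typedef struct WorkerFlags {
    bool start_worker_needed;   /* plays the role of StartWorkerNeeded */
    bool have_crashed_worker;   /* plays the role of HaveCrashedWorker */
    int  start_calls;           /* illustration only: counts stub invocations */
} WorkerFlags;

/* Stand-in for maybe_start_bgworkers: it clears both flags before doing
 * anything else, so a later call happens only if a new reason appears. */
static void start_workers_stub(WorkerFlags *f)
{
    f->start_worker_needed = false;
    f->have_crashed_worker = false;
    f->start_calls++;
}

/* The guard applied at the flag-checked call sites. */
static void check_workers(WorkerFlags *f)
{
    assert(f != NULL);
    if (f->start_worker_needed || f->have_crashed_worker)
        start_workers_stub(f);
}
```

Calling check_workers twice in a row with no flag set in between runs the stub only once, which is why the start routine has to set a flag again whenever it leaves work unfinished.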

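The maybe_start_bgworkers function

maybe_start_bgworkers starts background workers when the time is right. As a side effect, the bgworker control variables are set or reset depending on whether more workers may need to be started. The number of workers launched per call is capped, so that the postmaster's attention is not held for too long when many such requests are pending. As long as StartWorkerNeeded is true, ServerLoop does not block and calls the function again after dealing with other issues. The main flow of maybe_start_bgworkers is: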

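  1. Walk the BackgroundWorkerList list and take each RegisteredBgWorker entry.
  2. If rw_pid in the RegisteredBgWorker struct is nonzero, the worker is already running; skip it.
  3. If the RegisteredBgWorker entry is marked for death (rw_terminate), call ForgetBackgroundWorker to clean it up and remove it from the list.
  4. If the worker crashed previously (rw_crashed_at is nonzero), it may need to be restarted. If it was registered with BGW_NEVER_RESTART, it is forgotten instead, and if it named a process to notify (bgw_notify_pid), that process is sent SIGUSR1 with kill. Otherwise, check how long ago the last crash happened; if the crash is too recent, do not start the worker right away.
  5. In that too-recent case, set HaveCrashedWorker to remember that a worker still has to be started later. This is how the worker gets restarted once enough time has passed.
  6. Call bgworker_should_start_now to decide whether the worker should start now, and if so call do_start_bgworker. On failure, give up processing workers for now but set StartWorkerNeeded, so that the next ServerLoop iteration comes back and tries again; there is no reason to wait, because other workers may be ready to run. HaveCrashedWorker could be set as well, since this worker is now marked as crashed, but there is no need because the next run of the function will do that. Once as many workers as allowed have been launched, quit, but have ServerLoop call the function again to look for other ready-to-run workers. There may be none, but the next run will find out.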
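
Steps 2 through 5 amount to a per-worker decision that can be modelled as a pure function. The sketch below is a simplified model, not the postmaster's code. Worker, Action, classify_worker, and NEVER_RESTART are hypothetical names; timestamps are assumed to be plain microsecond counts, and the restart interval is assumed to be in seconds, matching the `bgw_restart_time * 1000` millisecond conversion in the function source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define NEVER_RESTART (-1)      /* hypothetical stand-in for BGW_NEVER_RESTART */

typedef int64_t Micros;         /* assumed: timestamps in microseconds */

/* Simplified view of a registered worker (hypothetical struct). */
typedef struct Worker {
    int    pid;                 /* nonzero while running */
    bool   terminate;           /* marked for death */
    Micros crashed_at;          /* 0 if the worker never crashed */
    int    restart_time;        /* seconds, or NEVER_RESTART */
} Worker;

typedef enum Action {
    ACT_SKIP_RUNNING,           /* step 2: already running */
    ACT_FORGET,                 /* step 3: drop from the list */
    ACT_FORGET_AND_REPORT,      /* step 4: drop, then signal the notify pid if any */
    ACT_WAIT,                   /* step 5: crash too recent, retry later */
    ACT_TRY_START               /* eligible for the start-now check in step 6 */
} Action;

/* Decide what to do with one worker at time `now`. */
static Action classify_worker(const Worker *w, Micros now)
{
    assert(w != NULL);
    if (w->pid != 0)
        return ACT_SKIP_RUNNING;
    if (w->terminate)
        return ACT_FORGET;
    if (w->crashed_at != 0) {
        if (w->restart_time == NEVER_RESTART)
            return ACT_FORGET_AND_REPORT;
        /* Wait until at least restart_time seconds have elapsed. */
        if (now - w->crashed_at < (Micros) w->restart_time * 1000000)
            return ACT_WAIT;
    }
    return ACT_TRY_START;
}
```

For example, a worker that crashed at t = 1 s with a 5 s restart interval is classified ACT_WAIT at t = 3 s and ACT_TRY_START at t = 6 s.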
static void maybe_start_bgworkers(void) {
#define MAX_BGWORKERS_TO_LAUNCH 100
	int			num_launched = 0;
	TimestampTz now = 0;
	slist_mutable_iter iter;

	/* During crash recovery, we have no need to be called until the state transition out of recovery. */
	if (FatalError) {
		StartWorkerNeeded = false;
		HaveCrashedWorker = false;
		return;
	}
	/* Don't need to be called again unless we find a reason for it below */
	StartWorkerNeeded = false;
	HaveCrashedWorker = false;

	slist_foreach_modify(iter, &BackgroundWorkerList) { /* walk the BackgroundWorkerList list */
		RegisteredBgWorker *rw;
		rw = slist_container(RegisteredBgWorker, rw_lnode, iter.cur);		
		if (rw->rw_pid != 0) continue; /* ignore if already running */		
		if (rw->rw_terminate) { /* if marked for death, clean up and remove from list */
			ForgetBackgroundWorker(&iter);
			continue;
		}

		/* If this worker has crashed previously, maybe it needs to be restarted (unless on registration it specified it doesn't want to be restarted at all).  Check how long ago did a crash last happen. If the last crash is too recent, don't start it right away; let it be restarted once enough time has passed. */
		if (rw->rw_crashed_at != 0) {
			if (rw->rw_worker.bgw_restart_time == BGW_NEVER_RESTART) {
				int			notify_pid;
				notify_pid = rw->rw_worker.bgw_notify_pid;
				ForgetBackgroundWorker(&iter);				
				if (notify_pid != 0) kill(notify_pid, SIGUSR1); /* Report worker is gone now: signal the process named in bgw_notify_pid, if any */
				continue;
			}
			
			if (now == 0) now = GetCurrentTimestamp(); /* read system time only when needed */

			if (!TimestampDifferenceExceeds(rw->rw_crashed_at, now, rw->rw_worker.bgw_restart_time * 1000)) {			
				HaveCrashedWorker = true; /* Set flag to remember that we have workers to start later */
				continue;
			}
		}

		if (bgworker_should_start_now(rw->rw_worker.bgw_start_time)) {			
			rw->rw_crashed_at = 0; /* reset crash time before trying to start worker */
			/* Try to start the worker.
			 * On failure, give up processing workers for now, but set StartWorkerNeeded so we'll come back here on the next iteration of ServerLoop to try again.  (We don't want to wait, because there might be additional ready-to-run workers.)  We could set HaveCrashedWorker as well, since this worker is now marked crashed, but there's no need because the next run of this function will do that. */
			if (!do_start_bgworker(rw))
			{
				StartWorkerNeeded = true;
				return;
			}

			/* If we've launched as many workers as allowed, quit, but have ServerLoop call us again to look for additional ready-to-run workers.  There might not be any, but we'll find out the next time we run. */
			if (++num_launched >= MAX_BGWORKERS_TO_LAUNCH) {
				StartWorkerNeeded = true;
				return;
			}
		}
	}
}
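
The launch cap at the end of the loop has a consequence worth spelling out: when the cap is hit, the function cannot know whether more workers are waiting, so it always asks to be called again. The sketch below models only that behaviour. LaunchState, launch_batch, and drain are hypothetical, the pending counter replaces the real worker list, and start failures are not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LAUNCH_PER_CALL 100     /* mirrors MAX_BGWORKERS_TO_LAUNCH */

/* Hypothetical state: a counter stands in for the list of ready workers. */
typedef struct LaunchState {
    int  pending;                   /* workers still waiting to be started */
    bool start_needed;              /* plays the role of StartWorkerNeeded */
} LaunchState;

/* One call's worth of launching; returns how many workers were started. */
static int launch_batch(LaunchState *st)
{
    int launched = 0;

    assert(st != NULL && st->pending >= 0);
    st->start_needed = false;       /* reset unless a reason is found below */
    while (st->pending > 0) {
        st->pending--;              /* "start" one worker */
        if (++launched >= MAX_LAUNCH_PER_CALL) {
            st->start_needed = true;    /* cap reached: request another call */
            return launched;
        }
    }
    return launched;
}

/* Call launch_batch again for as long as it asks for it, the way the text
 * describes ServerLoop doing; returns the number of calls made. */
static int drain(LaunchState *st)
{
    int calls = 0;

    do {
        launch_batch(st);
        calls++;
    } while (st->start_needed);
    return calls;
}
```

With 250 pending workers, drain makes three calls (100, 100, 50). With exactly 200 it also makes three, because the second call hits the cap and the third exists only to discover that nothing is left.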