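Both applications and the kernel run in a virtual address space. Once the kernel has finished booting, every access to physical memory through a virtual address has to be translated by the CPU's MMU hardware. The overall path from a virtual address to physical memory is: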

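After the kernel has booted, an application or the kernel accesses memory.
The access triggers the CPU's MMU, which translates the VA (virtual address) into a PA (physical address).
If the translation succeeds and the physical page being accessed is present, the PA is used to access physical memory.
If the translation fails or the physical page is not present, the CPU raises an exception: #PF, vector 14.
The CPU looks up the #PF handler that the kernel installed during boot, asm_exc_page_fault(), and jumps to it; the hardware automatically pushes the #PF error code onto the handler's stack.
Starting from the entry point asm_exc_page_fault(), the kernel takes the error code that was passed in and the faulting address read from the CR2 register, and enters exc_page_fault().
Processing then continues: a #PF on a kernel-space address goes to do_kern_addr_fault(), and a #PF on a user-space address goes to do_user_addr_fault().
asm_exc_page_fault() is the entry point of the page fault handler and is written mostly in assembly; when it is done it calls exc_page_fault(), which is the C entry point.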

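MMU hardware address translation

Once the kernel has booted, any memory access by an application or the kernel through a virtual address goes through the MMU, which translates the virtual address into a physical address. The main steps are listed below (the stage numbers are those of the translation flow chart):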

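To translate a virtual address, the MMU first looks in the TLB for a matching mapping. If one exists, this is a TLB hit and translation moves to stage 3; if not, this is a TLB miss and translation moves to stage 2.
In stage 3, after the TLB hit, the hardware checks permissions. If the check passes, translation succeeds and the physical address is obtained.
In stage 2, after the TLB miss, the hardware walks the page table in memory level by level (if a cache is enabled for a given level, that cache is consulted first; see 《linux那些事之 page translation(硬件篇）》). If the walk succeeds, translation moves on to the permission check.
If the page table walk fails, #PF is raised.
Likewise, even when the address translation succeeds, a failed permission check (stage 7) also raises #PF.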

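Hardware aspects of page faults

From 《linux那些事之中断与异常（AMD64架构）_1》 we know that the page fault is vector 14, abbreviated #PF. According to the AMD64 architecture manual, #PF can be raised in the following cases: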

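After MMU translation with a TLB hit, the physical page behind the resulting address is not present.
After a TLB miss, an entry is found to be not present during the page table walk.
An instruction fetch targets a page that does not have execute permission.
The access fails the paging-protection checks.
With CR4.PSE=1 or CR4.PAE=1, a reserved bit in a page table entry is set to 1; this is detected during translation and raises #PF.
A data access to a user-mode address fails the protection key check.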

CR2

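When #PF occurs, the hardware automatically stores the faulting virtual address in the CR2 register. On a 32-bit CPU, CR2 holds a 32-bit address; on a 64-bit CPU it holds a 64-bit virtual address.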

Page_fault Error Code Returned

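The page fault error code describes what exactly caused the #PF. There is no dedicated register for it: when #PF occurs, the hardware pushes it onto the handler's stack, and the handler reads it from there.

The individual bits have the following meaning: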

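P (Present): bit 0. When 0, the fault was caused by a page that is not present. When 1, the fault was caused by a page-protection violation.
R/W (Read/Write): bit 1. When 0, the fault was caused by a read. When 1, it was caused by a write.
U/S (User/Supervisor): bit 2. When 0, the access was made in supervisor mode (CPL=0, 1, or 2). When 1, the access was made in user mode (CPL=3).
RSV (Reserved): bit 3. When 1, a reserved bit was found set to 1 in a page table entry during translation. When 0, no reserved bit was set.
I/D (Instruction/Data): bit 4. When 1, the fault happened on an instruction fetch. When 0, it happened on a data access.
PK (Protection Key): bit 5. When 1, the fault was caused by a protection key violation on a user address (the MPK feature is covered in 《linux内核那些事之Memory protection keys(硬件原理）》).
SS (Shadow Stack): bit 6. When 1, the fault was caused by a shadow stack access. This bit is only meaningful when CR4.CET=1.
RMP: bit 31. When 1, the #PF was caused by an RMP (Reverse Map Table) violation.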

asm_exc_page_fault

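#PF handler initialization

《linux那些事之中断与异常（AMD64架构）_2》 describes how the interrupt handlers are set up during kernel initialization. The handler that ends up installed for #PF is: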

static const __initconst struct idt_data early_pf_idts[] = {
	INTG(X86_TRAP_PF,		asm_exc_page_fault),
};

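That is, the handler is asm_exc_page_fault, which is the entry point for #PF.

Definition of asm_exc_page_fault

The definition of asm_exc_page_fault() is somewhat involved because it is implemented in a mix of assembly and C. It is declared with the macro DECLARE_IDTENTRY_RAW_ERRORCODE (arch/x86/include/asm/idtentry.h):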

DECLARE_IDTENTRY_RAW_ERRORCODE(X86_TRAP_PF,	exc_page_fault);

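X86_TRAP_PF is the #PF vector number. Because idtentry.h (arch/x86/include/asm/idtentry.h) is included from both assembly and C sources, DECLARE_IDTENTRY_RAW_ERRORCODE is defined twice in that file: once in the part seen by assembly files and once in the part seen by C files: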

#ifndef __ASSEMBLY__  /* C part, seen when included from C files */
... ...

#define DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func)			\
	DECLARE_IDTENTRY_ERRORCODE(vector, func)
... ...

#else /* !__ASSEMBLY__ */
... ...    /* assembly part, seen when included from assembly files */

#define DECLARE_IDTENTRY_RAW_ERRORCODE(vector, func)			\
	DECLARE_IDTENTRY_ERRORCODE(vector, func)

... ...

#endif

vector is the interrupt vector number and func is the handler name. Both variants end up invoking DECLARE_IDTENTRY_ERRORCODE:


#ifndef __ASSEMBLY__ /* C part, seen when included from C files */

... ...

/**
 * DECLARE_IDTENTRY_ERRORCODE - Declare functions for simple IDT entry points
 *				Error code pushed by hardware
 * @vector:	Vector number (ignored for C)
 * @func:	Function name of the entry point
 *
 * Declares three functions:
 * - The ASM entry point: asm_##func
 * - The XEN PV trap entry point: xen_##func (maybe unused)
 * - The C handler called from the ASM entry point
 *
 * Same as DECLARE_IDTENTRY, but has an extra error_code argument for the
 * C-handler.
 */
#define DECLARE_IDTENTRY_ERRORCODE(vector, func)			\
	asmlinkage void asm_##func(void);				\
	asmlinkage void xen_asm_##func(void);				\
	__visible void func(struct pt_regs *regs, unsigned long error_code)

... ...

#else  /* assembly part, seen when included from assembly files */

... ... 

#define DECLARE_IDTENTRY_ERRORCODE(vector, func)			\
	idtentry vector asm_##func func has_error_code=1

... ...

#endif

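In the assembly branch, DECLARE_IDTENTRY_ERRORCODE emits the asm_##func entry stub. With vector set to X86_TRAP_PF and func set to exc_page_fault, it expands to:

idtentry X86_TRAP_PF asm_exc_page_fault exc_page_fault has_error_code=1

This generates the body of asm_exc_page_fault by invoking the idtentry macro, which is defined in assembly. has_error_code=1 is a flag telling the macro that the hardware pushes an error code for this vector; it is not the error code itself. The C branch expands to: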

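asmlinkage void asm_exc_page_fault(void);
asmlinkage void xen_asm_exc_page_fault(void);
__visible void exc_page_fault(struct pt_regs *regs, unsigned long error_code);

This mainly declares asm_exc_page_fault together with the C handler exc_page_fault that it eventually calls (asm_exc_page_fault ---> exc_page_fault). asmlinkage marks a function as part of the assembly/C boundary; on 32-bit x86 it forces arguments to be passed on the stack, while on x86-64 it does not change the calling convention.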

The idtentry macro

idtentry is defined with the assembler's .macro directive in arch/x86/entry/entry_64.S. The general form of .macro is:

.macro macname macpara
...
...
.endm

macname is the macro name and macpara is a macro parameter; several parameters may be given. idtentry is defined as follows:

/**
 * idtentry - Macro to generate entry stubs for simple IDT entries
 * @vector:		Vector number
 * @asmsym:		ASM symbol for the entry point
 * @cfunc:		C function to be called
 * @has_error_code:	Hardware pushed error code on stack
 *
 * The macro emits code to set up the kernel context for straight forward
 * and simple IDT entries. No IST stack, no paranoid entry checks.
 */
.macro idtentry vector asmsym cfunc has_error_code:req
SYM_CODE_START(\asmsym)
	UNWIND_HINT_IRET_REGS offset=\has_error_code*8
	ASM_CLAC

	.if \has_error_code == 0
		pushq	$-1			/* ORIG_RAX: no syscall to restart */
	.endif

	.if \vector == X86_TRAP_BP
		/*
		 * If coming from kernel space, create a 6-word gap to allow the
		 * int3 handler to emulate a call instruction.
		 */
		testb	$3, CS-ORIG_RAX(%rsp)
		jnz	.Lfrom_usermode_no_gap_\@
		.rept	6
		pushq	5*8(%rsp)
		.endr
		UNWIND_HINT_IRET_REGS offset=8
.Lfrom_usermode_no_gap_\@:
	.endif

	idtentry_body \cfunc \has_error_code

_ASM_NOKPROBE(\asmsym)
SYM_CODE_END(\asmsym)
.endm

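vector is the interrupt vector number, asmsym is the assembly entry symbol for that vector, cfunc is the C part of the handler, and has_error_code (declared :req, i.e. a required argument) says whether the hardware pushes an error code for this vector; here it is 1. For #PF, asmsym is asm_exc_page_fault and cfunc is exc_page_fault, so everything between

SYM_CODE_START(\asmsym)

...

SYM_CODE_END(\asmsym)

is the implementation of asm_exc_page_fault. For a page fault, has_error_code is 1 and the vector is #PF rather than X86_TRAP_BP, so both .if blocks are skipped and execution falls straight through to the last step of the function: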

idtentry_body \cfunc \has_error_code

This invokes the idtentry_body macro.

The idtentry_body macro

idtentry_body is defined as follows:

/**
 * idtentry_body - Macro to emit code calling the C function
 * @cfunc:		C function to be called
 * @has_error_code:	Hardware pushed error code on stack
 */
.macro idtentry_body cfunc has_error_code:req

	call	error_entry
	UNWIND_HINT_REGS

	movq	%rsp, %rdi			/* pt_regs pointer into 1st argument*/

	.if \has_error_code == 1
		movq	ORIG_RAX(%rsp), %rsi	/* get error code into 2nd argument*/
		movq	$-1, ORIG_RAX(%rsp)	/* no syscall to restart */
	.endif

	call	\cfunc

	jmp	error_return
.endm

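Since has_error_code is 1, the error code pushed by the hardware sits in the ORIG_RAX slot of the pt_regs frame on the stack. After error_entry has saved the registers, movq %rsp, %rdi passes the pt_regs pointer as the first argument, movq ORIG_RAX(%rsp), %rsi loads the page fault error code as the second argument, and the slot is then overwritten with -1. Finally, call \cfunc invokes the C function, which for a page fault is exc_page_fault, and execution moves into C.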

exc_page_fault

exc_page_fault is defined in arch/x86/mm/fault.c:


DEFINE_IDTENTRY_RAW_ERRORCODE(exc_page_fault)
{
	unsigned long address = read_cr2();
	bool rcu_exit;

	prefetchw(&current->mm->mmap_lock);

	/*
	 * KVM has two types of events that are, logically, interrupts, but
	 * are unfortunately delivered using the #PF vector.  These events are
	 * "you just accessed valid memory, but the host doesn't have it right
	 * now, so I'll put you to sleep if you continue" and "that memory
	 * you tried to access earlier is available now."
	 *
	 * We are relying on the interrupted context being sane (valid RSP,
	 * relevant locks not held, etc.), which is fine as long as the
	 * interrupted context had IF=1.  We are also relying on the KVM
	 * async pf type field and CR2 being read consistently instead of
	 * getting values from real and async page faults mixed up.
	 *
	 * Fingers crossed.
	 *
	 * The async #PF handling code takes care of idtentry handling
	 * itself.
	 */
	if (kvm_handle_async_pf(regs, (u32)address))
		return;

	/*
	 * Entry handling for valid #PF from kernel mode is slightly
	 * different: RCU is already watching and rcu_irq_enter() must not
	 * be invoked because a kernel fault on a user space address might
	 * sleep.
	 *
	 * In case the fault hit a RCU idle region the conditional entry
	 * code reenabled RCU to avoid subsequent wreckage which helps
	 * debugability.
	 */
	rcu_exit = idtentry_enter_cond_rcu(regs);

	instrumentation_begin();
	handle_page_fault(regs, error_code, address);
	instrumentation_end();

	idtentry_exit_cond_rcu(regs, rcu_exit);
}

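address = read_cr2(): reads the faulting virtual address from CR2.
prefetchw(&current->mm->mmap_lock): prefetches the cache line holding the current process's mmap_lock for writing, since the lock is likely to be taken shortly. It does not acquire the lock.
kvm_handle_async_pf: handles KVM asynchronous page fault events that are delivered through the #PF vector.
idtentry_enter_cond_rcu: conditional RCU handling on entry, as described in the comment above it.
instrumentation_begin: used inside functions marked noinstr. Such functions must not be instrumented (tracing, kprobes and similar), and instrumentation_begin marks the start of a region in which calls to instrumentable code are allowed again (see also https://lwn.net/Articles/877229/).
handle_page_fault: continues handling the page fault; it receives the error code and the faulting virtual address.
instrumentation_end: marks the end of that region.
idtentry_exit_cond_rcu: exit handling, the counterpart of idtentry_enter_cond_rcu.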

DEFINE_IDTENTRY_RAW_ERRORCODE

DEFINE_IDTENTRY_RAW_ERRORCODE is defined as follows:


/**
 * DEFINE_IDTENTRY_RAW_ERRORCODE - Emit code for raw IDT entry points
 * @func:	Function name of the entry point
 *
 * @func is called from ASM entry code with interrupts disabled.
 *
 * The macro is written so it acts as function definition. Append the
 * body with a pair of curly brackets.
 *
 * Contrary to DEFINE_IDTENTRY_ERRORCODE() this does not invoke the
 * idtentry_enter/exit() helpers before and after the body invocation. This
 * needs to be done in the body itself if applicable. Use if extra work
 * is required before the enter/exit() helpers are invoked.
 */
#define DEFINE_IDTENTRY_RAW_ERRORCODE(func)				\
__visible noinstr void func(struct pt_regs *regs, unsigned long error_code)

The definition of exc_page_fault therefore expands to:

__visible noinstr void exc_page_fault(struct pt_regs *regs, unsigned long error_code)

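The noinstr attribute on the handler keeps the function out of reach of instrumentation such as tracing and kprobes, because the entry code runs before the kernel context has been fully established. Code that may be instrumented has to sit between instrumentation_begin() and instrumentation_end().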

handle_page_fault

handle_page_fault does the following:


static __always_inline void
handle_page_fault(struct pt_regs *regs, unsigned long error_code,
			      unsigned long address)
{
	trace_page_fault_entries(regs, error_code, address);

	if (unlikely(kmmio_fault(regs, address)))
		return;

	/* Was the fault on kernel-controlled part of the address space? */
	if (unlikely(fault_in_kernel_space(address))) {
		do_kern_addr_fault(regs, error_code, address);
	} else {
		do_user_addr_fault(regs, error_code, address);
		/*
		 * User address page fault handling might have reenabled
		 * interrupts. Fixing up all potential exit points of
		 * do_user_addr_fault() and its leaf functions is just not
		 * doable w/o creating an unholy mess or turning the code
		 * upside down.
		 */
		local_irq_disable();
	}
}