Linux internals: page fault (do_anonymous_page) (4)

程序猿不脱发2 · 2022-02-04

Anonymous page faults are among the most common page faults in the kernel. Accesses to memory obtained from malloc(), or to anonymous mappings created with mmap(), typically trigger this kind of fault when the pages are first touched. Anonymous page faults are handled by the kernel's default path, and the handler is the do_anonymous_page function.
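
As a minimal user-space sketch of the scenario just described (assuming an ordinary Linux system; the mapping size is arbitrary), the first write to each page of the anonymous mapping below raises exactly this kind of fault, which the kernel resolves in do_anonymous_page:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	size_t len = 4 * 1024 * 1024;

	/* Anonymous private mapping: no physical pages are allocated yet. */
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* The first write to each page faults into do_anonymous_page,
	 * which allocates a zeroed page and installs the PTE. */
	memset(buf, 0x5a, len);

	munmap(buf, len);
	return 0;
}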

do_anonymous_page

The prototype of do_anonymous_page is:

vm_fault_t do_anonymous_page(struct vm_fault *vmf)

Parameters:

  • struct vm_fault *vmf: the page-fault parameters assembled by handle_mm_fault, together with state used by the later processing steps (an abbreviated field list follows below); see 《Linux internals: page fault (AMD64 architecture) (handle_mm_fault) (3)》 for the full structure.

Return value:

  • vm_fault_t: the result code of handling the page fault; see 《Linux internals: page fault (AMD64 architecture) (user space) (2)》 for the possible error codes.
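
For orientation, a heavily abbreviated sketch of the struct vm_fault members that do_anonymous_page actually touches (field list taken from a v5.x include/linux/mm.h and trimmed; check your own kernel version):

/* Abbreviated: only the members referenced by do_anonymous_page. */
struct vm_fault {
	struct vm_area_struct *vma;	/* target vma */
	unsigned int flags;		/* FAULT_FLAG_xxx, e.g. FAULT_FLAG_WRITE */
	unsigned long address;		/* faulting virtual address */
	pmd_t *pmd;			/* PMD entry covering the address */
	pte_t *pte;			/* PTE mapped by pte_offset_map_lock() */
	spinlock_t *ptl;		/* page-table lock protecting that PTE */
	/* ... further members omitted ... */
};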

do_anonymous_page processing flow

The logic of do_anonymous_page is fairly simple: it distinguishes the read-fault case, where the zero page can be used, from the write-fault case (or cases where the zero page is not allowed), in which a new page must be allocated. The flow is roughly as follows:
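
Since the original flowchart is not reproduced here, a rough outline condensed from the source that follows:

/*
 * do_anonymous_page(), simplified flow:
 *
 *   1. vma is VM_SHARED (file mapping without vm_ops)  -> VM_FAULT_SIGBUS
 *   2. pte_alloc() fails                                -> VM_FAULT_OOM
 *   3. read fault and zero page allowed                 -> map the shared zero page (pte_mkspecial)
 *   4. otherwise (write fault or no zero page)          -> anon_vma_prepare()
 *                                                          alloc_zeroed_user_highpage_movable()
 *                                                          mem_cgroup_charge(), rmap, LRU
 *   5. set_pte_at() + update_mmu_cache()                -> PTE installed, fault resolved
 */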

do_anonymous_page source code analysis

Walking through the do_anonymous_page source together with the flow above:


/*
 * We enter with non-exclusive mmap_lock (to exclude vma changes,
 * but allow concurrent faults), and pte mapped but not yet locked.
 * We return with mmap_lock still held, but pte unmapped and unlocked.
 */
static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
	struct vm_area_struct *vma = vmf->vma;
	struct page *page;
	vm_fault_t ret = 0;
	pte_t entry;

	/* File mapping without ->vm_ops ? */
	if (vma->vm_flags & VM_SHARED)
		return VM_FAULT_SIGBUS;

	/*
	 * Use pte_alloc() instead of pte_alloc_map().  We can't run
	 * pte_offset_map() on pmds where a huge pmd might be created
	 * from a different thread.
	 *
	 * pte_alloc_map() is safe to use under mmap_write_lock(mm) or when
	 * parallel threads are excluded by other means.
	 *
	 * Here we only have mmap_read_lock(mm).
	 */
	if (pte_alloc(vma->vm_mm, vmf->pmd))
		return VM_FAULT_OOM;

	/* See the comment in pte_alloc_one_map() */
	if (unlikely(pmd_trans_unstable(vmf->pmd)))
		return 0;

	/* Use the zero-page for reads */
	if (!(vmf->flags & FAULT_FLAG_WRITE) &&
			!mm_forbids_zeropage(vma->vm_mm)) {
		entry = pte_mkspecial(pfn_pte(my_zero_pfn(vmf->address),
						vma->vm_page_prot));
		vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
				vmf->address, &vmf->ptl);
		if (!pte_none(*vmf->pte)) {
			update_mmu_tlb(vma, vmf->address, vmf->pte);
			goto unlock;
		}
		ret = check_stable_address_space(vma->vm_mm);
		if (ret)
			goto unlock;
		/* Deliver the page fault to userland, check inside PT lock */
		if (userfaultfd_missing(vma)) {
			pte_unmap_unlock(vmf->pte, vmf->ptl);
			return handle_userfault(vmf, VM_UFFD_MISSING);
		}
		goto setpte;
	}

	/* Allocate our own private page. */
	if (unlikely(anon_vma_prepare(vma)))
		goto oom;
	page = alloc_zeroed_user_highpage_movable(vma, vmf->address);
	if (!page)
		goto oom;

	if (mem_cgroup_charge(page, vma->vm_mm, GFP_KERNEL))
		goto oom_free_page;
	cgroup_throttle_swaprate(page, GFP_KERNEL);

	/*
	 * The memory barrier inside __SetPageUptodate makes sure that
	 * preceding stores to the page contents become visible before
	 * the set_pte_at() write.
	 */
	__SetPageUptodate(page);

	entry = mk_pte(page, vma->vm_page_prot);
	entry = pte_sw_mkyoung(entry);
	if (vma->vm_flags & VM_WRITE)
		entry = pte_mkwrite(pte_mkdirty(entry));

	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address,
			&vmf->ptl);
	if (!pte_none(*vmf->pte)) {
		update_mmu_cache(vma, vmf->address, vmf->pte);
		goto release;
	}

	ret = check_stable_address_space(vma->vm_mm);
	if (ret)
		goto release;

	/* Deliver the page fault to userland, check inside PT lock */
	if (userfaultfd_missing(vma)) {
		pte_unmap_unlock(vmf->pte, vmf->ptl);
		put_page(page);
		return handle_userfault(vmf, VM_UFFD_MISSING);
	}

	inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES);
	page_add_new_anon_rmap(page, vma, vmf->address, false);
	lru_cache_add_active_or_unevictable(page, vma);
setpte:
	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, entry);

	/* No need to invalidate - it was non-present before */
	update_mmu_cache(vma, vmf->address, vmf->pte);
unlock:
	pte_unmap_unlock(vmf->pte, vmf->ptl);
	return ret;
release:
	put_page(page);
	goto unlock;
oom_free_page:
	put_page(page);
oom:
	return VM_FAULT_OOM;
}

The main points of the analysis:

  • VM_SHARED check: do_anonymous_page handles only private anonymous mappings. A vma without vm_ops that is marked VM_SHARED should already be backed elsewhere (e.g. shared memory set up via mmap) and should never fault into this path, so VM_FAULT_SIGBUS is returned.
  • pte_alloc: allocate a PTE page for vmf->pmd if one does not exist yet (handle_mm_fault has already walked/allocated the upper page-table levels). If the allocation fails, physical memory is exhausted and VM_FAULT_OOM is returned, triggering the OOM path.
  • Transparent Hugepage Support: before walking down to the PTE, pmd_trans_unstable checks that another thread has not turned the PMD into (or is in the middle of turning it into) a huge-page entry; if it is unstable, the function returns 0 and the fault is simply retried.
  • If the fault was triggered by a read and the zero page is not forbidden (mm_forbids_zeropage), the read is handled specially by mapping the shared zero page; otherwise it is handled the same way as a write (a user-space demonstration of the zero-page effect follows this list).
  • For a write fault, anon_vma_prepare is called first to prepare the vma: it creates the anon_vma and anon_vma_chain needed later for reverse mapping.
  • alloc_zeroed_user_highpage_movable allocates a zeroed physical page, preferably from the movable region of the highmem zone; 64-bit systems have no highmem zone, so the page comes from the movable region of ZONE_NORMAL instead.
  • After the allocation succeeds, mem_cgroup_charge charges the new page to the memory cgroup so that memcg can account for and manage it.
  • __SetPageUptodate: sets the PG_uptodate flag, meaning the page contents are now valid.
  • Build the new PTE value as required (mk_pte, pte_sw_mkyoung, and pte_mkdirty/pte_mkwrite for writable vmas), then take the page-table lock with pte_offset_map_lock to prevent concurrent updates before installing the virtual-to-physical mapping.
  • check_stable_address_space: checks that the address space is still usable (not being torn down); if not, the fault is aborted.
  • inc_mm_counter_fast: increments the MM_ANONPAGES counter (anonymous page accounting for the mm).
  • page_add_new_anon_rmap: adds the newly allocated page to the anonymous reverse-mapping structures.
  • lru_cache_add_active_or_unevictable: puts the page on the appropriate LRU list.
  • set_pte_at: installs the new PTE.
  • update_mmu_cache: architecture hook to update the MMU/TLB caches.
  • pte_unmap_unlock: releases the PTE mapping and the page-table lock, and the function returns.
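
The zero-page behaviour described above can be observed from user space. A rough sketch (assuming /proc/self/statm is available; RSS is reported in pages): reading untouched anonymous memory only maps the shared zero page, so RSS barely moves, while writing forces the kernel to allocate one real page per touched page:

#include <stdio.h>
#include <unistd.h>
#include <sys/mman.h>

static long rss_pages(void)
{
	long size = 0, resident = -1;
	FILE *f = fopen("/proc/self/statm", "r");

	if (f) {
		if (fscanf(f, "%ld %ld", &size, &resident) != 2)
			resident = -1;
		fclose(f);
	}
	return resident;
}

int main(void)
{
	size_t len = 64UL * 1024 * 1024;
	long pg = sysconf(_SC_PAGESIZE);
	volatile char sink = 0;
	char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (buf == MAP_FAILED)
		return 1;

	printf("RSS before touch : %ld pages\n", rss_pages());

	/* Read faults: do_anonymous_page maps the shared zero page,
	 * which is not charged to the anonymous page counter. */
	for (size_t i = 0; i < len; i += (size_t)pg)
		sink = buf[i];
	printf("RSS after reads  : %ld pages\n", rss_pages());

	/* Write faults: each page is now replaced by a private writable
	 * page (copy-on-write from the zero page), so RSS grows. */
	for (size_t i = 0; i < len; i += (size_t)pg)
		buf[i] = 1;
	printf("RSS after writes : %ld pages\n", rss_pages());

	(void)sink;
	munmap(buf, len);
	return 0;
}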

alloc_zeroed_user_highpage_movable

alloc_zeroed_user_highpage_movable obtains a free page from the buddy allocator; along the way, the mempolicy subsystem applies the memory allocation policy. The path an anonymous fault takes to obtain a physical page is roughly:

  • After an anonymous page fault, do_anonymous_page calls alloc_zeroed_user_highpage_movable to obtain a physical page.
  • alloc_zeroed_user_highpage_movable by default asks for a page from the movable region of the highmem zone; since 64-bit systems have no highmem zone, the page actually comes from ZONE_NORMAL. It calls alloc_pages_vma and enters the mempolicy subsystem.
  • The mempolicy subsystem decides where and how memory is allocated (on NUMA systems it also picks the NUMA node), and then enters the buddy allocator via get_page_from_freelist.
  • get_page_from_freelist takes a free page from the buddy free lists (the call chain is sketched after this list).
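
A hedged outline of the resulting call chain (function names as found in a v5.x kernel; intermediate helpers omitted):

/*
 * do_anonymous_page()
 *   -> alloc_zeroed_user_highpage_movable()   - GFP_HIGHUSER | __GFP_MOVABLE (+ __GFP_ZERO)
 *     -> alloc_page_vma() / alloc_pages_vma() - mempolicy: pick policy and NUMA node
 *       -> __alloc_pages_nodemask()           - core page allocator entry point
 *         -> get_page_from_freelist()         - fast path: take a page from the buddy free lists
 *         -> __alloc_pages_slowpath()         - otherwise: wake kswapd, reclaim/compact and retry
 */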

The overall physical memory management structure is as follows:

  • Physical memory is divided into ZONE_DMA and ZONE_NORMAL, plus ZONE_HIGHMEM on 32-bit systems (see 《Linux kernel internals: ZONE》).
  • Within each zone, physical memory is managed by the buddy allocator in blocks of different orders (see 《Linux kernel internals: buddy》); a small /proc/buddyinfo reader follows this list.
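
On a running system the per-zone buddy free lists can be inspected directly. A small sketch that simply dumps /proc/buddyinfo, where each line lists, for one zone, the number of free blocks of each order:

#include <stdio.h>

int main(void)
{
	char line[512];
	FILE *f = fopen("/proc/buddyinfo", "r");

	if (!f) {
		perror("/proc/buddyinfo");
		return 1;
	}
	/* Each line looks like "Node 0, zone Normal  c0 c1 ... c10",
	 * where cK is the number of free blocks of order K in that zone. */
	while (fgets(line, sizeof(line), f))
		fputs(line, stdout);
	fclose(f);
	return 0;
}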

alloc_zeroed_user_highpage_movable source

The alloc_zeroed_user_highpage_movable source:

/**
 * alloc_zeroed_user_highpage_movable - Allocate a zeroed HIGHMEM page for a VMA that the caller knows can move
 * @vma: The VMA the page is to be allocated for
 * @vaddr: The virtual address the page will be inserted into
 *
 * This function will allocate a page for a VMA that the caller knows will
 * be able to migrate in the future using move_pages() or reclaimed
 */
static inline struct page *
alloc_zeroed_user_highpage_movable(struct vm_area_struct *vma,
					unsigned long vaddr)
{
	return __alloc_zeroed_user_highpage(__GFP_MOVABLE, vma, vaddr);
}

The GFP flags include __GFP_MOVABLE, so the page is taken from a movable region. If the architecture defines __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE, it supplies its own __alloc_zeroed_user_highpage, which requests an already-zeroed page via __GFP_ZERO:

#define __alloc_zeroed_user_highpage(movableflags, vma, vaddr) \
	alloc_page_vma(GFP_HIGHUSER | __GFP_ZERO | movableflags, vma, vaddr)
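
For reference, GFP_HIGHUSER is itself built from more basic flags; paraphrased from include/linux/gfp.h (the exact composition may differ between kernel versions):

/* Paraphrased; see include/linux/gfp.h of your kernel. */
#define GFP_USER		(__GFP_RECLAIM | __GFP_IO | __GFP_FS | __GFP_HARDWALL)
#define GFP_HIGHUSER		(GFP_USER | __GFP_HIGHMEM)
#define GFP_HIGHUSER_MOVABLE	(GFP_HIGHUSER | __GFP_MOVABLE)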

If the architecture does not provide its own version, the generic implementation below is used; it allocates the page first and then zeroes it with clear_user_highpage:

/**
 * __alloc_zeroed_user_highpage - Allocate a zeroed HIGHMEM page for a VMA with caller-specified movable GFP flags
 * @movableflags: The GFP flags related to the pages future ability to move like __GFP_MOVABLE
 * @vma: The VMA the page is to be allocated for
 * @vaddr: The virtual address the page will be inserted into
 *
 * This function will allocate a page for a VMA but the caller is expected
 * to specify via movableflags whether the page will be movable in the
 * future or not
 *
 * An architecture may override this function by defining
 * __HAVE_ARCH_ALLOC_ZEROED_USER_HIGHPAGE and providing their own
 * implementation.
 */
static inline struct page *
__alloc_zeroed_user_highpage(gfp_t movableflags,
			struct vm_area_struct *vma,
			unsigned long vaddr)
{
	struct page *page = alloc_page_vma(GFP_HIGHUSER | movableflags,
			vma, vaddr);

	if (page)
		clear_user_highpage(page, vaddr);

	return page;
}

The alloc_page_vma macro (include/linux/gfp.h) is defined as follows:

#define alloc_page_vma(gfp_mask, vma, addr)			\
	alloc_pages_vma(gfp_mask, 0, vma, addr, numa_node_id(), false)

alloc_pages_vma enters the mempolicy subsystem (to be covered in a later article), passing in the ID of the NUMA node the task is currently running on (numa_node_id()).
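
As a rough, hedged outline of what the mempolicy layer does with that call (based on a v5.x mm/mempolicy.c; details vary between kernel versions):

/*
 * alloc_pages_vma(gfp, order, vma, addr, node, hugepage)
 *   pol = get_vma_policy(vma, addr);              - per-vma policy, falling back to the task/default policy
 *   if interleave policy:
 *       nid = interleave_nid(...);                - round-robin over the allowed nodes
 *   preferred_nid = policy_node(gfp, pol, node);  - otherwise: preferred node (node == numa_node_id() here)
 *   nmask = policy_nodemask(gfp, pol);            - nodes the allocation may fall back to
 *   page = __alloc_pages_nodemask(gfp, order, preferred_nid, nmask);
 */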
