linux那些事之 Reverse Mapping (RMAP)

In the kernel it is common for one page to be used by several processes. For example, at fork() the child inherits the parent's pages and initially shares the very same pages, so one page corresponds to the VMAs of multiple processes:

In shared-memory scenarios, one physical page is likewise used by several processes at once. When the object shared by multiple processes is a file, there is in fact only one physical page backing it; that page is a page-cache page, and it can be managed through the struct address_space *mapping member of struct page.

An anonymous page, however, has no backing file, yet it is often necessary to find, starting from the page, all the VMAs of the processes currently using it. That lookup is called reverse mapping (RMAP).

Data structures for anonymous-page reverse mapping

Anonymous-page reverse mapping records the 1:N relationship between a page and VMAs. Its structure is as follows:

  • struct anon_vma: the global management structure, roughly the counterpart of the page cache's address_space; it records how many process VMAs a page is mapped into, with anon_vma_chain objects acting as the bridge.
  • The mapping member of struct page is an overloaded field; for an anonymous page it points to the page's struct anon_vma.
  • struct anon_vma_chain (AVC): the bridge between anon_vma and vma. The rb_root in struct anon_vma collects the anon_vma_chain nodes, and each anon_vma_chain belongs to exactly one vma. There is generally one node per process in the rb_root; when a child inherits pages from its parent at fork, the child creates its own copies of the parent's anon_vma_chain objects.
  • The anon_vma member of struct vm_area_struct points to the vma's struct anon_vma; its struct list_head anon_vma_chain heads the list of the vma's anon_vma_chain objects.

The reverse-mapping data structures must satisfy two requirements:

  • Given a struct page, quickly find the VMAs of all processes mapping it, and from each vma reach its page tables; the rb_root interval tree provides this fast reverse lookup.
  • Given one process's vma, quickly find the VMAs of the other processes using the same pages; the struct list_head anon_vma_chain doubly linked list in struct vm_area_struct serves this purpose.
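The 1:N relationship described above can be sketched in userspace C. All toy_* names below are invented for illustration, and the interval tree is replaced by a plain singly linked list; this is only a minimal model of the pointers involved, not kernel code:

```c
#include <assert.h>
#include <stddef.h>

struct toy_vma;

struct toy_avc {                        /* cf. struct anon_vma_chain */
	struct toy_vma *vma;
	struct toy_avc *next;           /* stands in for the rb_root interval tree */
};

struct toy_anon_vma {                   /* cf. struct anon_vma */
	struct toy_avc *chains;
};

struct toy_vma {                        /* rmap-related part of vm_area_struct */
	struct toy_anon_vma *anon_vma;
};

struct toy_page {                       /* cf. struct page */
	struct toy_anon_vma *mapping;   /* for anon pages, mapping -> anon_vma */
};

/* Link one vma into an anon_vma through a chain object. */
static void toy_link(struct toy_vma *vma, struct toy_avc *avc,
		     struct toy_anon_vma *av)
{
	avc->vma = vma;
	avc->next = av->chains;
	av->chains = avc;
	vma->anon_vma = av;
}

/* The reverse lookup: page -> anon_vma -> every chained vma. */
static int toy_rmap_count(const struct toy_page *page)
{
	int n = 0;
	for (const struct toy_avc *avc = page->mapping->chains; avc; avc = avc->next)
		n++;
	return n;
}
```

Starting from the page alone, toy_rmap_count reaches every vma, which is exactly the lookup the rb_root exists to make fast in the real kernel.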


Reverse-mapping-related members of struct page

The members of struct page relevant to reverse mapping:


struct page {
	unsigned long flags;		/* Atomic flags, some possibly updated asynchronously */

	union {
		struct {	/* Page cache and anonymous pages */
			/**
			 * @lru: Pageout list, eg. active_list protected by
			 * pgdat->lru_lock.  Sometimes used as a generic list
			 * by the page owner.
			 */
			struct list_head lru;
			/* See page-flags.h for PAGE_MAPPING_FLAGS */
			struct address_space *mapping;
			pgoff_t index;		/* Our offset within mapping. */
			
            ... ...
		};
		
	};

    ... ...

	union {		/* This union is 4 bytes in size. */
		/*
		 * If the page can be mapped to userspace, encodes the number
		 * of times this page is referenced by a page table.
		 */
		atomic_t _mapcount;
        
        ... ...
	};

	... ...
} _struct_page_alignment;

struct page was designed to occupy as little physical memory as possible, so many of its members are overloaded: a single member may serve several different purposes depending on the scenario. Anonymous-page reverse mapping is one such case:

  • struct address_space *mapping: when the page is a page-cache page, mapping leads to the location of the backing file on disk. For an anonymous mapping, mapping points to the struct anon_vma that records the page↔vma relationships (this is what allows finding all VMAs from one page).
  • pgoff_t index: for a page-cache page, the offset within the file; for an anonymous mapping, the page's offset within its vma.
  • atomic_t _mapcount: records how many processes share this page.
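How can one pointer field hold either an address_space or an anon_vma? The kernel tags anon_vma pointers with the low bit (the real PAGE_MAPPING_ANON flag); since both structures are at least word-aligned, the low bits of a genuine pointer are always zero. A minimal sketch of that tagging scheme (helper names here are invented, only the flag value mirrors the kernel):

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_MAPPING_ANON 0x1UL

/* Store an anon_vma pointer into the overloaded mapping field. */
static inline uintptr_t make_anon_mapping(void *anon_vma)
{
	return (uintptr_t)anon_vma | PAGE_MAPPING_ANON;
}

/* cf. PageAnon(): is the low tag bit set? */
static inline int mapping_is_anon(uintptr_t mapping)
{
	return (mapping & PAGE_MAPPING_ANON) != 0;
}

/* Strip the tag to recover the real anon_vma pointer. */
static inline void *mapping_to_anon_vma(uintptr_t mapping)
{
	return (void *)(mapping & ~PAGE_MAPPING_ANON);
}
```

This is why __page_set_anon_rmap (shown later) adds PAGE_MAPPING_ANON before storing the pointer into page->mapping.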

struct anon_vma

struct anon_vma is the management structure for anonymous-page reverse mapping; it records how many VMAs an anonymous page is mapped into:

struct anon_vma {
	struct anon_vma *root;		/* Root of this anon_vma tree */
	struct rw_semaphore rwsem;	/* W: modification, R: walking the list */
	/*
	 * The refcount is taken on an anon_vma when there is no
	 * guarantee that the vma of page tables will exist for
	 * the duration of the operation. A caller that takes
	 * the reference is responsible for clearing up the
	 * anon_vma if they are the last user on release
	 */
	atomic_t refcount;

	/*
	 * Count of child anon_vmas and VMAs which points to this anon_vma.
	 *
	 * This counter is used for making decision about reusing anon_vma
	 * instead of forking new one. See comments in function anon_vma_clone.
	 */
	unsigned degree;

	struct anon_vma *parent;	/* Parent of this anon_vma */

	/*
	 * NOTE: the LSB of the rb_root.rb_node is set by
	 * mm_take_all_locks() _after_ taking the above lock. So the
	 * rb_root must only be read/written after taking the above lock
	 * to be sure to see a valid next pointer. The LSB bit itself
	 * is serialized by a system wide lock only visible to
	 * mm_take_all_locks() (mm_all_locks_mutex).
	 */

	/* Interval tree of private "related" vmas */
	struct rb_root_cached rb_root;
};
  • struct anon_vma *root: the root anon_vma of this anon_vma tree (the hierarchy built up over fork()); the lock that is actually used lives in the root.
  • struct rw_semaphore rwsem: semaphore protecting this structure (write for modification, read for walking).
  • atomic_t refcount: reference count; taken when there is no guarantee the vma or page tables will survive for the duration of an operation.
  • unsigned degree: count of child anon_vmas and VMAs pointing to this anon_vma; used to decide whether an existing anon_vma can be reused instead of allocating a new one (see anon_vma_clone).
  • struct anon_vma *parent: the parent anon_vma in the fork hierarchy.
  • struct rb_root_cached rb_root: interval tree whose nodes are struct anon_vma_chain (AVC) objects, generally one AVC per process; it records the vma↔page relationship.

struct anon_vma_chain 

struct anon_vma_chain records the correspondence between an anon_vma and a vma:

struct anon_vma_chain {
	struct vm_area_struct *vma;
	struct anon_vma *anon_vma;
	struct list_head same_vma;   /* locked by mmap_lock & page_table_lock */
	struct rb_node rb;			/* locked by anon_vma->rwsem */
	unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
	unsigned long cached_vma_start, cached_vma_last;
#endif
};
  • struct vm_area_struct *vma: the vma this chain belongs to.
  • struct anon_vma *anon_vma: the anon_vma this chain belongs to.
  • struct list_head same_vma: links together all chains that belong to the same vma; the list is headed at the anon_vma_chain member of vm_area_struct.
  • struct rb_node rb: this chain's node in anon_vma->rb_root; rb_subtree_last is the interval-tree augmentation value.
  • unsigned long cached_vma_start, cached_vma_last: debug copies of the vma's start and end, only compiled in with CONFIG_DEBUG_VM_RB.

An anon_vma_chain thus serves both as a member of a doubly linked list and as a node of a red-black (interval) tree, covering both lookup directions.

Reverse-mapping-related members of struct vm_area_struct

struct vm_area_struct solves the opposite problem: given one vma, find the other processes' VMAs that map the same pages:

struct vm_area_struct {
    ... ...

	/*
	 * A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
	 * list, after a COW of one of the file pages.	A MAP_SHARED vma
	 * can only be in the i_mmap tree.  An anonymous MAP_PRIVATE, stack
	 * or brk vma (with NULL file) can only be in an anon_vma list.
	 */
	struct list_head anon_vma_chain; /* Serialized by mmap_lock &
					  * page_table_lock */
	struct anon_vma *anon_vma;	/* Serialized by page_table_lock */

    ... ...
} __randomize_layout;
  • struct list_head anon_vma_chain: keeps all chains belonging to this vma on one list so that related mappings can be found quickly.
  • struct anon_vma *anon_vma: the anon_vma this vma belongs to.

Data structures for page-cache reverse mapping

Page-cache reverse mapping differs slightly from the anonymous case; here page->mapping points to an address_space:

  • Compared with anonymous pages, the page-cache case is simpler: mapping points to a struct address_space (see 《linux那些事之page cache》).
  • The i_mmap member of struct address_space is an interval tree whose nodes store the process VMAs directly; each vma in the tree represents one process accessing the file.

struct address_space 

The reverse-mapping-related members of struct address_space:

struct address_space {
    ... ...
	struct rb_root_cached	i_mmap;
	struct rw_semaphore	i_mmap_rwsem;
	... ...
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
  • struct rb_root_cached i_mmap: interval tree of all VMAs mapping this file.
  • struct rw_semaphore i_mmap_rwsem: lock protecting i_mmap.

Common anon_vma operations for anonymous pages

anon_vma_prepare

anon_vma_prepare prepares a vma for anonymous reverse mapping by creating the anon_vma↔vma association:


/**
 * __anon_vma_prepare - attach an anon_vma to a memory region
 * @vma: the memory region in question
 *
 * This makes sure the memory mapping described by 'vma' has
 * an 'anon_vma' attached to it, so that we can associate the
 * anonymous pages mapped into it with that anon_vma.
 *
 * The common case will be that we already have one, which
 * is handled inline by anon_vma_prepare(). But if
 * not we either need to find an adjacent mapping that we
 * can re-use the anon_vma from (very common when the only
 * reason for splitting a vma has been mprotect()), or we
 * allocate a new one.
 *
 * Anon-vma allocations are very subtle, because we may have
 * optimistically looked up an anon_vma in page_lock_anon_vma_read()
 * and that may actually touch the spinlock even in the newly
 * allocated vma (it depends on RCU to make sure that the
 * anon_vma isn't actually destroyed).
 *
 * As a result, we need to do proper anon_vma locking even
 * for the new allocation. At the same time, we do not want
 * to do any locking for the common case of already having
 * an anon_vma.
 *
 * This must be called with the mmap_lock held for reading.
 */
int __anon_vma_prepare(struct vm_area_struct *vma)
{
	struct mm_struct *mm = vma->vm_mm;
	struct anon_vma *anon_vma, *allocated;
	struct anon_vma_chain *avc;

	might_sleep();

	avc = anon_vma_chain_alloc(GFP_KERNEL);
	if (!avc)
		goto out_enomem;

	anon_vma = find_mergeable_anon_vma(vma);
	allocated = NULL;
	if (!anon_vma) {
		anon_vma = anon_vma_alloc();
		if (unlikely(!anon_vma))
			goto out_enomem_free_avc;
		allocated = anon_vma;
	}

	anon_vma_lock_write(anon_vma);
	/* page_table_lock to protect against threads */
	spin_lock(&mm->page_table_lock);
	if (likely(!vma->anon_vma)) {
		vma->anon_vma = anon_vma;
		anon_vma_chain_link(vma, avc, anon_vma);
		/* vma reference or self-parent link for new root */
		anon_vma->degree++;
		allocated = NULL;
		avc = NULL;
	}
	spin_unlock(&mm->page_table_lock);
	anon_vma_unlock_write(anon_vma);

	if (unlikely(allocated))
		put_anon_vma(allocated);
	if (unlikely(avc))
		anon_vma_chain_free(avc);

	return 0;

 out_enomem_free_avc:
	anon_vma_chain_free(avc);
 out_enomem:
	return -ENOMEM;
}
  • Allocate an anon_vma_chain, the object that ties the vma to its anon_vma.
  • find_mergeable_anon_vma: check whether an adjacent vma's anon_vma can be reused.
  • If nothing can be reused, allocate a fresh anon_vma with anon_vma_alloc.
  • anon_vma_lock_write(anon_vma): take the anon_vma write lock before modifying it.
  • Take mm->page_table_lock to protect against other threads.
  • Install the freshly obtained anon_vma into vma->anon_vma.
  • anon_vma_chain_link: wire up the vma, anon_vma and anon_vma_chain.
  • anon_vma->degree++: one more vma now references this anon_vma.
  • Release mm->page_table_lock.
  • anon_vma_unlock_write(anon_vma): drop the write lock.

anon_vma_chain_link

anon_vma_chain_link establishes the three-way relationship among vma, anon_vma and anon_vma_chain:

static void anon_vma_chain_link(struct vm_area_struct *vma,
				struct anon_vma_chain *avc,
				struct anon_vma *anon_vma)
{
	avc->vma = vma;
	avc->anon_vma = anon_vma;
	list_add(&avc->same_vma, &vma->anon_vma_chain);
	anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
}
  • avc->vma = vma and avc->anon_vma = anon_vma record the chain's key pair: the (vma, anon_vma) correspondence.
  • list_add(&avc->same_vma, &vma->anon_vma_chain): add the avc to the vma's anon_vma_chain list; every entry on that list shares the same vma.
  • anon_vma_interval_tree_insert(avc, &anon_vma->rb_root): insert the avc into the anon_vma's rb_root interval tree.
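The double linkage can be sketched in a few lines of userspace C. The names below are invented and the interval tree is again modelled as a plain list; the point is only that one chain object is inserted into two containers at once:

```c
#include <assert.h>
#include <stddef.h>

struct link_avc {
	struct link_avc *same_vma_next;   /* list headed at the vma */
	struct link_avc *tree_next;       /* stands in for anon_vma->rb_root */
};
struct link_vma { struct link_avc *anon_vma_chain; };
struct link_anon_vma { struct link_avc *nodes; };

static void toy_anon_vma_chain_link(struct link_vma *vma,
				    struct link_avc *avc,
				    struct link_anon_vma *av)
{
	/* cf. list_add(&avc->same_vma, &vma->anon_vma_chain) */
	avc->same_vma_next = vma->anon_vma_chain;
	vma->anon_vma_chain = avc;
	/* cf. anon_vma_interval_tree_insert(avc, &anon_vma->rb_root) */
	avc->tree_next = av->nodes;
	av->nodes = avc;
}
```

After the call, the same chain object is reachable both from the vma (for the "which anon_vmas does this vma use" direction) and from the anon_vma (for the "which vmas map this page" direction).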

anon_vma_alloc

anon_vma_alloc allocates a new anon_vma from its slab cache:

static inline struct anon_vma *anon_vma_alloc(void)
{
	struct anon_vma *anon_vma;

	anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
	if (anon_vma) {
		atomic_set(&anon_vma->refcount, 1);
		anon_vma->degree = 1;	/* Reference for first vma */
		anon_vma->parent = anon_vma;
		/*
		 * Initialise the anon_vma root to point to itself. If called
		 * from fork, the root will be reset to the parents anon_vma.
		 */
		anon_vma->root = anon_vma;
	}

	return anon_vma;
}

anon_vma_chain_alloc

anon_vma_chain_alloc allocates a new anon_vma_chain:

static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
{
	return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
}

page_add_new_anon_rmap

page_add_new_anon_rmap establishes the reverse mapping between a physical page and its anon_vma and vma:


/**
 * page_add_new_anon_rmap - add pte mapping to a new anonymous page
 * @page:	the page to add the mapping to
 * @vma:	the vm area in which the mapping is added
 * @address:	the user virtual address mapped
 * @compound:	charge the page as compound or small page
 *
 * Same as page_add_anon_rmap but must only be called on *new* pages.
 * This means the inc-and-test can be bypassed.
 * Page does not have to be locked.
 */
void page_add_new_anon_rmap(struct page *page,
	struct vm_area_struct *vma, unsigned long address, bool compound)
{
	int nr = compound ? hpage_nr_pages(page) : 1;

	VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
	__SetPageSwapBacked(page);
	if (compound) {
		VM_BUG_ON_PAGE(!PageTransHuge(page), page);
		/* increment count (starts at -1) */
		atomic_set(compound_mapcount_ptr(page), 0);
		if (hpage_pincount_available(page))
			atomic_set(compound_pincount_ptr(page), 0);

		__inc_lruvec_page_state(page, NR_ANON_THPS);
	} else {
		/* Anon THP always mapped first with PMD */
		VM_BUG_ON_PAGE(PageTransCompound(page), page);
		/* increment count (starts at -1) */
		atomic_set(&page->_mapcount, 0);
	}
	__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
	__page_set_anon_rmap(page, vma, address, 1);
}
  • __SetPageSwapBacked: set the PG_swapbacked flag on the page; if it is later reclaimed, its contents will be written to the swap area.
  • Set page->_mapcount to 0 (the counter starts at -1), with separate handling for the compound-page case.
  • __mod_lruvec_page_state: update the LRU statistics.
  • __page_set_anon_rmap: establish the page↔vma mapping.

__page_set_anon_rmap

__page_set_anon_rmap sets up the page↔vma mapping:

/**
 * __page_set_anon_rmap - set up new anonymous rmap
 * @page:	Page or Hugepage to add to rmap
 * @vma:	VM area to add page to.
 * @address:	User virtual address of the mapping	
 * @exclusive:	the page is exclusively owned by the current process
 */
static void __page_set_anon_rmap(struct page *page,
	struct vm_area_struct *vma, unsigned long address, int exclusive)
{
	struct anon_vma *anon_vma = vma->anon_vma;

	BUG_ON(!anon_vma);

	if (PageAnon(page))
		return;

	/*
	 * If the page isn't exclusively mapped into this vma,
	 * we must use the _oldest_ possible anon_vma for the
	 * page mapping!
	 */
	if (!exclusive)
		anon_vma = anon_vma->root;

	anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
	page->mapping = (struct address_space *) anon_vma;
	page->index = linear_page_index(vma, address);
}
  • anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON: tag the anon_vma address with the PAGE_MAPPING_ANON flag bit.
  • page->mapping = (struct address_space *) anon_vma: mapping is overloaded; for an anonymous page it stores the tagged anon_vma pointer, recording the reverse-mapping relationship.
  • page->index = linear_page_index(vma, address): index now holds the page's offset within the vma, computed (for the non-hugetlb case) as index = ((address - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff.
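The index arithmetic is simple enough to check directly. The sketch below mirrors what linear_page_index computes for a regular (non-hugetlb) vma; the function name and the flat parameters are invented for illustration:

```c
#include <assert.h>

#define PAGE_SHIFT 12UL   /* 4 KiB pages */

/* Page offset of `address` inside a vma, biased by the vma's own pgoff,
 * mirroring linear_page_index(vma, address). */
static unsigned long toy_linear_page_index(unsigned long vm_start,
					   unsigned long vm_pgoff,
					   unsigned long address)
{
	return ((address - vm_start) >> PAGE_SHIFT) + vm_pgoff;
}
```

For a vma starting at 0x400000 with vm_pgoff 0, the address 0x403000 is the fourth page, so the stored index is 3.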

anon_vma_clone

anon_vma_clone is typically used between parent and child processes: it copies the parent vma's entire anon_vma_chain into a new chain list managed by the child vma:

  • Every chain on the parent vma's anon_vma_chain list is duplicated for the child.
  • All of the child's chains are newly allocated, but each chain's anon_vma points to the same node as the corresponding parent chain.
  • Parent and child chains therefore share their anon_vmas, while the vma fields differ, pointing at the parent's and child's own vma respectively.
int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
	struct anon_vma_chain *avc, *pavc;
	struct anon_vma *root = NULL;

	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
		struct anon_vma *anon_vma;

		avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
		if (unlikely(!avc)) {
			unlock_anon_vma_root(root);
			root = NULL;
			avc = anon_vma_chain_alloc(GFP_KERNEL);
			if (!avc)
				goto enomem_failure;
		}
		anon_vma = pavc->anon_vma;
		root = lock_anon_vma_root(root, anon_vma);
		anon_vma_chain_link(dst, avc, anon_vma);

		/*
		 * Reuse existing anon_vma if its degree lower than two,
		 * that means it has no vma and only one anon_vma child.
		 *
		 * Do not chose parent anon_vma, otherwise first child
		 * will always reuse it. Root anon_vma is never reused:
		 * it has self-parent reference and at least one child.
		 */
		if (!dst->anon_vma && src->anon_vma &&
		    anon_vma != src->anon_vma && anon_vma->degree < 2)
			dst->anon_vma = anon_vma;
	}
	if (dst->anon_vma)
		dst->anon_vma->degree++;
	unlock_anon_vma_root(root);
	return 0;

 enomem_failure:
	/*
	 * dst->anon_vma is dropped here otherwise its degree can be incorrectly
	 * decremented in unlink_anon_vmas().
	 * We can safely do this because callers of anon_vma_clone() don't care
	 * about dst->anon_vma if anon_vma_clone() failed.
	 */
	dst->anon_vma = NULL;
	unlink_anon_vmas(dst);
	return -ENOMEM;
}
  • Walk every chain on src's anon_vma_chain list.
  • Allocate a new chain for the dst vma: avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN), falling back to GFP_KERNEL on failure.
  • Each new chain inherits the parent chain's anon_vma; the vma field is the child's own.
  • anon_vma_chain_link: insert the newly created chain into dst's anon_vma_chain list and the anon_vma's tree.
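The essence of the clone, new chain objects sharing the old anon_vmas, fits in a short userspace sketch. All names are invented and error handling is reduced to a single failure return; this is a model of the copy loop above, not the kernel function:

```c
#include <assert.h>
#include <stdlib.h>

struct cavc {
	void *anon_vma;        /* shared with the parent's chain */
	struct cavc *next;
};
struct cvma { struct cavc *chain; };

/* Duplicate every chain of src onto dst; each new chain points at the
 * SAME anon_vma as its source but belongs to dst. */
static int toy_anon_vma_clone(struct cvma *dst, const struct cvma *src)
{
	for (const struct cavc *p = src->chain; p; p = p->next) {
		struct cavc *avc = malloc(sizeof(*avc));
		if (!avc)
			return -1;
		avc->anon_vma = p->anon_vma;   /* inherited, not copied */
		avc->next = dst->chain;        /* new node, child-private */
		dst->chain = avc;
	}
	return 0;
}
```

Note the asymmetry: the chain objects are per-vma and get duplicated, while the anon_vmas stay shared, which is exactly why an rmap walk from a page later finds both parent and child.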

anon_vma_fork

anon_vma_fork fully replicates a vma's reverse-mapping state, whereas anon_vma_clone only clones the anon_vma_chain part:

int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
{
	struct anon_vma_chain *avc;
	struct anon_vma *anon_vma;
	int error;

	/* Don't bother if the parent process has no anon_vma here. */
	if (!pvma->anon_vma)
		return 0;

	/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
	vma->anon_vma = NULL;

	/*
	 * First, attach the new VMA to the parent VMA's anon_vmas,
	 * so rmap can find non-COWed pages in child processes.
	 */
	error = anon_vma_clone(vma, pvma);
	if (error)
		return error;

	/* An existing anon_vma has been reused, all done then. */
	if (vma->anon_vma)
		return 0;

	/* Then add our own anon_vma. */
	anon_vma = anon_vma_alloc();
	if (!anon_vma)
		goto out_error;
	avc = anon_vma_chain_alloc(GFP_KERNEL);
	if (!avc)
		goto out_error_free_anon_vma;

	/*
	 * The root anon_vma's spinlock is the lock actually used when we
	 * lock any of the anon_vmas in this anon_vma tree.
	 */
	anon_vma->root = pvma->anon_vma->root;
	anon_vma->parent = pvma->anon_vma;
	/*
	 * With refcounts, an anon_vma can stay around longer than the
	 * process it belongs to. The root anon_vma needs to be pinned until
	 * this anon_vma is freed, because the lock lives in the root.
	 */
	get_anon_vma(anon_vma->root);
	/* Mark this anon_vma as the one where our new (COWed) pages go. */
	vma->anon_vma = anon_vma;
	anon_vma_lock_write(anon_vma);
	anon_vma_chain_link(vma, avc, anon_vma);
	anon_vma->parent->degree++;
	anon_vma_unlock_write(anon_vma);

	return 0;

 out_error_free_anon_vma:
	put_anon_vma(anon_vma);
 out_error:
	unlink_anon_vmas(vma);
	return -ENOMEM;
}
  • anon_vma_clone: first clone the parent vma's anon_vma_chain part.
  • anon_vma_alloc: allocate a new anon_vma for the child.
  • anon_vma_chain_alloc: allocate a new anon_vma_chain for the child.
  • anon_vma_chain_link: insert the new chain into the list and the rb tree.
  • Afterwards, parent and child each have their own anon_vma and anon_vma_chain.

Reverse mapping in do_anonymous_page

How do_anonymous_page sets up the reverse mapping for a newly faulted anonymous page:

static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
	
    ... ...

	/* Allocate our own private page. */
	if (unlikely(anon_vma_prepare(vma))) // create anon_vma and anon_vma_chain
		goto oom;

	page = alloc_zeroed_user_highpage_movable(vma, vmf->address); // allocate the physical page
	if (!page)
		goto oom;

    ... ...
	page_add_new_anon_rmap(page, vma, vmf->address, false); // create the page<->vma reverse mapping

	lru_cache_add_active_or_unevictable(page, vma); // add the page to the LRU list

    ... ...
}

Reverse mapping for file-backed pages (page cache)

For a page-cache page, the mapping member of struct page is used for the address_space, so it cannot also carry an anon_vma; the VMAs are reached through mapping->i_mmap instead, while page->_mapcount records how many processes map the page at the same time.

page_add_file_rmap

page_add_file_rmap adds a pte mapping to a file page:


/**
 * page_add_file_rmap - add pte mapping to a file page
 * @page: the page to add the mapping to
 * @compound: charge the page as compound or small page
 *
 * The caller needs to hold the pte lock.
 */
void page_add_file_rmap(struct page *page, bool compound)
{
	int i, nr = 1;

	VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
	lock_page_memcg(page);
	if (compound && PageTransHuge(page)) {
		for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
			if (atomic_inc_and_test(&page[i]._mapcount))
				nr++;
		}
		if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
			goto out;
		if (PageSwapBacked(page))
			__inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
		else
			__inc_node_page_state(page, NR_FILE_PMDMAPPED);
	} else {
		if (PageTransCompound(page) && page_mapping(page)) {
			VM_WARN_ON_ONCE(!PageLocked(page));

			SetPageDoubleMap(compound_head(page));
			if (PageMlocked(page))
				clear_page_mlock(compound_head(page));
		}
		if (!atomic_inc_and_test(&page->_mapcount))
			goto out;
	}
	__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
out:
	unlock_page_memcg(page);
}
  • atomic_inc_and_test(&page->_mapcount): increment page->_mapcount; it returns true when the counter reaches 0, i.e. on the first mapping.
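The _mapcount counter is biased by -1: the value -1 means "not mapped anywhere" and 0 means "mapped once". That is why atomic_inc_and_test works as a "first mapping?" test, since it reports whether the increment brought the counter to zero. A plain-int sketch of the semantics (the atomic machinery is omitted on purpose):

```c
#include <assert.h>

/* Mirrors the atomic_inc_and_test() convention on a plain int:
 * returns nonzero exactly when the increment reaches 0, i.e. on the
 * very first mapping of a counter that starts at -1. */
static int mapcount_inc_and_test(int *mapcount)
{
	return ++(*mapcount) == 0;
}
```

Only when the function returns true does page_add_file_rmap go on to bump the NR_FILE_MAPPED statistics; later mappings of an already-mapped page skip that step.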

Classic uses of reverse mapping

Reverse mapping is generally used together with memory reclaim. Typical scenarios:

  • Deciding whether a page has been referenced: walk the page tables of every vma mapping the page and check whether the Accessed bit in the corresponding PTE is set.
  • During reclaim, finding all VMAs using a page: if the page is swapped out, the page-table entries of every user of the page must be updated and flushed.

page_referenced

page_referenced determines, for the LRU code, whether a page has recently been referenced; the return value is the number of processes that recently used it, i.e. those whose PTE Accessed bit is set (see 《linux那些事之LRU(2)》). Only the rmap-related part is analysed here:

int page_referenced(struct page *page,
		    int is_locked,
		    struct mem_cgroup *memcg,
		    unsigned long *vm_flags)
{
	int we_locked = 0;
	struct page_referenced_arg pra = {
		.mapcount = total_mapcount(page),
		.memcg = memcg,
	};
	struct rmap_walk_control rwc = {
		.rmap_one = page_referenced_one,
		.arg = (void *)&pra,
		.anon_lock = page_lock_anon_vma_read,
	};

	... ...
	rmap_walk(page, &rwc);

	... ...
}
  • rmap_walk: iterate over all VMAs corresponding to the page.

rmap_walk

void rmap_walk(struct page *page, struct rmap_walk_control *rwc)
{
	if (unlikely(PageKsm(page)))
		rmap_walk_ksm(page, rwc);
	else if (PageAnon(page))
		rmap_walk_anon(page, rwc, false);
	else
		rmap_walk_file(page, rwc, false);
}

Three cases are distinguished:

  • rmap_walk_ksm: the page is a KSM page.
  • rmap_walk_anon: the page is an anonymous page.
  • rmap_walk_file: the page is a page-cache page.

rmap_walk_anon

rmap_walk_anon handles the anonymous-page case:

static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
		bool locked)
{
	struct anon_vma *anon_vma;
	pgoff_t pgoff_start, pgoff_end;
	struct anon_vma_chain *avc;

	if (locked) {
		anon_vma = page_anon_vma(page);
		/* anon_vma disappear under us? */
		VM_BUG_ON_PAGE(!anon_vma, page);
	} else {
		anon_vma = rmap_walk_anon_lock(page, rwc);
	}
	if (!anon_vma)
		return;

	pgoff_start = page_to_pgoff(page);
	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
	anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
			pgoff_start, pgoff_end) {
		struct vm_area_struct *vma = avc->vma;
		unsigned long address = vma_address(page, vma);

		cond_resched();

		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
			continue;

		if (!rwc->rmap_one(page, vma, address, rwc->arg))
			break;
		if (rwc->done && rwc->done(page))
			break;
	}

	if (!locked)
		anon_vma_unlock_read(anon_vma);
}
  • locked: whether the caller already holds the anon_vma lock; if true, take the page's anon_vma directly, otherwise look it up and read-lock it via rmap_walk_anon_lock.
  • anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end): walk anon_vma->rb_root and obtain each anon_vma_chain, the bridge between the page and a vma.
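The interval-tree foreach only visits chains whose vma actually covers some page offset in [pgoff_start, pgoff_end]. Stripped of the tree, the per-node overlap test reduces to a two-comparison range check; the sketch below (function and parameter names invented) shows that predicate:

```c
#include <assert.h>

/* Does a vma covering page offsets [vma_pgoff, vma_pgoff + vma_pages)
 * overlap the query range [pgoff_start, pgoff_end]?  This is the
 * condition the interval tree evaluates for each candidate node. */
static int vma_overlaps(unsigned long vma_pgoff, unsigned long vma_pages,
			unsigned long pgoff_start, unsigned long pgoff_end)
{
	unsigned long vma_last = vma_pgoff + vma_pages - 1;

	return vma_pgoff <= pgoff_end && vma_last >= pgoff_start;
}
```

The tree exists so that this check does not have to be run linearly over every chain; augmentation (rb_subtree_last) lets whole subtrees be skipped.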

rmap_walk_file

For a page-cache page, walk the mapping to find all process VMAs:

static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
		bool locked)
{
	struct address_space *mapping = page_mapping(page);
	pgoff_t pgoff_start, pgoff_end;
	struct vm_area_struct *vma;

	/*
	 * The page lock not only makes sure that page->mapping cannot
	 * suddenly be NULLified by truncation, it makes sure that the
	 * structure at mapping cannot be freed and reused yet,
	 * so we can safely take mapping->i_mmap_rwsem.
	 */
	VM_BUG_ON_PAGE(!PageLocked(page), page);

	if (!mapping)
		return;

	pgoff_start = page_to_pgoff(page);
	pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
	if (!locked)
		i_mmap_lock_read(mapping);
	vma_interval_tree_foreach(vma, &mapping->i_mmap,
			pgoff_start, pgoff_end) {
		unsigned long address = vma_address(page, vma);

		cond_resched();

		if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
			continue;

		if (!rwc->rmap_one(page, vma, address, rwc->arg))
			goto done;
		if (rwc->done && rwc->done(page))
			goto done;
	}

done:
	if (!locked)
		i_mmap_unlock_read(mapping);
}
  • vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end): walk all VMAs in mapping->i_mmap; here the tree nodes hold the vma structures directly.
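For each vma found, the walk converts the page's file offset back into a user virtual address inside that vma, the inverse of linear_page_index shown earlier. A flat-parameter sketch of that arithmetic (the real kernel helper is vma_address, the toy name and signature here are invented):

```c
#include <assert.h>

#define PAGE_SHIFT 12UL   /* 4 KiB pages */

/* Virtual address of page offset `pgoff` inside a vma that starts at
 * vm_start and maps the file from offset vm_pgoff:
 *   address = vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT) */
static unsigned long toy_vma_address(unsigned long vm_start,
				     unsigned long vm_pgoff,
				     unsigned long pgoff)
{
	return vm_start + ((pgoff - vm_pgoff) << PAGE_SHIFT);
}
```

Composing the two directions round-trips: the index computed from an address maps back to the same address, which is what lets the rmap walk hand rwc->rmap_one the exact user address to unmap or test.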

try_to_unmap

try_to_unmap is used during memory reclaim: before the page is reclaimed, every vma mapping it must be unmapped and the corresponding page tables flushed:

bool try_to_unmap(struct page *page, enum ttu_flags flags)
{
	struct rmap_walk_control rwc = {
		.rmap_one = try_to_unmap_one,
		.arg = (void *)flags,
		.done = page_mapcount_is_zero,
		.anon_lock = page_lock_anon_vma_read,
	};

	/*
	 * During exec, a temporary VMA is setup and later moved.
	 * The VMA is moved under the anon_vma lock but not the
	 * page tables leading to a race where migration cannot
	 * find the migration ptes. Rather than increasing the
	 * locking requirements of exec(), migration skips
	 * temporary VMAs until after exec() completes.
	 */
	if ((flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))
	    && !PageKsm(page) && PageAnon(page))
		rwc.invalid_vma = invalid_migration_vma;

	if (flags & TTU_RMAP_LOCKED)
		rmap_walk_locked(page, &rwc);
	else
		rmap_walk(page, &rwc);

	return !page_mapcount(page) ? true : false;
}
  • The reverse mapping is walked with the same rmap_walk machinery.
  • The rmap_one callback that removes each mapping is try_to_unmap_one.

try_to_unmap_one

try_to_unmap_one removes the mapping of the page from one vma. The function is long because it must handle many different page-usage scenarios; only the skeleton is shown:


/*
 * @arg: enum ttu_flags will be passed to this argument
 */
static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
		     unsigned long address, void *arg)
{

    ... ...

    set_pte_at(mm, address, pvmw.pte, pteval);
	
   
    mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);

    ... 

	page_remove_rmap(subpage, PageHuge(page));
	put_page(page);

	mmu_notifier_invalidate_range_end(&range);

	return ret;
}

It can be abstracted into a few steps:

  • set_pte_at: update the PTE of the affected process.
  • mmu_notifier_invalidate_range: tell MMU notifiers to invalidate the range.
  • page_remove_rmap: remove this vma's mapping from the page's reverse mapping.
  • put_page: drop one reference on the struct page.
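The bookkeeping effect of one unmap step can be modelled in miniature. Everything below is invented for illustration (a "page table" is just a pointer slot, counts are plain ints), but it captures the invariant: each successful unmap clears one translation and drops both the map count and the reference count by one:

```c
#include <assert.h>
#include <stddef.h>

struct toy_page2 {
	int mapcount;     /* how many page tables map this page */
	int refcount;     /* cf. struct page refcount */
};

/* Model of the steps above: clear the pte slot, then page_remove_rmap()
 * and put_page(). */
static void toy_try_to_unmap_one(struct toy_page2 *page,
				 struct toy_page2 **pte)
{
	*pte = NULL;          /* set_pte_at()/clear: remove the translation */
	page->mapcount--;     /* page_remove_rmap() */
	page->refcount--;     /* put_page() */
}
```

When the loop over all VMAs finishes, mapcount has returned to "unmapped" and try_to_unmap's final page_mapcount check succeeds, letting reclaim proceed.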

Unmapping during memory reclaim

When shrink_page_list performs reclaim, the selected pages are unmapped through the reverse mapping:

static unsigned int shrink_page_list(struct list_head *page_list,
				     struct pglist_data *pgdat,
				     struct scan_control *sc,
				     enum ttu_flags ttu_flags,
				     struct reclaim_stat *stat,
				     bool ignore_references)
{
    ... ...
	while (!list_empty(page_list)) {
       
        ... ...

		/*
		 * The page is mapped into the page tables of one or more
		 * processes. Try to unmap it here.
		 */
		if (page_mapped(page)) {
			enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
			bool was_swapbacked = PageSwapBacked(page);

			if (unlikely(PageTransHuge(page)))
				flags |= TTU_SPLIT_HUGE_PMD;

			if (!try_to_unmap(page, flags)) { // unmap the page
				stat->nr_unmap_fail += nr_pages;
				if (!was_swapbacked && PageSwapBacked(page))
					stat->nr_lazyfree_fail += nr_pages;
				goto activate_locked;
			}
		}

       ... ...

    }

    ... ...
	return nr_reclaimed;
}