A page in the kernel is frequently used by more than one process. After fork(), for example, the child inherits the parent's pages and both initially map the same frames, so one page corresponds to several process VMAs. Shared memory likewise puts one physical page into several processes at once. When the shared object is a file, the processes all map a single physical page acting as page cache, which is managed through the struct address_space *mapping field of struct page. Anonymous pages have no backing file, yet the kernel often needs to find, starting from a page, every VMA of every process currently using it. That lookup is called reverse mapping (RMAP).
Data structures for anonymous-page reverse mapping
Anonymous-page reverse mapping records the 1:N relationship between a page and VMAs. The structures involved:
- struct anon_vma: the anonymous counterpart of the page cache's address_space; it records which process VMAs a page maps into, using anon_vma_chain structures as the bridge.
- The mapping member of struct page is reused: for an anonymous page it holds a pointer to the struct anon_vma.
- struct anon_vma_chain: the bridge between anon_vma and vma. The rb_root in struct anon_vma holds the anon_vma_chain nodes, each of which connects one vma; typically each process contributes one node. When a child inherits the parent's pages, the child creates its own copies of the anon_vma_chain links.
- The anon_vma member of struct vm_area_struct points to the corresponding struct anon_vma, and its struct list_head anon_vma_chain heads the list of anon_vma_chain links belonging to that vma.
The design has to satisfy two lookup requirements:
- From a struct page, quickly find the VMAs of every process using it, and from there their page tables; hence the rb_root interval tree for fast reverse lookup.
- From one process's vma, quickly find the other processes' VMAs sharing the same pages; hence the struct list_head anon_vma_chain doubly linked list in struct vm_area_struct.
struct page members related to reverse mapping
The members of struct page relevant to reverse mapping:
struct page {
unsigned long flags; /* Atomic flags, some possibly updated asynchronously */
union {
struct { /* Page cache and anonymous pages */
/**
* @lru: Pageout list, eg. active_list protected by
* pgdat->lru_lock. Sometimes used as a generic list
* by the page owner.
*/
struct list_head lru;
/* See page-flags.h for PAGE_MAPPING_FLAGS */
struct address_space *mapping;
pgoff_t index; /* Our offset within mapping. */
... ...
};
};
... ...
union { /* This union is 4 bytes in size. */
/*
* If the page can be mapped to userspace, encodes the number
* of times this page is referenced by a page table.
*/
atomic_t _mapcount;
... ...
};
... ...
} _struct_page_alignment;
struct page was designed to keep its memory footprint small, so many members are overloaded: one field serves different purposes in different scenarios, and the reverse-mapping fields are no exception:
- struct address_space *mapping: for a page-cache page it points to the address_space of the backing file; for an anonymous page it points to the struct anon_vma recording the page-to-vma relationship (this is what makes it possible to find all VMAs from one page).
- pgoff_t index: for a page-cache page, the page's offset within the file; for an anonymous page, its offset within the vma.
- atomic_t _mapcount: how many page tables (i.e. processes) currently map this page.
struct anon_vma
struct anon_vma is the management structure of anonymous-page reverse mapping; it records which VMAs map a given anonymous page:
struct anon_vma {
struct anon_vma *root; /* Root of this anon_vma tree */
struct rw_semaphore rwsem; /* W: modification, R: walking the list */
/*
* The refcount is taken on an anon_vma when there is no
* guarantee that the vma of page tables will exist for
* the duration of the operation. A caller that takes
* the reference is responsible for clearing up the
* anon_vma if they are the last user on release
*/
atomic_t refcount;
/*
* Count of child anon_vmas and VMAs which points to this anon_vma.
*
* This counter is used for making decision about reusing anon_vma
* instead of forking new one. See comments in function anon_vma_clone.
*/
unsigned degree;
struct anon_vma *parent; /* Parent of this anon_vma */
/*
* NOTE: the LSB of the rb_root.rb_node is set by
* mm_take_all_locks() _after_ taking the above lock. So the
* rb_root must only be read/written after taking the above lock
* to be sure to see a valid next pointer. The LSB bit itself
* is serialized by a system wide lock only visible to
* mm_take_all_locks() (mm_all_locks_mutex).
*/
/* Interval tree of private "related" vmas */
struct rb_root_cached rb_root;
};
- struct anon_vma *root: the root anon_vma of the tree this anon_vma belongs to.
- struct rw_semaphore rwsem: read-write semaphore protecting the structure (write for modification, read for walking).
- atomic_t refcount: reference count keeping the anon_vma alive for callers that cannot guarantee the vma or its page tables still exist; the last user is responsible for freeing the anon_vma on release.
- unsigned degree: number of child anon_vmas plus VMAs pointing at this anon_vma; used to decide whether an existing anon_vma can be reused instead of allocating a new one (see anon_vma_clone).
- struct anon_vma *parent: the parent anon_vma in the fork hierarchy.
- struct rb_root_cached rb_root: interval tree whose nodes are struct anon_vma_chain structures; usually one AVC (anon_vma_chain) per process, recording the vma-to-page relationship.
struct anon_vma_chain
struct anon_vma_chain records the association between an anon_vma and a vma:
struct anon_vma_chain {
struct vm_area_struct *vma;
struct anon_vma *anon_vma;
struct list_head same_vma; /* locked by mmap_lock & page_table_lock */
struct rb_node rb; /* locked by anon_vma->rwsem */
unsigned long rb_subtree_last;
#ifdef CONFIG_DEBUG_VM_RB
unsigned long cached_vma_start, cached_vma_last;
#endif
};
- struct vm_area_struct *vma: the vma this chain belongs to.
- struct anon_vma *anon_vma: the anon_vma it links to.
- struct list_head same_vma: links together all chains belonging to the same vma; it is threaded onto the anon_vma_chain list in vm_area_struct.
- struct rb_node rb: the node linked into the anon_vma's rb_root interval tree.
- unsigned long rb_subtree_last: interval-tree bookkeeping (the largest end offset in this subtree).
- unsigned long cached_vma_start, cached_vma_last: debug copies of the vma's start and end; only present with CONFIG_DEBUG_VM_RB.
Because anon_vma_chain is simultaneously a doubly-linked-list member and a tree node, one structure serves both lookup directions.
struct vm_area_struct members related to reverse mapping
struct vm_area_struct solves the other direction: from one vma, finding the other processes' VMAs that share the same pages:
struct vm_area_struct {
... ...
/*
* A file's MAP_PRIVATE vma can be in both i_mmap tree and anon_vma
* list, after a COW of one of the file pages. A MAP_SHARED vma
* can only be in the i_mmap tree. An anonymous MAP_PRIVATE, stack
* or brk vma (with NULL file) can only be in an anon_vma list.
*/
struct list_head anon_vma_chain; /* Serialized by mmap_lock &
* page_table_lock */
struct anon_vma *anon_vma; /* Serialized by page_table_lock */
... ...
} __randomize_layout;
- struct list_head anon_vma_chain: heads the list of anon_vma_chain links belonging to this vma, so that all of them can be found quickly.
- struct anon_vma *anon_vma: the anon_vma this vma belongs to.
Data structures for page cache reverse mapping
Page-cache reverse mapping differs slightly from the anonymous case; here page->mapping points to an address_space:
- Compared with anonymous pages, the page-cache case is simpler: mapping points to the struct address_space (see 《linux那些事之page cache》).
- The i_mmap member of struct address_space is an interval tree whose nodes are the VMAs themselves; each node is one process's vma mapping the file.
struct address_space
The members of struct address_space related to reverse mapping:
struct address_space {
... ...
struct rb_root_cached i_mmap;
struct rw_semaphore i_mmap_rwsem;
... ...
} __attribute__((aligned(sizeof(long)))) __randomize_layout;
- struct rb_root_cached i_mmap: interval tree of all VMAs mapping this file.
- struct rw_semaphore i_mmap_rwsem: the lock protecting i_mmap.
Common anon_vma operations on anonymous pages
anon_vma_prepare
anon_vma_prepare prepares a vma for anonymous reverse mapping by establishing the anon_vma-to-vma association:
/**
* __anon_vma_prepare - attach an anon_vma to a memory region
* @vma: the memory region in question
*
* This makes sure the memory mapping described by 'vma' has
* an 'anon_vma' attached to it, so that we can associate the
* anonymous pages mapped into it with that anon_vma.
*
* The common case will be that we already have one, which
* is handled inline by anon_vma_prepare(). But if
* not we either need to find an adjacent mapping that we
* can re-use the anon_vma from (very common when the only
* reason for splitting a vma has been mprotect()), or we
* allocate a new one.
*
* Anon-vma allocations are very subtle, because we may have
* optimistically looked up an anon_vma in page_lock_anon_vma_read()
* and that may actually touch the spinlock even in the newly
* allocated vma (it depends on RCU to make sure that the
* anon_vma isn't actually destroyed).
*
* As a result, we need to do proper anon_vma locking even
* for the new allocation. At the same time, we do not want
* to do any locking for the common case of already having
* an anon_vma.
*
* This must be called with the mmap_lock held for reading.
*/
int __anon_vma_prepare(struct vm_area_struct *vma)
{
struct mm_struct *mm = vma->vm_mm;
struct anon_vma *anon_vma, *allocated;
struct anon_vma_chain *avc;
might_sleep();
avc = anon_vma_chain_alloc(GFP_KERNEL);
if (!avc)
goto out_enomem;
anon_vma = find_mergeable_anon_vma(vma);
allocated = NULL;
if (!anon_vma) {
anon_vma = anon_vma_alloc();
if (unlikely(!anon_vma))
goto out_enomem_free_avc;
allocated = anon_vma;
}
anon_vma_lock_write(anon_vma);
/* page_table_lock to protect against threads */
spin_lock(&mm->page_table_lock);
if (likely(!vma->anon_vma)) {
vma->anon_vma = anon_vma;
anon_vma_chain_link(vma, avc, anon_vma);
/* vma reference or self-parent link for new root */
anon_vma->degree++;
allocated = NULL;
avc = NULL;
}
spin_unlock(&mm->page_table_lock);
anon_vma_unlock_write(anon_vma);
if (unlikely(allocated))
put_anon_vma(allocated);
if (unlikely(avc))
anon_vma_chain_free(avc);
return 0;
out_enomem_free_avc:
anon_vma_chain_free(avc);
out_enomem:
return -ENOMEM;
}
- Allocate an anon_vma_chain (avc) that will link the vma and the anon_vma.
- find_mergeable_anon_vma: check whether an adjacent vma's anon_vma can be reused.
- If not, allocate a fresh anon_vma with anon_vma_alloc.
- anon_vma_lock_write(anon_vma): take the write lock before modifying the anon_vma.
- Take mm->page_table_lock to protect against concurrent threads.
- If vma->anon_vma is still unset, assign the anon_vma to it.
- anon_vma_chain_link: wire up vma, anon_vma and anon_vma_chain.
- anon_vma->degree++: account for the new reference.
- Release mm->page_table_lock.
- anon_vma_unlock_write(anon_vma): release the write lock; if another thread won the race, the spare anon_vma and avc are freed.
anon_vma_chain_link
anon_vma_chain_link establishes the three-way relationship among vma, anon_vma and anon_vma_chain:
static void anon_vma_chain_link(struct vm_area_struct *vma,
struct anon_vma_chain *avc,
struct anon_vma *anon_vma)
{
avc->vma = vma;
avc->anon_vma = anon_vma;
list_add(&avc->same_vma, &vma->anon_vma_chain);
anon_vma_interval_tree_insert(avc, &anon_vma->rb_root);
}
- avc->vma = vma; avc->anon_vma = anon_vma: record the (vma, anon_vma) pair this chain bridges.
- list_add(&avc->same_vma, &vma->anon_vma_chain): add the avc to the vma's anon_vma_chain list; every entry on that list shares the same vma.
- anon_vma_interval_tree_insert(avc, &anon_vma->rb_root): insert the avc into the anon_vma's interval tree.
anon_vma_alloc
anon_vma_alloc allocates a new anon_vma from its slab cache:
static inline struct anon_vma *anon_vma_alloc(void)
{
struct anon_vma *anon_vma;
anon_vma = kmem_cache_alloc(anon_vma_cachep, GFP_KERNEL);
if (anon_vma) {
atomic_set(&anon_vma->refcount, 1);
anon_vma->degree = 1; /* Reference for first vma */
anon_vma->parent = anon_vma;
/*
* Initialise the anon_vma root to point to itself. If called
* from fork, the root will be reset to the parents anon_vma.
*/
anon_vma->root = anon_vma;
}
return anon_vma;
}
anon_vma_chain_alloc
anon_vma_chain_alloc allocates a new anon_vma_chain:
static inline struct anon_vma_chain *anon_vma_chain_alloc(gfp_t gfp)
{
return kmem_cache_alloc(anon_vma_chain_cachep, gfp);
}
page_add_new_anon_rmap
page_add_new_anon_rmap establishes the reverse mapping between a newly allocated physical page and its vma/anon_vma:
/**
* page_add_new_anon_rmap - add pte mapping to a new anonymous page
* @page: the page to add the mapping to
* @vma: the vm area in which the mapping is added
* @address: the user virtual address mapped
* @compound: charge the page as compound or small page
*
* Same as page_add_anon_rmap but must only be called on *new* pages.
* This means the inc-and-test can be bypassed.
* Page does not have to be locked.
*/
void page_add_new_anon_rmap(struct page *page,
struct vm_area_struct *vma, unsigned long address, bool compound)
{
int nr = compound ? hpage_nr_pages(page) : 1;
VM_BUG_ON_VMA(address < vma->vm_start || address >= vma->vm_end, vma);
__SetPageSwapBacked(page);
if (compound) {
VM_BUG_ON_PAGE(!PageTransHuge(page), page);
/* increment count (starts at -1) */
atomic_set(compound_mapcount_ptr(page), 0);
if (hpage_pincount_available(page))
atomic_set(compound_pincount_ptr(page), 0);
__inc_lruvec_page_state(page, NR_ANON_THPS);
} else {
/* Anon THP always mapped first with PMD */
VM_BUG_ON_PAGE(PageTransCompound(page), page);
/* increment count (starts at -1) */
atomic_set(&page->_mapcount, 0);
}
__mod_lruvec_page_state(page, NR_ANON_MAPPED, nr);
__page_set_anon_rmap(page, vma, address, 1);
}
- __SetPageSwapBacked: set the PG_swapbacked flag; if the page is later reclaimed, its contents go to the swap area.
- Set page->_mapcount to 0 (the counter starts at -1), with separate handling for the compound-page case.
- __mod_lruvec_page_state: update the NR_ANON_MAPPED per-lruvec statistics.
- __page_set_anon_rmap: record the page-to-vma mapping.
__page_set_anon_rmap
__page_set_anon_rmap records the mapping between page and vma:
/**
* __page_set_anon_rmap - set up new anonymous rmap
* @page: Page or Hugepage to add to rmap
* @vma: VM area to add page to.
* @address: User virtual address of the mapping
* @exclusive: the page is exclusively owned by the current process
*/
static void __page_set_anon_rmap(struct page *page,
struct vm_area_struct *vma, unsigned long address, int exclusive)
{
struct anon_vma *anon_vma = vma->anon_vma;
BUG_ON(!anon_vma);
if (PageAnon(page))
return;
/*
* If the page isn't exclusively mapped into this vma,
* we must use the _oldest_ possible anon_vma for the
* page mapping!
*/
if (!exclusive)
anon_vma = anon_vma->root;
anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON;
page->mapping = (struct address_space *) anon_vma;
page->index = linear_page_index(vma, address);
}
- anon_vma = (void *) anon_vma + PAGE_MAPPING_ANON: tag the anon_vma pointer with the PAGE_MAPPING_ANON bit so it can be told apart from an address_space pointer.
- page->mapping = (struct address_space *) anon_vma: mapping is reused; for an anonymous page it stores the tagged anon_vma pointer and thereby the reverse mapping.
- page->index = linear_page_index(vma, address): index now holds the page's offset within the vma.
anon_vma_clone
anon_vma_clone is typically used between parent and child processes: it duplicates the parent vma's whole anon_vma_chain list into a new list managed by the child vma:
- As the figure above illustrates, every chain on the parent vma's anon_vma_chain list is copied into the child.
- All chains in the child are newly allocated, but each new chain's anon_vma points at the same node as the parent's.
- For every node, parent and child chains share the anon_vma; only the vma differs, each pointing at its own process's vma.
int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
{
struct anon_vma_chain *avc, *pavc;
struct anon_vma *root = NULL;
list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
struct anon_vma *anon_vma;
avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN);
if (unlikely(!avc)) {
unlock_anon_vma_root(root);
root = NULL;
avc = anon_vma_chain_alloc(GFP_KERNEL);
if (!avc)
goto enomem_failure;
}
anon_vma = pavc->anon_vma;
root = lock_anon_vma_root(root, anon_vma);
anon_vma_chain_link(dst, avc, anon_vma);
/*
* Reuse existing anon_vma if its degree lower than two,
* that means it has no vma and only one anon_vma child.
*
* Do not chose parent anon_vma, otherwise first child
* will always reuse it. Root anon_vma is never reused:
* it has self-parent reference and at least one child.
*/
if (!dst->anon_vma && src->anon_vma &&
anon_vma != src->anon_vma && anon_vma->degree < 2)
dst->anon_vma = anon_vma;
}
if (dst->anon_vma)
dst->anon_vma->degree++;
unlock_anon_vma_root(root);
return 0;
enomem_failure:
/*
* dst->anon_vma is dropped here otherwise its degree can be incorrectly
* decremented in unlink_anon_vmas().
* We can safely do this because callers of anon_vma_clone() don't care
* about dst->anon_vma if anon_vma_clone() failed.
*/
dst->anon_vma = NULL;
unlink_anon_vmas(dst);
return -ENOMEM;
}
- Walk every chain on src's anon_vma_chain list.
- Allocate a new chain for the dst vma: avc = anon_vma_chain_alloc(GFP_NOWAIT | __GFP_NOWARN), falling back to GFP_KERNEL on failure.
- The new chain inherits the anon_vma; the vma is the destination's own.
- anon_vma_chain_link: link the new node into dst's anon_vma_chain list and the anon_vma's interval tree.
anon_vma_fork
anon_vma_fork duplicates the complete reverse-mapping state of a vma, whereas anon_vma_clone only clones the anon_vma_chain part:
int anon_vma_fork(struct vm_area_struct *vma, struct vm_area_struct *pvma)
{
struct anon_vma_chain *avc;
struct anon_vma *anon_vma;
int error;
/* Don't bother if the parent process has no anon_vma here. */
if (!pvma->anon_vma)
return 0;
/* Drop inherited anon_vma, we'll reuse existing or allocate new. */
vma->anon_vma = NULL;
/*
* First, attach the new VMA to the parent VMA's anon_vmas,
* so rmap can find non-COWed pages in child processes.
*/
error = anon_vma_clone(vma, pvma);
if (error)
return error;
/* An existing anon_vma has been reused, all done then. */
if (vma->anon_vma)
return 0;
/* Then add our own anon_vma. */
anon_vma = anon_vma_alloc();
if (!anon_vma)
goto out_error;
avc = anon_vma_chain_alloc(GFP_KERNEL);
if (!avc)
goto out_error_free_anon_vma;
/*
* The root anon_vma's spinlock is the lock actually used when we
* lock any of the anon_vmas in this anon_vma tree.
*/
anon_vma->root = pvma->anon_vma->root;
anon_vma->parent = pvma->anon_vma;
/*
* With refcounts, an anon_vma can stay around longer than the
* process it belongs to. The root anon_vma needs to be pinned until
* this anon_vma is freed, because the lock lives in the root.
*/
get_anon_vma(anon_vma->root);
/* Mark this anon_vma as the one where our new (COWed) pages go. */
vma->anon_vma = anon_vma;
anon_vma_lock_write(anon_vma);
anon_vma_chain_link(vma, avc, anon_vma);
anon_vma->parent->degree++;
anon_vma_unlock_write(anon_vma);
return 0;
out_error_free_anon_vma:
put_anon_vma(anon_vma);
out_error:
unlink_anon_vmas(vma);
return -ENOMEM;
}
- anon_vma_clone: first clone the parent's anon_vma_chain links.
- anon_vma_alloc: allocate a new anon_vma for the child.
- anon_vma_chain_alloc: allocate a new anon_vma_chain for the child.
- anon_vma_chain_link: insert the new chain into the rb tree.
- Afterwards, parent and child each have their own anon_vma and anon_vma_chain.
Reverse mapping in do_anonymous_page
do_anonymous_page sets up reverse mapping when an anonymous page is first faulted in:
static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
{
... ...
/* Allocate our own private page. */
if (unlikely(anon_vma_prepare(vma)))// create anon_vma and anon_vma_chain
goto oom;
page = alloc_zeroed_user_highpage_movable(vma, vmf->address);// allocate the physical page
if (!page)
goto oom;
... ...
page_add_new_anon_rmap(page, vma, vmf->address, false);// set up page-vma reverse mapping
lru_cache_add_active_or_unevictable(page, vma);// add to the LRU list
... ...
}
File page cache reverse mapping
For a page-cache page, page->mapping already serves as the address_space pointer and cannot be reused for rmap; the VMAs are instead found through the address_space's i_mmap tree, while page->_mapcount counts how many processes map the page at the same time.
page_add_file_rmap
page_add_file_rmap adds a pte mapping to a file page, updating _mapcount and the NR_FILE_MAPPED statistics:
/**
* page_add_file_rmap - add pte mapping to a file page
* @page: the page to add the mapping to
* @compound: charge the page as compound or small page
*
* The caller needs to hold the pte lock.
*/
void page_add_file_rmap(struct page *page, bool compound)
{
int i, nr = 1;
VM_BUG_ON_PAGE(compound && !PageTransHuge(page), page);
lock_page_memcg(page);
if (compound && PageTransHuge(page)) {
for (i = 0, nr = 0; i < HPAGE_PMD_NR; i++) {
if (atomic_inc_and_test(&page[i]._mapcount))
nr++;
}
if (!atomic_inc_and_test(compound_mapcount_ptr(page)))
goto out;
if (PageSwapBacked(page))
__inc_node_page_state(page, NR_SHMEM_PMDMAPPED);
else
__inc_node_page_state(page, NR_FILE_PMDMAPPED);
} else {
if (PageTransCompound(page) && page_mapping(page)) {
VM_WARN_ON_ONCE(!PageLocked(page));
SetPageDoubleMap(compound_head(page));
if (PageMlocked(page))
clear_page_mlock(compound_head(page));
}
if (!atomic_inc_and_test(&page->_mapcount))
goto out;
}
__mod_lruvec_page_state(page, NR_FILE_MAPPED, nr);
out:
unlock_page_memcg(page);
}
- atomic_inc_and_test(&page->_mapcount): increment page->_mapcount; since the counter starts at -1, the call returns true exactly on the first mapping.
Classic scenarios for reverse mapping
Reverse mapping is normally used together with memory reclaim. Typical scenarios:
- Deciding whether a page has been referenced: walk the page tables of every vma mapping the page and check whether the accessed bit of the corresponding PTE is set.
- During reclaim, finding every vma using a page so that, when the page is evicted, all page-table entries mapping it can be updated.
page_referenced
page_referenced decides, for the LRU, whether a page was recently referenced; it returns the number of page tables in which the page's accessed bit was found set (see 《linux那些事之LRU(2)》). Only the rmap-related parts are analyzed here:
int page_referenced(struct page *page,
int is_locked,
struct mem_cgroup *memcg,
unsigned long *vm_flags)
{
int we_locked = 0;
struct page_referenced_arg pra = {
.mapcount = total_mapcount(page),
.memcg = memcg,
};
struct rmap_walk_control rwc = {
.rmap_one = page_referenced_one,
.arg = (void *)&pra,
.anon_lock = page_lock_anon_vma_read,
};
... ...
rmap_walk(page, &rwc);
... ...
}
- rmap_walk: visit every VMA mapping the page.
rmap_walk
void rmap_walk(struct page *page, struct rmap_walk_control *rwc)
{
if (unlikely(PageKsm(page)))
rmap_walk_ksm(page, rwc);
else if (PageAnon(page))
rmap_walk_anon(page, rwc, false);
else
rmap_walk_file(page, rwc, false);
}
Three cases are dispatched:
- rmap_walk_ksm: KSM pages.
- rmap_walk_anon: anonymous pages.
- rmap_walk_file: page-cache pages.
rmap_walk_anon
rmap_walk_anon walks the VMAs of an anonymous page:
static void rmap_walk_anon(struct page *page, struct rmap_walk_control *rwc,
bool locked)
{
struct anon_vma *anon_vma;
pgoff_t pgoff_start, pgoff_end;
struct anon_vma_chain *avc;
if (locked) {
anon_vma = page_anon_vma(page);
/* anon_vma disappear under us? */
VM_BUG_ON_PAGE(!anon_vma, page);
} else {
anon_vma = rmap_walk_anon_lock(page, rwc);
}
if (!anon_vma)
return;
pgoff_start = page_to_pgoff(page);
pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root,
pgoff_start, pgoff_end) {
struct vm_area_struct *vma = avc->vma;
unsigned long address = vma_address(page, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
continue;
if (!rwc->rmap_one(page, vma, address, rwc->arg))
break;
if (rwc->done && rwc->done(page))
break;
}
if (!locked)
anon_vma_unlock_read(anon_vma);
}
- locked: whether the caller already holds the anon_vma lock. If true, the anon_vma is simply looked up from the page; if false, rmap_walk_anon_lock takes the read lock on the page's anon_vma.
- anon_vma_interval_tree_foreach(avc, &anon_vma->rb_root, pgoff_start, pgoff_end): iterate the interval tree over the page's offset range, yielding each anon_vma_chain, the bridge between page and vma.
rmap_walk_file
rmap_walk_file walks all process VMAs mapping a page-cache page:
static void rmap_walk_file(struct page *page, struct rmap_walk_control *rwc,
bool locked)
{
struct address_space *mapping = page_mapping(page);
pgoff_t pgoff_start, pgoff_end;
struct vm_area_struct *vma;
/*
* The page lock not only makes sure that page->mapping cannot
* suddenly be NULLified by truncation, it makes sure that the
* structure at mapping cannot be freed and reused yet,
* so we can safely take mapping->i_mmap_rwsem.
*/
VM_BUG_ON_PAGE(!PageLocked(page), page);
if (!mapping)
return;
pgoff_start = page_to_pgoff(page);
pgoff_end = pgoff_start + hpage_nr_pages(page) - 1;
if (!locked)
i_mmap_lock_read(mapping);
vma_interval_tree_foreach(vma, &mapping->i_mmap,
pgoff_start, pgoff_end) {
unsigned long address = vma_address(page, vma);
cond_resched();
if (rwc->invalid_vma && rwc->invalid_vma(vma, rwc->arg))
continue;
if (!rwc->rmap_one(page, vma, address, rwc->arg))
goto done;
if (rwc->done && rwc->done(page))
goto done;
}
done:
if (!locked)
i_mmap_unlock_read(mapping);
}
- vma_interval_tree_foreach(vma, &mapping->i_mmap, pgoff_start, pgoff_end): iterate the VMAs in mapping->i_mmap whose file range overlaps the page; here the tree nodes are the vma structures themselves.
try_to_unmap
try_to_unmap is called during reclaim to tear down, for the page being reclaimed, the mappings in every vma that uses it:
bool try_to_unmap(struct page *page, enum ttu_flags flags)
{
struct rmap_walk_control rwc = {
.rmap_one = try_to_unmap_one,
.arg = (void *)flags,
.done = page_mapcount_is_zero,
.anon_lock = page_lock_anon_vma_read,
};
/*
* During exec, a temporary VMA is setup and later moved.
* The VMA is moved under the anon_vma lock but not the
* page tables leading to a race where migration cannot
* find the migration ptes. Rather than increasing the
* locking requirements of exec(), migration skips
* temporary VMAs until after exec() completes.
*/
if ((flags & (TTU_MIGRATION|TTU_SPLIT_FREEZE))
&& !PageKsm(page) && PageAnon(page))
rwc.invalid_vma = invalid_migration_vma;
if (flags & TTU_RMAP_LOCKED)
rmap_walk_locked(page, &rwc);
else
rmap_walk(page, &rwc);
return !page_mapcount(page) ? true : false;
}
- The traversal again uses the rmap_walk machinery.
- The rmap_one callback is try_to_unmap_one, which removes one mapping.
try_to_unmap_one
try_to_unmap_one removes the page's mapping in one vma. The full function is long because it must handle many page-usage scenarios; only the core lines are shown:
/*
* @arg: enum ttu_flags will be passed to this argument
*/
static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
unsigned long address, void *arg)
{
... ...
set_pte_at(mm, address, pvmw.pte, pteval);
mmu_notifier_invalidate_range(mm, address, address + PAGE_SIZE);
...
page_remove_rmap(subpage, PageHuge(page));
put_page(page);
mmu_notifier_invalidate_range_end(&range);
return ret;
}
The work reduces to a few steps:
- set_pte_at: rewrite the PTE in the owning process's page table.
- mmu_notifier_invalidate_range: tell MMU notifiers to invalidate the range.
- page_remove_rmap: remove this vma's mapping from the page's reverse mapping.
- put_page: decrement the page's reference count.
Unmapping during memory reclaim
When shrink_page_list reclaims memory, it unmaps each selected page through reverse mapping:
static unsigned int shrink_page_list(struct list_head *page_list,
struct pglist_data *pgdat,
struct scan_control *sc,
enum ttu_flags ttu_flags,
struct reclaim_stat *stat,
bool ignore_references)
{
... ...
while (!list_empty(page_list)) {
... ...
/*
* The page is mapped into the page tables of one or more
* processes. Try to unmap it here.
*/
if (page_mapped(page)) {
enum ttu_flags flags = ttu_flags | TTU_BATCH_FLUSH;
bool was_swapbacked = PageSwapBacked(page);
if (unlikely(PageTransHuge(page)))
flags |= TTU_SPLIT_HUGE_PMD;
if (!try_to_unmap(page, flags)) {// unmap the page
stat->nr_unmap_fail += nr_pages;
if (!was_swapbacked && PageSwapBacked(page))
stat->nr_lazyfree_fail += nr_pages;
goto activate_locked;
}
}
... ...
}
... ...
return nr_reclaimed;
}