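Continuing from <Linux Kernel Notes: buddy (anti-fragment mechanism) (4)>: when the requested migrate type has no sufficient free memory within a zone, the fallback mechanism starts and searches the fallbacks array for another suitable migrate type from which pages can be stolen. The core function that carries out the steal is steal_suitable_fallback.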
steal_suitable_fallback
steal_suitable_fallback is declared as follows:
static void steal_suitable_fallback(struct zone *zone, struct page *page,
		unsigned int alloc_flags, int start_type, bool whole_block);
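Function purpose:
- The core function that carries out page stealing. It also decides whether the migrate type of the pageblock has to change when pages are stolen. When the order is large enough, the whole pageblock is taken over in one go and its migratetype is changed. When only part of the pageblock can be stolen, the pageblock migratetype is left unchanged, which means the pageblock is now mixed: part of it is used by another migratetype.
Parameters:
- struct zone *zone: the zone that contains the page being allocated.
- struct page *page: the free physical page at which the steal starts.
- unsigned int alloc_flags: the alloc flags used by the allocation.
- int start_type: the migrate type requested by the allocation.
- bool whole_block: whether to try stealing the whole pageblock.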
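steal_suitable_fallback flow
The processing flow of steal_suitable_fallback is as follows:
[Figure: steal_suitable_fallback flowchart]
steal_suitable_fallback source
Analysis with reference to the steal_suitable_fallback source: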
/*
 * This function implements actual steal behaviour. If order is large enough,
 * we can steal whole pageblock. If not, we first move freepages in this
 * pageblock to our migratetype and determine how many already-allocated pages
 * are there in the pageblock with a compatible migratetype. If at least half
 * of pages are free or compatible, we can change migratetype of the pageblock
 * itself, so pages freed in the future will be put on the correct free list.
 */
static void steal_suitable_fallback(struct zone *zone, struct page *page,
		unsigned int alloc_flags, int start_type, bool whole_block)
{
	unsigned int current_order = page_order(page);
	int free_pages, movable_pages, alike_pages;
	int old_block_type;

	old_block_type = get_pageblock_migratetype(page);

	/*
	 * This can happen due to races and we want to prevent broken
	 * highatomic accounting.
	 */
	if (is_migrate_highatomic(old_block_type))
		goto single_page;

	/* Take ownership for orders >= pageblock_order */
	if (current_order >= pageblock_order) {
		change_pageblock_range(page, current_order, start_type);
		goto single_page;
	}

	/*
	 * Boost watermarks to increase reclaim pressure to reduce the
	 * likelihood of future fallbacks. Wake kswapd now as the node
	 * may be balanced overall and kswapd will not wake naturally.
	 */
	boost_watermark(zone);
	if (alloc_flags & ALLOC_KSWAPD)
		set_bit(ZONE_BOOSTED_WATERMARK, &zone->flags);

	/* We are not allowed to try stealing from the whole block */
	if (!whole_block)
		goto single_page;

	free_pages = move_freepages_block(zone, page, start_type,
						&movable_pages);
	/*
	 * Determine how many pages are compatible with our allocation.
	 * For movable allocation, it's the number of movable pages which
	 * we just obtained. For other types it's a bit more tricky.
	 */
	if (start_type == MIGRATE_MOVABLE) {
		alike_pages = movable_pages;
	} else {
		/*
		 * If we are falling back a RECLAIMABLE or UNMOVABLE allocation
		 * to MOVABLE pageblock, consider all non-movable pages as
		 * compatible. If it's UNMOVABLE falling back to RECLAIMABLE or
		 * vice versa, be conservative since we can't distinguish the
		 * exact migratetype of non-movable pages.
		 */
		if (old_block_type == MIGRATE_MOVABLE)
			alike_pages = pageblock_nr_pages
						- (free_pages + movable_pages);
		else
			alike_pages = 0;
	}

	/* moving whole block can fail due to zone boundary conditions */
	if (!free_pages)
		goto single_page;

	/*
	 * If a sufficient number of pages in the block are either free or of
	 * comparable migratability as our allocation, claim the whole block.
	 */
	if (free_pages + alike_pages >= (1 << (pageblock_order-1)) ||
			page_group_by_mobility_disabled)
		set_pageblock_migratetype(page, start_type);

	return;

single_page:
	move_to_free_list(page, zone, current_order, start_type);
}
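- Get the migrate type of the pageblock that contains the page to be stolen (moved).
- If the pageblock's migrate type is MIGRATE_HIGHATOMIC (which can happen because of races), the pageblock migrate type is left untouched so that highatomic accounting is not broken; only move_to_free_list is called to move the page to the free list of start_type.
- If current_order of the stolen page is greater than or equal to pageblock_order, the page covers at least one whole pageblock. change_pageblock_range is called to change the migrate type of the covered pageblock(s), and move_to_free_list then moves the page to the free list of start_type.
- If neither case applies, further checks decide whether the pageblock migrate type may be changed.
- First boost_watermark raises the zone's boost watermark, which increases reclaim pressure and so determines how much memory kswapd reclaims.
- If alloc_flags has ALLOC_KSWAPD set, ZONE_BOOSTED_WATERMARK is set in zone->flags. The fallback happened because memory of the requested type ran short, so kswapd is woken early to reclaim memory and reduce the likelihood of future fallbacks.
- If whole_block is false, the pageblock migrate type must not be changed; only the page itself is moved to the new free list.
- move_freepages_block: moves the free pages of the pageblock to the free list of start_type and returns how many were moved. If nothing was moved (free_pages is 0, for example at a zone boundary), only the single page is moved. If free_pages + alike_pages >= 1 << (pageblock_order - 1), that is at least half of the pageblock, or if page_group_by_mobility_disabled is set, the pageblock migrate type is changed as well.
- move_to_free_list: moves the page from the free list of the old migrate type to the free list of the new migrate type, so the new migrate type then has memory available for the current allocation.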
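The order of the checks described above can be modelled in userspace as a small sketch. The enum, the function name steal_path and its parameters are hypothetical and not kernel API; free_pages and alike_pages are passed in rather than computed, and the watermark boost is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the ways the function can finish. */
enum steal_outcome {
	STEAL_SINGLE_PAGE,	/* only the buddy page itself changes free list */
	STEAL_RANGE_CLAIMED,	/* order >= pageblock_order: block(s) retyped, page moved */
	STEAL_FREE_PAGES_MOVED,	/* free pages of the block moved, block type kept */
	STEAL_BLOCK_CLAIMED	/* free pages moved and block retyped */
};

/*
 * Userspace model of the branch order only. free_pages and alike_pages are
 * inputs here instead of results of move_freepages_block().
 */
static enum steal_outcome steal_path(bool old_type_is_highatomic,
				     unsigned int current_order,
				     unsigned int pageblock_order,
				     bool whole_block,
				     int free_pages, int alike_pages,
				     bool grouping_disabled)
{
	assert(pageblock_order >= 1);

	if (old_type_is_highatomic)
		return STEAL_SINGLE_PAGE;
	if (current_order >= pageblock_order)
		return STEAL_RANGE_CLAIMED;
	if (!whole_block)
		return STEAL_SINGLE_PAGE;
	if (free_pages == 0)
		return STEAL_SINGLE_PAGE;
	if (free_pages + alike_pages >= (1 << (pageblock_order - 1)) ||
	    grouping_disabled)
		return STEAL_BLOCK_CLAIMED;
	return STEAL_FREE_PAGES_MOVED;
}
```

With pageblock_order assumed to be 9, an order-3 steal with whole_block set, 200 free pages and 56 compatible pages reaches the 256-page threshold and ends in STEAL_BLOCK_CLAIMED; with 55 compatible pages it ends in STEAL_FREE_PAGES_MOVED.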
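Rules for changing the pageblock migrate type
The flow of steal_suitable_fallback yields the following rules for when the pageblock migrate type may be changed:
- If the pageblock's current type is MIGRATE_HIGHATOMIC, the type is never changed.
- If the order of the page is greater than or equal to pageblock_order, the pageblock migrate type may be changed.
- If the order is smaller than pageblock_order and whole_block is false, the pageblock migrate type must not be changed.
- If the order is smaller than pageblock_order and whole_block is true, the free pages are moved first and the compatible pages (alike_pages) are counted. For a MIGRATE_MOVABLE allocation, alike_pages is movable_pages. For any other allocation stealing from a MIGRATE_MOVABLE pageblock, alike_pages is pageblock_nr_pages - (free_pages + movable_pages). For UNMOVABLE falling back to RECLAIMABLE or the reverse, alike_pages is 0.
- The type is then changed only if free_pages + alike_pages >= 1 << (pageblock_order - 1), that is at least half of the pageblock, or if page_group_by_mobility_disabled is set.
- In all other cases the pageblock migrate type must not be changed.
The pageblock migrate type is changed with set_pageblock_migratetype().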
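To make the half-pageblock rule concrete, here is a minimal userspace sketch of how the compatible-page count follows from the two migrate types. The enum values and the function names count_alike_pages and reaches_half are illustrative assumptions, and a pageblock of 512 pages is assumed in the numbers below.

```c
#include <assert.h>

/* Stand-ins for the kernel's migrate types; the numeric values are illustrative. */
enum mt { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/*
 * Number of already-allocated pages in the block that count as compatible
 * with the allocation, following the if/else in steal_suitable_fallback().
 */
static int count_alike_pages(enum mt start_type, enum mt old_block_type,
			     int pageblock_nr_pages, int free_pages,
			     int movable_pages)
{
	assert(free_pages + movable_pages <= pageblock_nr_pages);

	if (start_type == MT_MOVABLE)
		return movable_pages;
	if (old_block_type == MT_MOVABLE)
		return pageblock_nr_pages - (free_pages + movable_pages);
	return 0;	/* UNMOVABLE <-> RECLAIMABLE: be conservative */
}

/* 1 if free plus compatible pages reach half of the pageblock, else 0. */
static int reaches_half(int pageblock_nr_pages, int free_pages, int alike)
{
	return free_pages + alike >= pageblock_nr_pages / 2;
}
```

For an UNMOVABLE allocation stealing from a MOVABLE pageblock with 100 free and 300 movable pages, 112 pages count as compatible and 100 + 112 = 212 stays below 256, so the block keeps its type. With 200 movable pages instead, 212 pages are compatible and 312 passes the threshold.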
Moving pages
Pages are moved with move_to_free_list, which removes the page from its current free list and adds it to the free list of the new migrate type:
/* Used for pages which are on another list */
static inline void move_to_free_list(struct page *page, struct zone *zone,
				     unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_move(&page->lru, &area->free_list[migratetype]);
}
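What list_move does for the free lists can be sketched with a minimal circular doubly linked list in userspace. The type list_node and the helper names are hypothetical stand-ins, not the kernel's list API; the sketch assumes the same behaviour as list_move, namely unlinking the entry and adding it right after the new head.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's circular doubly linked list. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head;
	head->next = head;
}

/* Insert node right after head. */
static void list_add_front(struct list_node *node, struct list_node *head)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

static void list_unlink(struct list_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Unlink node from whatever list it is on and add it to the front of head. */
static void list_move_front(struct list_node *node, struct list_node *head)
{
	assert(node != head);
	list_unlink(node);
	list_add_front(node, head);
}

static int list_is_empty(const struct list_node *head)
{
	return head->next == head;
}
```

If one list head stands for the MOVABLE free list of an order and another for the UNMOVABLE one, list_move_front(&page, &unmovable) leaves the first list without that page and makes it the first entry of the second, which is the effect move_to_free_list has on page->lru.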
page_group_by_mobility_disabled
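Grouping pages by migrate type can be switched on and off through page_group_by_mobility_disabled. It is set in build_all_zonelists, at boot and again on memory hot-add, based on the total number of pages in the system rather than on each zone: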
/*
 * unless system_state == SYSTEM_BOOTING.
 *
 * __ref due to call of __init annotated helper build_all_zonelists_init
 * [protected by SYSTEM_BOOTING].
 */
void __ref build_all_zonelists(pg_data_t *pgdat)
{
	... ...
	/*
	 * Disable grouping by mobility if the number of pages in the
	 * system is too low to allow the mechanism to work. It would be
	 * more accurate, but expensive to check per-zone. This check is
	 * made on memory-hotadd so a system can start with mobility
	 * disabled and enable it later
	 */
	if (vm_total_pages < (pageblock_nr_pages * MIGRATE_TYPES))
		page_group_by_mobility_disabled = 1;
	else
		page_group_by_mobility_disabled = 0;

	pr_info("Built %u zonelists, mobility grouping %s.  Total pages: %ld\n",
		nr_online_nodes,
		page_group_by_mobility_disabled ? "off" : "on",
		vm_total_pages);
#ifdef CONFIG_NUMA
	pr_info("Policy zone: %s\n", zone_names[policy_zone]);
#endif
}
- When the total number of pages (vm_total_pages) is smaller than pageblock_nr_pages * MIGRATE_TYPES, grouping by migrate type is turned off.
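The threshold is easy to check by hand. The sketch below is a userspace restatement of the comparison; the function name and its parameters are hypothetical, and the value passed for migrate_types is an assumption because MIGRATE_TYPES depends on the kernel configuration.

```c
#include <assert.h>

/*
 * Returns 1 when grouping by mobility would be turned off, 0 otherwise.
 * migrate_types stands in for MIGRATE_TYPES, whose value depends on the
 * kernel configuration.
 */
static int mobility_grouping_disabled(unsigned long total_pages,
				      unsigned long pageblock_nr_pages,
				      unsigned long migrate_types)
{
	assert(pageblock_nr_pages > 0 && migrate_types > 0);
	return total_pages < pageblock_nr_pages * migrate_types;
}
```

Assuming 512 pages per pageblock and 6 migrate types, the limit is 3072 pages, which is 12 MiB with 4 KiB pages: a system with 3071 pages gets grouping disabled, one with 3072 pages keeps it enabled.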
move_freepages_block
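move_freepages_block() is used once whole-block stealing is allowed for a pageblock: it moves the free pages of that pageblock to the free list of the requested migrate type: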
int move_freepages_block(struct zone *zone, struct page *page,
				int migratetype, int *num_movable)
{
	unsigned long start_pfn, end_pfn;
	struct page *start_page, *end_page;

	if (num_movable)
		*num_movable = 0;

	start_pfn = page_to_pfn(page);
	start_pfn = start_pfn & ~(pageblock_nr_pages-1);
	start_page = pfn_to_page(start_pfn);
	end_page = start_page + pageblock_nr_pages - 1;
	end_pfn = start_pfn + pageblock_nr_pages - 1;

	/* Do not cross zone boundaries */
	if (!zone_spans_pfn(zone, start_pfn))
		start_page = page;
	if (!zone_spans_pfn(zone, end_pfn))
		return 0;

	return move_freepages(zone, start_page, end_page, migratetype,
								num_movable);
}
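- start_pfn = page_to_pfn(page): get the pfn of the page to be moved.
- start_pfn = start_pfn & ~(pageblock_nr_pages-1): align the page frame number down to the pageblock boundary.
- start_page = pfn_to_page(start_pfn): the first physical page of the pageblock after alignment.
- end_page = start_page + pageblock_nr_pages - 1: the last physical page of the pageblock.
- end_pfn = start_pfn + pageblock_nr_pages - 1: the last pfn of the pageblock.
- start_pfn and end_pfn are each checked against the zone: if start_pfn lies outside the zone, the walk starts at page itself; if end_pfn lies outside the zone, nothing is moved and 0 is returned.
- move_freepages: moves the free pages in the resulting range in one batch.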
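The alignment arithmetic in these steps can be tried out in userspace. In this sketch the struct pfn_range and both function names are hypothetical, and a zone is modelled as a single pfn interval [zone_start, zone_end), which is an assumption made for illustration.

```c
#include <assert.h>

struct pfn_range {
	unsigned long start_pfn;	/* first pfn of the pageblock */
	unsigned long end_pfn;		/* last pfn of the pageblock (inclusive) */
};

static struct pfn_range pageblock_range(unsigned long pfn,
					unsigned long pageblock_nr_pages)
{
	struct pfn_range r;

	/* The mask trick only works for a power-of-two block size. */
	assert(pageblock_nr_pages != 0 &&
	       (pageblock_nr_pages & (pageblock_nr_pages - 1)) == 0);

	r.start_pfn = pfn & ~(pageblock_nr_pages - 1);
	r.end_pfn = r.start_pfn + pageblock_nr_pages - 1;
	return r;
}

/* Hypothetical zone model: one contiguous pfn interval [zone_start, zone_end). */
static int zone_spans(unsigned long zone_start, unsigned long zone_end,
		      unsigned long pfn)
{
	return zone_start <= pfn && pfn < zone_end;
}
```

With 512 pages per pageblock, pfn 1000 belongs to the block spanning pfn 512 to 1023. For a zone modelled as [600, 2048), the aligned start 512 lies outside the zone while the end 1023 lies inside, which corresponds to the case where the walk starts at the page itself.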
move_freepages
move_freepages moves the free pages in the given range to the free list of the specified migrate type:
/*
 * Move the free pages in a range to the free lists of the requested type.
 * Note that start_page and end_pages are not aligned on a pageblock
 * boundary. If alignment is required, use move_freepages_block()
 */
static int move_freepages(struct zone *zone,
			  struct page *start_page, struct page *end_page,
			  int migratetype, int *num_movable)
{
	struct page *page;
	unsigned int order;
	int pages_moved = 0;

	for (page = start_page; page <= end_page;) {
		if (!pfn_valid_within(page_to_pfn(page))) {
			page++;
			continue;
		}

		if (!PageBuddy(page)) {
			/*
			 * We assume that pages that could be isolated for
			 * migration are movable. But we don't actually try
			 * isolating, as that would be expensive.
			 */
			if (num_movable &&
					(PageLRU(page) || __PageMovable(page)))
				(*num_movable)++;

			page++;
			continue;
		}

		/* Make sure we are not inadvertently changing nodes */
		VM_BUG_ON_PAGE(page_to_nid(page) != zone_to_nid(zone), page);
		VM_BUG_ON_PAGE(page_zone(page) != zone, page);

		order = page_order(page);
		move_to_free_list(page, zone, order, migratetype);
		page += 1 << order;
		pages_moved += 1 << order;
	}

	return pages_moved;
}
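- Walks the pages in the given range from start_page to end_page; when called from move_freepages_block this normally starts at the aligned first page of the pageblock.
- Pages that are not free (not PageBuddy) are skipped. If such a page is on the LRU or is __PageMovable, it is counted in *num_movable.
- move_to_free_list: moves each free buddy block, of whatever order it has, to the free list of the target migrate type. It does not move the whole pageblock at once.
- The loop then advances by 1 << order pages to the page after the free block, and pages_moved grows by the same amount.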
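The walk can be modelled in userspace over an array of simplified page descriptors. Everything in this sketch is hypothetical (struct fake_page, its fields, and move_free_range); moving a block to another free list is reduced to recording the target type, and the counter is reset inside the function, whereas the kernel does that in move_freepages_block.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified page descriptor for a userspace model. */
struct fake_page {
	bool buddy;		/* head of a free buddy block */
	bool movable;		/* allocated page that looks movable */
	unsigned int order;	/* valid only when buddy is true */
	int list_type;		/* migrate type of the free list holding the block */
};

/* Returns the number of free pages retargeted to migratetype. */
static int move_free_range(struct fake_page *pages, size_t nr,
			   int migratetype, int *num_movable)
{
	size_t i = 0;
	int moved = 0;

	if (num_movable)
		*num_movable = 0;

	while (i < nr) {
		struct fake_page *p = &pages[i];

		if (!p->buddy) {
			/* allocated page: only count it if it looks movable */
			if (num_movable && p->movable)
				(*num_movable)++;
			i++;
			continue;
		}

		/* a free block must lie entirely inside the range */
		assert(i + ((size_t)1 << p->order) <= nr);

		p->list_type = migratetype;	/* stands in for move_to_free_list() */
		i += (size_t)1 << p->order;	/* skip the whole free block */
		moved += 1 << p->order;
	}
	return moved;
}
```

For an 8-page range holding a free order-2 block at index 0, one movable and one non-movable allocated page at indexes 4 and 5, and a free order-1 block at index 6, the walk visits only four entries, reports 6 moved pages and counts 1 movable page.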
can_steal_fallback
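can_steal_fallback decides, from the order and the migrate type of the allocation, whether a steal may try to take over the whole pageblock. Its result is what reaches steal_suitable_fallback as whole_block; when it is true, the free pages of the pageblock are moved and, if the checks in steal_suitable_fallback pass, the pageblock migrate type is changed: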
/*
 * When we are falling back to another migratetype during allocation, try to
 * steal extra free pages from the same pageblocks to satisfy further
 * allocations, instead of polluting multiple pageblocks.
 *
 * If we are stealing a relatively large buddy page, it is likely there will
 * be more free pages in the pageblock, so try to steal them all. For
 * reclaimable and unmovable allocations, we steal regardless of page size,
 * as fragmentation caused by those allocations polluting movable pageblocks
 * is worse than movable allocations stealing from unmovable and reclaimable
 * pageblocks.
 */
static bool can_steal_fallback(unsigned int order, int start_mt)
{
	/*
	 * Leaving this order check is intended, although there is
	 * relaxed order check in next check. The reason is that
	 * we can actually steal whole pageblock if this condition met,
	 * but, below check doesn't guarantee it and that is just heuristic
	 * so could be changed anytime.
	 */
	if (order >= pageblock_order)
		return true;

	if (order >= pageblock_order / 2 ||
		start_mt == MIGRATE_RECLAIMABLE ||
		start_mt == MIGRATE_UNMOVABLE ||
		page_group_by_mobility_disabled)
		return true;

	return false;
}
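- order >= pageblock_order: the stolen page covers at least one whole pageblock, so the whole pageblock may be taken.
- order >= pageblock_order / 2: the buddy page is relatively large, so the pageblock probably holds more free pages and taking the whole pageblock is allowed as well. This check is only a heuristic.
- start_mt == MIGRATE_RECLAIMABLE or start_mt == MIGRATE_UNMOVABLE: for reclaimable and unmovable allocations the whole pageblock may be taken regardless of order, because such allocations polluting movable pageblocks cause worse fragmentation than movable allocations stealing from unmovable or reclaimable pageblocks.
- page_group_by_mobility_disabled: when grouping by mobility is off, taking the whole pageblock is always allowed.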
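The four conditions can be exercised with a userspace replica. This is a sketch only: the enum start_mt, the function name may_steal_whole_block, and the explicit pageblock_order and grouping_disabled parameters are assumptions that replace the kernel's constants and global variable.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for the kernel's migrate types; the numeric values are illustrative. */
enum start_mt { START_UNMOVABLE, START_MOVABLE, START_RECLAIMABLE };

static bool may_steal_whole_block(unsigned int order, enum start_mt start_mt,
				  unsigned int pageblock_order,
				  bool grouping_disabled)
{
	assert(pageblock_order > 0);

	/* A page of pageblock size or more can always take the whole block. */
	if (order >= pageblock_order)
		return true;

	/* Heuristic: large pages, non-movable requests, or grouping turned off. */
	return order >= pageblock_order / 2 ||
	       start_mt == START_RECLAIMABLE ||
	       start_mt == START_UNMOVABLE ||
	       grouping_disabled;
}
```

With pageblock_order assumed to be 9, pageblock_order / 2 is 4 after integer division, so a movable request qualifies at order 4 but not at order 3, while unmovable and reclaimable requests qualify even at order 0.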