Re: [PATCH/RFC 0/11] numa - Automatic-migration

> This series of patches hooks up linux page migration to the task load
> balancing mechanism.  The effect is such that, when load balancing moves
> a task to a cpu on a different node from where the task last executed,
> the task is notified of this change using a variant of the mechanism used
> to notify a task of pending signals.  When the task returns to user state,
> it attempts to migrate, to the new node, any pages not already on that
> node in those of the task's vm areas under control of default policy.
> 
> By default, the task will use lazy migration to migrate "misplaced"
> pages.  When notified of an inter-node migration, the task will
> walk its address space, attempting to unmap [remove all ptes] any
> anonymous pages in the task's page table.  When the task subsequently
> touches any of these unmapped pages, it will incur a swap page
> fault.  The swap fault handler will restore the pte if the
> cached page's location matches its mempolicy; otherwise the
> "migrate-on-fault" mechanism will attempt to migrate the page to
> the correct node.
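
As a rough illustration of the lazy path described above, here is a toy
user-space model in C. It is only a sketch under assumptions: struct
toy_page, toy_unmap_anon() and toy_touch() are hypothetical names that
do not come from the patch series, the "policy node" stands in for
whatever node the mempolicy selects, and migration is assumed to always
succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a page; the fields are illustrative, not kernel structures. */
struct toy_page {
	int node;	/* node currently holding the page */
	bool anon;	/* anonymous, i.e. not file-backed */
	bool mapped;	/* pte present in the task's page table */
};

/*
 * Step 1: after an inter-node move, unmap every mapped anonymous page.
 * Returns the number of pages unmapped.
 */
static size_t toy_unmap_anon(struct toy_page *pages, size_t n)
{
	size_t unmapped = 0;

	for (size_t i = 0; i < n; i++) {
		if (pages[i].anon && pages[i].mapped) {
			pages[i].mapped = false;
			unmapped++;
		}
	}
	return unmapped;
}

/*
 * Step 2: the task touches a page.  If the page is unmapped and sits on
 * the wrong node, "migrate" it (assumed to succeed), then restore the
 * mapping.  Returns true if the page was migrated.
 */
static bool toy_touch(struct toy_page *page, int policy_node)
{
	bool migrated = false;

	if (page->mapped)
		return false;	/* no fault, nothing to do */

	if (page->node != policy_node) {
		page->node = policy_node;
		migrated = true;
	}
	page->mapped = true;	/* restore the pte */
	return migrated;
}
```

In this model only the pages the task actually touches pay the migration
cost, which is the point of the lazy default.
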
> 
> Lazy migration may be disabled by writing zero to the per cpuset
> auto_migrate_lazy file.  In that case, automigration will use
> direct, synchronous migration to pull all anonymous pages mapped
> by the task to the new node.
> 
> 	Why lazy migration by default?  Think of the effect
> 	of direct, synchronous migration, in this context,
> 	on large multi-threaded programs.
> 
> Automatic page migration is disabled by default, but can be enabled by
> writing non-zero to the per cpuset auto_migrate_enable file.
> Furthermore, to prevent thrashing, this series provides a second,
> experimental per cpuset control, auto_migrate_interval.  The load
> balancer will not move a task to a different node if it has moved to a
> new node in the last auto_migrate_interval seconds.  [User interface
> is in seconds; internally it's in HZ.]  The idea is to give the task
> time to amortize the cost of the migration by giving it time to
> benefit from local references to the page.  Some experimenting and
> tuning will be necessary to determine the appropriate default value
> for this parameter on various platforms.
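
The interval check described above might look roughly like the
following. This is a minimal sketch, not code from the series:
SKETCH_HZ, interval_secs_to_ticks() and may_move_to_new_node() are
hypothetical names, and the tick rate of 250 is an assumed value chosen
only for illustration.

```c
#include <stdbool.h>

/* Assumed tick rate, for illustration only. */
#define SKETCH_HZ 250UL

/* Convert the user-visible interval (seconds) to internal ticks. */
static unsigned long interval_secs_to_ticks(unsigned long secs)
{
	return secs * SKETCH_HZ;
}

/*
 * Return true if a task that last changed nodes at tick last_move may
 * be moved to a different node at tick now.  Unsigned subtraction keeps
 * the elapsed-time computation correct across counter wraparound.
 */
static bool may_move_to_new_node(unsigned long now, unsigned long last_move,
				 unsigned long interval_secs)
{
	return (now - last_move) >= interval_secs_to_ticks(interval_secs);
}
```

With an interval of zero the check always passes, so in this sketch
zero would mean "no throttling".
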
> 
> An additional per cpuset control -- migrate_max_mapcount -- adjusts
> the threshold page mapcount at which non-privileged users can migrate
> shared pages.  This control allows experimentation with more aggressive
> auto-migration.
> 
> Why "per cpuset controls"?  Originally, cpusets was the only convenient
> "soft partitioning" or "task grouping" mechanism available.  Now that
> "containers" or "control groups" are available, one might consider
> a "NUMA behavior" control group, orthogonal to cpusets, to control this
> sort of behavior.  However, because cpusets are closely tied to NUMA resource
> partitioning and locality management, it still seems like a good place to
> contain the migration and mempolicy behavior controls.
> 
> Finally, the series adds a per process control file -- /proc/<pid>/migrate.
> Writing to this file causes the task to simulate an internode migration
> by walking its address space and unmapping anonymous pages so that they
> will be checked for [mis]placement on next touch; or by directly migrating
> them if lazy migration is disabled for the task's cpuset.  This can be
> used to test the automigration facility or to force a task to reestablish
> its anonymous page NUMA footprint at any time.

If I remember correctly, you presented this feature at a conference and
have presentation slides, right?
If so, could you please tell me the URL of the presentation? I'd like to
understand the background of the patch.

Thanks.




