Re: [PATCH 13/40] autonuma: CPU follow memory algorithm
Hi Nai,

On Fri, Jun 29, 2012 at 10:11:35PM +0800, Nai Xia wrote:
> If one process do very intensive visit of a small set of pages in this
> node, but occasional visit of a large set of pages in another node.
> Will this algorithm do a very bad judgment? I guess the answer would
> be: it's possible and this judgment depends on the racing pattern
> between the process and your knuma_scand.

Correct, it may behave differently depending on whether the
knuma_scand/scan_pass_sleep_millisecs interval is more or less
occasional than the visits to the large set of pages.

Note that every algorithm will have a limit on how smart it can be.

Just to make a random example: if you look up some pagecache a million
times and some other pagecache a dozen times, their "aging"
information in the pagecache will end up identical. Yet we know one
set of pages is clearly higher priority than the other. We've only so
many levels of lrus and so many referenced/active bitflags per
page. Once you get to the top, all is equal.

Does this mean the "active" list working set detection is useless just
because we can't differentiate a million lookups on a few pages from a
dozen lookups on lots of pages?

Last but not least, in the very example you mention it's not even
clear whether the process should be scheduled on the CPU where there
is the small set of pages accessed frequently, or on the CPU where
there's the large set of pages accessed occasionally. If the small set
of pages fits in the 8MBytes of the L2 cache, then it's better to put
the process on the other CPU, where the large set of pages can't fit
in the L2 cache. Lots of hardware details would have to be evaluated
to really know what's the right thing in such a case, even if you were
the one having to decide.

But the real reason why the above isn't an issue, and why we don't
need to solve that problem perfectly: there's not just a CPU follow
memory algorithm in AutoNUMA. There's also the memory follow CPU
algorithm. AutoNUMA will do its best to change the layout of your
example to one that has only one clear solution: the occasional lookup
of the large set of pages will make those eventually go in the node
together with the small set of pages (or the other way around), and
this is how it's solved.

In any case, whatever wrong decision it takes, it will at least be a
better decision than numa/sched, where there's absolutely zero
information about what pages the process is accessing. And best of
all, with AutoNUMA you also know which pages the _thread_ is accessing,
so it will also be able to take optimal decisions if there are more
threads than CPUs in a node (as long as not all thread accesses are
shared).

Hope this explains things better.
Andrea
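The pagecache aging point in the reply above can be illustrated with a small toy model. This is only a sketch, not kernel code: the four-step ladder and the name `age_after` are assumptions made for illustration, loosely modelled on the inactive/active lists and the referenced flag the mail refers to, and it ignores reclaim ever clearing those bits.

```python
# Toy model of saturating page aging. Each access promotes a page one
# step up a short ladder; once at the top, further accesses change nothing.
# LEVELS and age_after() are hypothetical names used only for this sketch.
LEVELS = [
    ("inactive", "unreferenced"),
    ("inactive", "referenced"),
    ("active", "unreferenced"),
    ("active", "referenced"),
]


def age_after(accesses: int) -> tuple:
    """Return the (list, flag) state of a page after `accesses` lookups."""
    top = len(LEVELS) - 1
    # The state saturates at the top level, so the access count is clamped.
    return LEVELS[min(accesses, top)]


if __name__ == "__main__":
    hot = age_after(1_000_000)   # looked up a million times
    warm = age_after(12)         # looked up a dozen times
    print(hot, warm, hot == warm)
```

Both pages end up in the same top state, so in this model nothing distinguishes the million-lookup page from the dozen-lookup page. That is the limit on precision described in the mail, and working set detection remains useful despite it.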