[PATCH 0/7] Improve hugepage allocation success rates under load V5

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


Andrew, the biggest change here is that I've reshuffled the patches to
simplify merging. Please consider picking up patches 2 and 3 and merging
them for 3.6 as they fix a broken commit merged in 3.6-rc1. The rest of
the patches can be merged later.

Changelog since V4
o Rebase to latest linux-next/akpm
o Reshuffle patches for easier merging

Changelog since V3
o Add patch to backoff compaction in the event of lock contention
o Rebase to mmotm, cope with the removal of __GFP_NO_KSWAPD
o Removed RFC

Changelog since V2
o Capture !MIGRATE_MOVABLE pages where possible
o Document the treatment of MIGRATE_MOVABLE pages while capturing
o Expand changelogs

Changelog since V1
o Dropped kswapd related patch, basically a no-op and regresses if fixed (minchan)
o Expanded changelogs a little

Allocation success rates have been far lower since 3.4 due to commit
[fe2c2a10: vmscan: reclaim at order 0 when compaction is enabled]. This
commit was introduced for good reasons and it was known in advance that
the success rates would suffer but it was justified on the grounds that
the high allocation success rates were achieved by aggressive reclaim.
Success rates are expected to suffer even more in 3.6 due to commit
[7db8889a: mm: have order > 0 compaction start off where it left] which
testing has shown to severely reduce allocation success rates under load -
to 0% in one case.

This series aims to improve the allocation success rates without regressing
the benefits of commit fe2c2a10. The series is based on latest mmotm and
takes into account the __GFP_NO_KSWAPD flag is going away.

Patch 1 reverts the __GFP_NO_KSWAPD patch and related fixes. This is so
	patches 2 and 3 can be merged before 3.6 releases. It is reintroduced
	later.

Patch 2 fixes the upstream commit [7db8889a: mm: have order > 0 compaction
	start off where it left] to enable compaction again

Patch 3 identifies when compacion is taking too long due to contention
	and aborts. This fixes a performance problem for Jim Schutt that
	commit 7db8889a was meant to fix.

Patch 4 is a comment fix.

Patch 5 is a rebased version of the __GFP_NO_KSWAPD patch with one change
	in how it handles deferred_compaction.

Patch 6 updates reclaim/compaction to reclaim pages scaled on the number
	of recent failures.

Patch 7 captures suitable high-order pages freed by compaction to reduce
	races with parallel allocation requests.

I tested with a high order allocation stress test. The following kernels
were tested.

revert-v5 	linux-next/mmotm based on 3.6-rc2 with patch 1 applied
contended-v5 	patches 1-3
capture-v5  	patches 1-7

STRESS-HIGHALLOC
                   revert-v5      contended-v5        capture-v5  
Pass 1           0.00 ( 0.00%)    38.00 (38.00%)    45.00 (45.00%)
Pass 2           0.00 ( 0.00%)    46.00 (46.00%)    52.00 (52.00%)
while Rested    85.00 ( 0.00%)    86.00 ( 1.00%)    86.00 ( 1.00%)

From
http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__stress-highalloc-performance-ext3/hydra/comparison.html
I know that the allocation success rates in 3.3.6 was 78% in comparison to
36% in in the current akpm tree. At present the success rate is completely
shot but with patches and 3 applied it goes back up to 38% which is what
I would like to see merged for 3.6. With the full series applied success
rates go up to 45% with some variability in the results.  This is not
as high a success rate as seen in older kernels but it does not reclaim
excessively which is a key point.

MMTests Statistics: vmstat
Page Ins                                     2889316     2904472     3037020
Page Outs                                    8042076     8030516     8026740
Swap Ins                                           0           0           0
Swap Outs                                          0           0           0

Note that swap in/out rates remain at 0. In 3.3.6 with 78% success rates
there were 71881 pages swapped out.

Direct pages scanned                           16822      126135       39297
Kswapd pages scanned                         1112284     1243865     1534553
Kswapd pages reclaimed                       1106913     1203069     1499877
Direct pages reclaimed                         16822      113769       26457
Kswapd efficiency                                99%         96%         97%
Kswapd velocity                              899.586     980.634    1218.131
Direct efficiency                               100%         90%         67%
Direct velocity                               13.605      99.442      31.194

kswapd velocity increased slightly but that is expected as __GFP_NO_KSWAPD is
removed by the full series. The velocity with the full series applied is 1218
pages/sec where as in kernel 3.3.6 with the high allocation success rates
it was 8140 pages/second. Direct velocity is slightly higher but this is
expected as a result of patch 6. Pushing direct reclaim higher would improve
the allocation success rates but with the obvious cost of increased paging
and swap IO.

 include/linux/compaction.h |    4 +-
 include/linux/mm.h         |    1 +
 mm/compaction.c            |  244 +++++++++++++++++++++++++++++++++-----------
 mm/internal.h              |    2 +
 mm/page_alloc.c            |   78 ++++++++++----
 mm/vmscan.c                |   10 ++
 6 files changed, 256 insertions(+), 83 deletions(-)

-- 
1.7.9.2

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[Other Archives]     [Linux Kernel Newbies]     [Linux Driver Development]     [Linux Kbuild]     [Fedora Kernel]     [Linux Kernel Testers]     [Linux SH]     [Linux Omap]     [Linux Tape]     [Linux Input]     [Linux LEDS]     [Linux Kernel Janitors]     [Linux Kernel Packagers]     [Linux Doc]     [Linux Man Pages]     [Linux API]     [Linux Memory Management]     [Linux Modules]     [Linux Standards]     [Kernel Announce]     [Netdev]     [Git]     [Linux PCI]     Linux CAN Development     [Linux I2C]     [Linux RDMA]     [Linux NUMA]     [Netfilter]     [Netfilter Devel]     [SELinux]     [Bugtraq]     [FIO]     [Linux Perf Users]     [Linux Serial]     [Linux PPP]     [Linux ISDN]     [Linux Next]     [Kernel Stable Commits]     [Linux Tip Commits]     [Kernel MM Commits]     [Linux Security Module]     [AutoFS]     [Filesystem Development]     [Ext3 Filesystem]     [Linux bcache]     [Ext4 Filesystem]     [Linux BTRFS]     [Linux CEPH Filesystem]     [Linux XFS]     [XFS]     [Linux NFS]     [Linux CIFS]     [Ecryptfs]     [Linux NILFS]     [Linux Cachefs]     [Reiser FS]     [Initramfs]     [Linux FB Devel]     [Linux OpenGL]     [DRI Devel]     [Fastboot]     [Linux RT Users]     [Linux RT Stable]     [eCos]     [Corosync]     [Linux Clusters]     [LVS Devel]     [Hot Plug]     [Linux Virtualization]     [KVM]     [KVM PPC]     [KVM ia64]     [Linux Containers]     [Linux Hexagon]     [Linux Cgroups]     [Util Linux]     [Wireless]     [Linux Bluetooth]     [Bluez Devel]     [Ethernet Bridging]     [Embedded Linux]     [Barebox]     [Linux MMC]     [Linux IIO]     [Sparse]     [Smatch]     [Linux Arch]     [x86 Platform Driver]     [Linux ACPI]     [Linux IBM ACPI]     [LM Sensors]     [CPU Freq]     [Linux Power Management]     [Linmodems]     [Linux DCCP]     [Linux SCTP]     [ALSA Devel]     [Linux USB]     [Linux PA RISC]     [Linux Samsung SOC]     [MIPS Linux]     [IBM S/390 Linux]     [ARM Linux]     [ARM Kernel]     [ARM MSM]     [Tegra Devel]     [Sparc Linux]     [Linux Security]     [Linux Sound]     [Linux Media]     [Video 4 Linux]     [Linux IRDA Users]     [Linux for the blind]     [Linux RAID]     [Linux ATA RAID]     [Device Mapper]     [Linux SCSI]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Linux IDE]     [Linux SMP]     [Linux AXP]     [Linux Alpha]     [Linux M68K]     [Linux ia64]     [Linux 8086]     [Linux x86_64]     [Linux Config]     [Linux Apps]     [Linux MSDOS]     [Linux X.25]     [Linux Crypto]     [DM Crypt]     [Linux Trace Users]     [Linux Btrace]     [Linux Watchdog]     [Utrace Devel]     [Linux C Programming]     [Linux Assembly]     [Dash]     [DWARVES]     [Hail Devel]     [Linux Kernel Debugger]     [Linux gcc]     [Gcc Help]     [X.Org]     [Wine]

Add to Google Powered by Linux

[Older Kernel Discussion]     [Yosemite National Park Forum]     [Large Format Photos]     [Gimp]     [Yosemite Photos]     [Stuff]