Re: [Bug 186671] New: OOM on system with just rsync running 32GB of ram 30GB of pagecache

It could be totally unrelated but I have a similar problem: processes
get randomly OOM'd when I am doing anything "sort of heavy" on my
Btrfs filesystems.
I did some "evil tuning", so I assumed that must be the problem (even
if the values looked sane for my system). So I kept cutting back on
the manually set values (mostly the dirty/background ratios, the I/O
scheduler request queue size and similar tunables), but that seems to
be a dead end. As far as I can tell, anything I change to reduce the
related memory footprint only makes the OOMs less frequent; it is only
a matter of time and coincidence (lots of things randomly happen to do
a notable amount of I/O) until the OOMs happen anyway.
Starting a defrag or balance on more than one filesystem in parallel
seems to be enough: after that, pretty much any notable "useful" user
load has a high chance of triggering OOMs (and getting killed) sooner
or later. This is just my limited observation, but database-like
loads such as bitcoind (sync writes and/or frequent flushes?) or
high-priority buffered writes (ffmpeg running with higher than default
priority and saving live video streams into files without re-encoding)
seem more likely to trigger this than simply reading or writing files
sequentially and asynchronously, either locally or through Samba.
I am on gentoo-sources 4.8.8 right now, but the problem was there with
4.7.x as well.
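
In case anyone wants to compare settings, here is a rough sketch of how
the tunables I mentioned above could be collected in one go. It is not
what I actually ran. It assumes the usual procfs/sysfs locations
(/proc/sys/vm/dirty_* and /sys/block/*/queue/nr_requests and scheduler);
the function names and the "root" parameter are made up for
illustration, the latter only so the code can be pointed at a test
directory.

```python
#!/usr/bin/env python3
"""Print the writeback and block-queue tunables discussed in this thread."""
import glob
import os


def read_value(path):
    """Return the stripped contents of a procfs/sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def snapshot(root="/"):
    """Collect tunables into a dict keyed by path relative to root.

    `root` is a hypothetical convenience so a fake tree can be used in tests.
    """
    patterns = [
        "proc/sys/vm/dirty_ratio",
        "proc/sys/vm/dirty_background_ratio",
        "proc/sys/vm/dirty_bytes",
        "proc/sys/vm/dirty_background_bytes",
        "sys/block/*/queue/nr_requests",
        "sys/block/*/queue/scheduler",
    ]
    result = {}
    for pattern in patterns:
        # glob expands the per-device wildcard; missing files are skipped.
        for path in sorted(glob.glob(os.path.join(root, pattern))):
            value = read_value(path)
            if value is not None:
                result[os.path.relpath(path, root)] = value
    return result


if __name__ == "__main__":
    for name, value in snapshot().items():
        print(f"/{name} = {value}")
```

Saving the output before and after each tuning change would at least
make it easier to tell which settings were active when an OOM hit.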

On Thu, Nov 17, 2016 at 10:49 PM, Vlastimil Babka <vbabka@xxxxxxx> wrote:
> On 11/16/2016 02:39 PM, E V wrote:
>> System panic'd overnight running 4.9rc5 & rsync. Attached a photo of
>> the stack trace, and the 38 call traces in a 2 minute window shortly
>> before, to the bugzilla case for those not on its e-mail list:
>>
>> https://bugzilla.kernel.org/show_bug.cgi?id=186671
>
> The panic screenshot has only the last part, but the end marker says
> it's OOM with no killable processes. The DEBUG_VM config thus didn't
> trigger anything, and still there's tons of pagecache, mostly clean,
> that's not being reclaimed.
>
> Could you now try this?
> - enable CONFIG_PAGE_OWNER
> - boot with kernel option: page_owner=on
> - after the first oom, "cat /sys/kernel/debug/page_owner > file"
> - provide the file (compressed, it will be quite large)
>
> Vlastimil
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
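
For the capture step Vlastimil describes above, a minimal sketch that
writes the dump compressed right away instead of producing a large
plain file first. It assumes debugfs is mounted at /sys/kernel/debug
and the kernel was booted with page_owner=on, and it must run as root.
The function name and the output file name are made up for
illustration.

```python
#!/usr/bin/env python3
"""Save a gzip-compressed copy of the page_owner dump."""
import gzip

PAGE_OWNER = "/sys/kernel/debug/page_owner"  # path from the quoted request


def dump_compressed(src, dst):
    """Stream src into a gzip file at dst; return the uncompressed byte count."""
    total = 0
    # src is opened first, so dst is not created if src is missing.
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        while True:
            chunk = fin.read(1 << 20)  # read in 1 MiB pieces
            if not chunk:
                break
            fout.write(chunk)
            total += len(chunk)
    return total


if __name__ == "__main__":
    out = "page_owner.txt.gz"  # hypothetical output name
    try:
        size = dump_compressed(PAGE_OWNER, out)
        print(f"wrote {out} ({size} bytes before compression)")
    except OSError as err:
        print(f"could not dump {PAGE_OWNER}: {err}")
```

This is equivalent in spirit to the quoted "cat ... > file" followed by
compressing the result, just done in one pass.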