On Wed, Jul 18, 2018 at 08:05:51AM +0800, Qu Wenruo wrote:
> No OOM triggers? That's a little strange.
> Maybe it's related to how kernel handles memory over-commit?
Yes, I think you are correct.
> And for the hang, I think it's related to some memory allocation failure
> and error handler just didn't handle it well, so it's causing deadlock
> for certain page.
That indeed matches what I'm seeing.
> ENOMEM handling is pretty common but hardly verified, so it's not that
> strange, but we must locate the problem.
I seem to be getting deadlocks in the kernel, so I was hoping ENOMEM is at
least handled properly there, but maybe not?
> In my system, at least I'm not using btrfs as root fs, and for the
> memory eating program I normally ensure it's eating all the memory +
> swap, so OOM killer is always triggered, maybe that's the cause.
>
> So in your case, maybe it's btrfs not really taking up all memory, thus
> OOM killer not triggered.
Correct, the swap is not used.
> Any kernel dmesg about OOM killer triggered?
Nothing at all. It never gets triggered.
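For anyone who wants to double-check the same thing on their side, here is a
minimal sketch of scanning the kernel log for OOM killer activity. The helper
names are made up for illustration, and the two patterns are only the message
fragments the kernel usually logs ("invoked oom-killer", "Out of memory"), so
treat it as a rough filter rather than an exhaustive check:

```python
import re
import subprocess

# Fragments the kernel normally logs when the OOM killer runs.
# Assumption: matching these two substrings is good enough for a quick check.
OOM_PATTERN = re.compile(r"invoked oom-killer|Out of memory", re.IGNORECASE)

def find_oom_lines(log_text):
    """Return the log lines that look like OOM killer messages."""
    return [line for line in log_text.splitlines() if OOM_PATTERN.search(line)]

def oom_lines_from_dmesg():
    """Run dmesg (may require root on some systems) and filter its output."""
    out = subprocess.run(["dmesg"], capture_output=True, text=True, check=True).stdout
    return find_oom_lines(out)
```

An empty list from oom_lines_from_dmesg() would match what I am seeing here.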
> > Here is my system when it virtually died:
> > USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
> > root 31006 21.2 90.7 29639020 29623180 pts/19 D+ 13:49 1:35 ./btrfs check /dev/mapper/dshelf2
See how btrfs check was taking 29GB in that ps output (that was before it took
everything and I couldn't even type ps anymore).
Note that VSZ is almost equal to RSS. Nothing gets swapped.
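The same per-process numbers can also be read straight from /proc, which
reports swapped-out memory separately as VmSwap. A minimal sketch, assuming a
Linux /proc filesystem; the function names are hypothetical:

```python
import re

def parse_vm_fields(status_text):
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        m = re.match(r"(Vm\w+):\s+(\d+)\s+kB", line)
        if m:
            fields[m.group(1)] = int(m.group(2))
    return fields

def read_vm_fields(pid):
    """Read and parse /proc/<pid>/status for a running process."""
    with open(f"/proc/{pid}/status") as f:
        return parse_vm_fields(f.read())
```

For example, read_vm_fields(31006) would return a dict whose VmSize and VmRSS
entries correspond to the VSZ and RSS columns of ps, and whose VmSwap entry
shows how much of the process has been swapped out.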
Then see free output:
> > total used free shared buffers cached
> > Mem: 32643788 32180100 463688 0 44664 119508
> > -/+ buffers/cache: 32015928 627860
> > Swap: 15616764 443676 15173088
>
> For swap, it looks like only some other program's memory is swapped out,
> not btrfs'.
That's exactly correct. btrfs check never goes to swap, I'm not sure why,
and because there is still plenty of swap free, maybe that's why OOM does
not trigger?
So I guess I can probably "fix" my problem by removing swap, but ultimately
it would be useful to know why memory taken by btrfs check does not end up
in swap.
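To spell out the free numbers quoted above, here is the arithmetic behind the
"-/+ buffers/cache" line, as a small sketch. It assumes the older procps
layout shown in that output, where the second line subtracts buffers and
cache from "used" and adds them to "free"; the function name is made up:

```python
def buffers_cache_line(used, free, buffers, cached):
    """Recompute the '-/+ buffers/cache' row of the older free(1) output."""
    used_by_programs = used - buffers - cached
    free_if_cache_dropped = free + buffers + cached
    return used_by_programs, free_if_cache_dropped

# Values in kB, copied from the free output quoted above.
used, free, buffers, cached = 32180100, 463688, 44664, 119508
print(buffers_cache_line(used, free, buffers, cached))  # (32015928, 627860)

swap_total, swap_used = 15616764, 443676
print(swap_total - swap_used)  # 15173088 kB of swap still unused
```

So about 32GB of RAM is held by programs with almost nothing reclaimable from
cache, while roughly 15GB of swap sits unused.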
> And unfortunately, I'm not so familiar with OOM/MM code outside of
> filesystem.
> Any help from other experienced developers would definitely help to
> solve why memory of 'btrfs check' is not swapped out or why OOM killer
> is not triggered.
Do you have someone from linux-mm you might be able to ask, or should we Cc
this thread there?
Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/