On 01/25/2014 04:47 PM, Dan Merillat wrote:
I'm trying to track this down. It started happening without any change to the kernel in use, so it's
probably a corrupted filesystem. The symptom is that all memory is suddenly used up with no apparent
source. The OOM killer is invoked on every task and still can't free up enough memory to continue.
When it goes wrong, it's extremely rapid - system goes from stable to dead in less than 30 seconds.
Tested 3.9.0, 3.12.0 and 3.12.8. Limited testing on 3.13 shows what I think is the same problem, but I
need to double-check that it's not a different issue. It blows up the exact same way on a real kernel
or in UML.
All sorts of things can trigger it: defrag, random writes to files. Balance and scrub don't, and a
read-only mount doesn't either.
I can reproduce this trivially: mount the filesystem read-write and perform some activity. It only
takes a few minutes. The other btrfs filesystems on the same machine don't show similar problems.
Unfortunately, the output of btrfs-image -c9 is 75 GB, much more than I can reasonably share. I've got
a reliable reproducer in UML, using a UML COW file so every run starts from the same base image: defrag
a file with 33,000 extents and the system explodes within a minute.
Here's the OOM report; the formatting is a bit off because it was delivered via netconsole.
Swap was disabled on this run, but that makes no difference. I get instant OOMs out of the blue
with very little memory swapped out.
Don't defrag right now; the snapshot-aware defrag is horribly broken and
will OOM the box. Thanks,
Josef