RE: Unable to restart Mon after reboot


> That means you're running out of memory, in kernelspace. The order is the
> power-of-two (2**n) of how many 4kB pages were requested, 0x4020 =
> GFP_COMP|GFP_HIGH (compound & access emergency pools). Btrfs may
> be indirectly related, it's not clear what's consuming all the memory, but
> that doesn't sound all that likely. That message should be followed by a
> stack dump, that might tell us more.
> Are you using the Ceph distributed filesystem, or just the RADOS level, e.g.
> RBD images?
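
For anyone reading the archive later, the arithmetic in the quoted explanation can be sketched as below. This is only an illustration: the bit values 0x20 and 0x4000 are an assumption taken from 3.x-era kernel headers (they differ between kernel versions), and the helper names are made up for this example.

```python
PAGE_SIZE = 4096  # 4 kB pages, as stated in the quoted explanation

# Assumed flag bits (3.x-era include/linux/gfp.h); check your own kernel source.
GFP_BITS = {0x20: "GFP_HIGH", 0x4000: "GFP_COMP"}


def alloc_bytes(order):
    """Size in bytes of a page allocation of the given order (2**order pages)."""
    return PAGE_SIZE << order


def decode_mode(mode):
    """Split an allocation mode value into the flag names known to GFP_BITS."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = sum(bit for bit in GFP_BITS if mode & bit)
    leftover = mode & ~known
    if leftover:
        # Report bits this sketch does not know about rather than hiding them.
        names.append(hex(leftover))
    return names


if __name__ == "__main__":
    for order in range(4):
        print("order", order, "->", alloc_bytes(order), "bytes")
    print(hex(0x4020), "->", " | ".join(decode_mode(0x4020)))
```

An order-0 failure means even a single 4 kB page could not be allocated, while higher orders can also fail because of fragmentation with memory still free.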

Hi Tommi,

Thanks for your help.  As an update to this thread, the problem proved to be btrfs on Fedora: scrubs showed bad inodes on all three servers.  We were only using RADOS to hold RBD images, and there seemed to be plenty of free RAM, so I'm unsure what could have consumed all of the memory.

We switched to Ubuntu 12.04 for the tests, which stopped all of the btrfs problems.

We have now spent a week running the iotester corruption tests in KVM instances while live migrating them every 5 minutes (with and without cache), running iozone and trying everything we could to corrupt the VMs.  The tests on 0.47.x all performed flawlessly.

The upgrade to 0.48 went smoothly, but since then we have had slow requests showing up in the Ceph logs and disk timeouts whenever we run iozone in the VMs.

I will wipe the current OSDs and start with a fresh 0.48 installation to see if I can reproduce the problem.

