Re: many busy btrfs processes during heavy cpu and memory pressure


 



On Sun, Aug 11, 2019 at 11:43 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> On 2019/8/12 10:27 AM, Chris Murphy wrote:
> > I'm not sure this is a bug, but I'm also not sure if the behavior is expected.
> >
> > Test system as follows:
> >
> > Intel i7-2820QM, 4/8 cores
> > 8 GiB RAM, 8 GiB swap on SSD plain partition
> > Samsung SSD 840 EVO 250GB
> > kernel 5.3.0-0.rc3.git0.1.fc31.x86_64+debug, but same behavior seen on 5.2.6
> >
> > Test involves using a desktop, GNOME shell, while building webkitgtk.
> > This uses all available RAM, and eventually all available swap.
> >
> > While the build fails on ext4 as well as on Btrfs, the difference on
> > Btrfs is many btrfs processes taking up quite a lot of cpu resources.
> > And iotop shows many processes with unexpectedly high read IO. I don't
> > have enough data collected to be certain, but it does seem on Btrfs
> > the oom killer is substantially delayed. Realistically, by the time
> > the system is in this state, practically speaking it's lost.
> >
> > Screenshot shows iotop and top state information for this system, at
> > the time sysrq+t is taken.
> >
> > Full 'journalctl -k' output is rather excessive, 13MB uncompressed,
> > 714K zstd compressed
> > https://drive.google.com/open?id=1bYYedsj1O4pii51MUy-7cWhnWGXb67XE
> >
> > from last sysrq+t
> > https://drive.google.com/open?id=1vhnIki9lpiWK8T5Qsl81_RToQ8CFdnfU
> >
> > last screenshot, matching above sysrq+t
> > https://drive.google.com/open?id=12jpQeskPsvHmfvDjWSPOwIWSz09JIUlk
>
> This shows it's the btrfs endio workqueue, which does the data
> verification against the csum tree.
>
> So you see the point: ext* just doesn't support data csum.

But 10-17% CPU, times 8 processes? Even during scrub at maximum SSD
read there isn't such a load doing csum computations.

Get a load of this screenshot:
https://drive.google.com/file/d/1IDboR1fzP4onu_tzyZxsx7M5cT_RJ7Iz/view

That doesn't even make sense. How is it possible Btrfs is using 100%
CPU times 10 processes? There aren't even that many cores. And then
Firefox is using 800% CPU? Another 8 cores that don't exist. And then
look at iotop which is reporting 28G/s reads? This is an ordinary SATA
SSD that can't do more than maybe 600M/s reads. Something is very
weird and being misreported. But again, only on Btrfs. It doesn't happen
with ext4, even though the hang is the same from the user's point of
view and no worse on Btrfs. Only the system statistics seem much
crazier on Btrfs.
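
To put rough numbers on that, here is a minimal sketch that compares the
figures read off the screenshot with what this hardware could deliver. The
helper name exceeds_capacity is hypothetical, the 600 MB/s ceiling is the
estimate given above, and it assumes top counts 100% per logical CPU and
that iotop's "G" means 1000 M.

```python
# Sanity-check reported utilization against what the hardware can deliver.
# All figures are taken from the message; exceeds_capacity is a hypothetical helper.

def exceeds_capacity(reported: float, capacity: float) -> bool:
    """Return True when a reported figure is above the physical ceiling."""
    return reported > capacity

LOGICAL_CPUS = 8                        # i7-2820QM: 4 cores, 8 threads
CPU_CAPACITY_PCT = LOGICAL_CPUS * 100   # assumption: 100% per logical CPU

btrfs_pct = 10 * 100                    # ten btrfs processes at 100% each
firefox_pct = 800                       # Firefox as shown in top
total_cpu_pct = btrfs_pct + firefox_pct

SATA_MAX_MB_S = 600                     # estimated ceiling for this SATA SSD
iotop_read_mb_s = 28 * 1000             # 28G/s from iotop, assuming G = 1000 M

cpu_impossible = exceeds_capacity(total_cpu_pct, CPU_CAPACITY_PCT)
read_ratio = iotop_read_mb_s / SATA_MAX_MB_S

print(f"CPU: {total_cpu_pct}% reported vs {CPU_CAPACITY_PCT}% available")
print(f"Reads: {read_ratio:.1f}x the estimated SSD ceiling")
```

Under those assumptions the reported CPU total is more than double what
eight logical CPUs can supply, and the read rate is dozens of times the
drive's ceiling, so the counters themselves look wrong rather than the
workload being that heavy.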

The other time I've seen this behavior? Running Firefox through gdb
with certain kinds of crashes, that have nothing to do with swap.

-- 
Chris Murphy



