Re: btrfs-scrub: slow scrub speed (raid5)

On Thu, Feb 6, 2020 at 10:33 AM Sebastian Döring <moralapostel@xxxxxxxxx> wrote:
>
> Hi everyone,
>
> when I run a scrub on my 5 disk raid5 array (data: raid5, metadata:
> raid6) I notice very slow scrubbing speed: max. 5MB/s per device,
> about 23-24 MB/s in sum (according to btrfs scrub status).

raid56 is not recommended for metadata. With raid5 data, the
recommendation is raid1 metadata. It's possible to convert from raid6
to raid1 metadata, but you'll need the -f flag because of the reduced
redundancy.
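
A minimal sketch of that conversion (assuming the filesystem is
mounted at /mnt; substitute your own mountpoint):

```shell
# Convert metadata (and system) chunks from raid6 to raid1.
# -f is required because raid1 provides less redundancy than raid6.
btrfs balance start -f -mconvert=raid1 /mnt

# Verify the new metadata profile afterwards
btrfs filesystem df /mnt
```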

If you can consistently depend on kernel 5.5+, you can use raid1c3 or
raid1c4 for metadata. But even though the file system itself could
then survive a two or three device failure, most of your data won't
survive one; it would only allow getting some fraction of the files
smaller than 64KiB (the raid5 stripe element size) off the volume.

I'm not sure this accounts for the slow scrub, though. It could be
some combination of heavy block group fragmentation (i.e. a lot of
free space inside both metadata and data block groups) with raid6 on
top of it, but I'm not convinced. It'd be useful to see IO and
utilization during the scrub from iostat 5, to see whether any one of
the drives is ever getting close to 100% utilization.
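
Something along these lines will show per-device throughput and the
%util column every 5 seconds (sysstat's iostat):

```shell
# Extended per-device stats (-x), in MB (-m), 5 second interval.
# A drive pinned near 100% in the %util column is the bottleneck.
iostat -dxm 5
```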

>
> What's interesting is at the same time the gross read speed across the
> involved devices (according to iostat) is about ~71 MB/s in sum (14-15
> MB/s per device). Where are the remaining 47 MB/s going? I expect
> there would be some overhead because it's a raid5, but it shouldn't be
> much more than a factor of (n-1) / n , no? At the moment it appears to
> be only scrubbing 1/3 of all data that is being read and the rest is
> thrown out (and probably re-read again at a different time).
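
For reference, a rough sanity check of the (n-1)/n expectation above,
using the figures from the report (~23.7 MB/s scrubbed, ~71 MB/s of
device reads, 5 disks):

```shell
# For raid5 on n devices, reading parity alongside data should cost
# at most a factor of n/(n-1) over the scrubbed rate.
awk 'BEGIN {
  n = 5; scrubbed = 23.7; device_reads = 71
  expected = scrubbed * n / (n - 1)   # ~29.6 MB/s
  printf "expected reads: %.1f MB/s, observed: %.1f MB/s, ratio: %.1f\n",
         expected, device_reads, device_reads / scrubbed
}'
```

So the observed read amplification is about 3x, well above the ~1.25x
that parity alone would explain.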

What do you get for
btrfs fi df /mountpoint/
btrfs fi us /mountpoint/

Is it consistently this slow or does it vary a lot?
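
One way to see whether the rate varies is to sample the status
periodically (mountpoint is a placeholder):

```shell
# Re-print scrub progress every 5 seconds while the scrub runs
watch -n 5 btrfs scrub status /mnt
```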

>
> Surely this can't be right? Are iostat or possibly btrfs scrub status
> lying to me? What am I seeing here? I've never seen this problem with
> scrubbing a raid1, so maybe there's a bug in how scrub is reading data
> from raid5 data profile?

I'd say it's more likely a lack of optimization for the moderate to
high fragmentation case. LVM and mdadm raid have no idea what the
layout is, since there's no fs metadata for them to take into account,
so every scrub read is a full-stripe read. That means they also read
unused portions of the array, where Btrfs won't, because every read it
issues is deliberate. The flip side is that Btrfs scrub performance
can be hurt by disk contention and seeking.


> It seems to me that I could perform a much faster scrub by rsyncing
> the whole fs into /dev/null... btrfs is comparing the checksums anyway
> when reading data, no?

Yes.
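
A sketch of that kind of whole-filesystem read (assuming /mnt; note
that unlike scrub, this only covers data the files currently
reference, so it doesn't check parity or otherwise-unread blocks):

```shell
# Read every file once; btrfs verifies data checksums on each read
# and logs any mismatches in dmesg.
find /mnt -xdev -type f -exec cat {} + > /dev/null
```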

-- 
Chris Murphy



