On Thu, Feb 6, 2020 at 10:33 AM Sebastian Döring <moralapostel@xxxxxxxxx> wrote:
>
> Hi everyone,
>
> when I run a scrub on my 5 disk raid5 array (data: raid5, metadata:
> raid6) I notice very slow scrubbing speed: max. 5MB/s per device,
> about 23-24 MB/s in sum (according to btrfs scrub status).

raid56 is not recommended for metadata. With raid5 data, it's
recommended to use raid1 metadata. It's possible to convert from raid6
to raid1 metadata, but you'll need to use the -f flag due to the
reduced redundancy (example command at the end of this mail). If you
can consistently depend on kernel 5.5+ you can use raid1c3 or raid1c4
for metadata instead; note that even though the file system itself can
then survive a two or three device failure, most of your data won't
survive one. It would still allow getting some fraction of the files
smaller than 64KiB (the raid5 strip size) off the volume.

I'm not sure this accounts for the slow scrub though. It could be some
combination of heavy block group fragmentation, i.e. a lot of free
space in block groups, in both metadata and data block groups, and
then raid6 on top of it. But I'm not convinced. It'd be useful to see
IO and utilization during the scrub from 'iostat 5' (exact invocation
below), to see if any one of the drives is ever getting close to 100%
utilization.

>
> What's interesting is at the same time the gross read speed across the
> involved devices (according to iostat) is about ~71 MB/s in sum (14-15
> MB/s per device). Where are the remaining 47 MB/s going? I expect
> there would be some overhead because it's a raid5, but it shouldn't be
> much more than a factor of (n-1) / n , no? At the moment it appears to
> be only scrubbing 1/3 of all data that is being read and the rest is
> thrown out (and probably re-read again at a different time).

What do you get for

  btrfs fi df /mountpoint/
  btrfs fi us /mountpoint/

Is it consistently this slow or does it vary a lot?

>
> Surely this can't be right? Are iostat or possibly btrfs scrub status
> lying to me? What am I seeing here? I've never seen this problem with
> scrubbing a raid1, so maybe there's a bug in how scrub is reading data
> from raid5 data profile?

I'd say it's more likely a lack of optimization for the moderate to
high fragmentation case. LVM and mdadm raid have no idea what the file
layout is; there's no fs metadata to take into account, so every scrub
read is a full stripe read. That means they read unused portions of
the array too, where Btrfs won't, because every Btrfs read is
deliberate. The flip side is that those deliberate, scattered reads
can be impacted by disk contention.

> It seems to me that I could perform a much faster scrub by rsyncing
> the whole fs into /dev/null... btrfs is comparing the checksums anyway
> when reading data, no?

Yes.
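For instance, something along these lines would read every file
through the normal read path, which verifies data checksums as it goes
(a rough sketch; /mountpoint is a placeholder, and unlike scrub this
won't read or verify the raid5 parity strips):

  # read every regular file and discard the contents; any checksum
  # mismatch shows up as a read error and in dmesg
  find /mountpoint -type f -print0 | xargs -0 cat > /dev/null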
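For the metadata conversion mentioned above, something like this
should do it (again a sketch, with /mountpoint as a placeholder):

  # convert metadata from raid6 to raid1; -f is required because
  # raid1 tolerates fewer device failures than raid6
  btrfs balance start -f -mconvert=raid1 /mountpoint

  # or, on kernel 5.5+ with btrfs-progs 5.5+, keep two-device-failure
  # tolerance for metadata (I don't think -f is needed here, since
  # redundancy isn't reduced)
  btrfs balance start -mconvert=raid1c3 /mountpoint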
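And for watching the drives during the scrub, the extended per-device
stats are the useful part; something like:

  # per-device throughput and utilization every 5 seconds, in MB/s;
  # watch whether any single drive sits near 100% in the %util column
  iostat -dmx 5

--
Chris Murphy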
