On 09/01/2020 20:35, Graham Cobb wrote:
> On 09/01/2020 17:06, Graham Cobb wrote:
>> On 09/01/2020 10:19, Graham Cobb wrote:
>>> On 09/01/2020 10:03, Sebastian Döring wrote:
>>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>>> scrub resume' to work properly. During a running scrub the resume
>>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>>> re-start from the very beginning.
>>>>
>>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>>> this for a while now.
>>>>
>>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>>> How can I interrupt and resume a scrub?
>>>
>>> Coincidentally, I noticed exactly the same thing yesterday!
>>>
>>> I have just run a quick test. It works with kernel 4.19 but doesn't with
>>> kernel 5.3. This is using exactly the same version of btrfs-progs:
>>> v5.3.1 (I just rebooted the same system with an old kernel to check).
>>>
>>> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
>>> all fields as zero after the cancel (although "cancelled" and "finished"
>>> are both 1). In particular, last_physical is zero so the scrub always
>>> resumes from the beginning.
>>>
>>> With the old kernel, the file in /var/lib/btrfs correctly has all the
>>> values filled in after the cancel so the scrub can be resumed.
>>
>> I have spent the last couple of hours instrumenting the code of scrub.c
>> to try to work out what is going on.
>
> I was over-complicating it. The problem is simple:
>
> In kernel 4.19, BTRFS_IOC_SCRUB fills in the (final) progress values in
> the scrub args EVEN WHEN THE SCRUB IS CANCELLED! If the errno is 125
> (and presumably most other values) the output arguments are valid.
>
> In kernel 5.3, THAT IS NO LONGER THE CASE! If the errno is 125, the
> progress values are all 0.
>
> This ABI change breaks btrfs-scrub -- in particular the scrub
> cancel-resume handling. This relies on the scrub ioctl reporting the
> progress values when the scrub is cancelled: those values are written
> out to the file in /var/lib/btrfs and read back in for the resume.
>
> I haven't attempted to look at the kernel code to see why the behaviour
> changed.

This regression in btrfs-scrub is a kernel problem: the scrub ioctl ABI
seems to have been broken some time between kernel 4.19 and kernel 5.3.

Do we need to provide any more information? I am not in a position to do
a bisect at this point, but if it is not obvious what change has caused
the breakage I can try to do so later in the week.
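
In case it helps anyone comparing kernels, here is a rough sketch of the
check I am effectively doing by eye. It is not btrfs-progs code: the dict
is only a stand-in for the progress values that come back from the scrub
ioctl, the function name is my own, and the only field names used are the
ones already mentioned in this thread. errno 125 is ECANCELED on Linux.

```python
import errno


def progress_lost(ioctl_errno, progress):
    """True if the scrub was cancelled and every progress value came back zero.

    ioctl_errno: errno reported by the scrub ioctl (0 means it completed).
    progress:    dict of progress values returned alongside that errno
                 (a hypothetical stand-in for the real scrub args).
    """
    if ioctl_errno != errno.ECANCELED:
        return False  # only the cancel path is of interest here
    # All-zero values after a cancel are the symptom: nothing useful gets
    # saved, so a later resume has to start from the beginning.
    return all(value == 0 for value in progress.values())


# Illustrative values only; the last_physical number is made up.
old_kernel = {"data_bytes_scrubbed": 1081454592, "last_physical": 2177892352}
new_kernel = {"data_bytes_scrubbed": 0, "last_physical": 0}

print(progress_lost(errno.ECANCELED, old_kernel))
print(progress_lost(errno.ECANCELED, new_kernel))
```

With values like those from the old kernel the check reports that progress
survived the cancel; with the all-zero values seen on the new kernel it
reports that progress was lost, which matches the resume restarting from
the beginning.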
