Re: btrfs scrub: cancel + resume not resuming - kernel regression

On 09/01/2020 20:35, Graham Cobb wrote:
> On 09/01/2020 17:06, Graham Cobb wrote:
>> On 09/01/2020 10:19, Graham Cobb wrote:
>>> On 09/01/2020 10:03, Sebastian Döring wrote:
>>>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>>>> scrub resume' to work properly. During a running scrub the resume
>>>> information (like data_bytes_scrubbed:1081454592) gets written to a
>>>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>>>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>>>> re-start from the very beginning.
>>>>
>>>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>>>> this for a while now.
>>>>
>>>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>>>> How can I interrupt and resume a scrub?
>>>
>>> Coincidentally, I noticed exactly the same thing yesterday!
>>>
>>> I have just run a quick test. It works with kernel 4.19 but doesn't with
>>> kernel 5.3. This is using exactly the same version of btrfs-progs:
>>> v5.3.1 (I just rebooted the same system with an old kernel to check).
>>>
>>> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
>>> all fields as zero after the cancel (although "cancelled" and "finished"
>>> are both 1). In particular, last_physical is zero so the scrub always
>>> resumes from the beginning.
>>>
>>> With the old kernel, the file in /var/lib/btrfs correctly has all the
>>> values filled in after the cancel so the scrub can be resumed.
>>
>> I have spent the last couple of hours instrumenting the code of scrub.c
>> to try to work out what is going on. 
> 
> I was over-complicating it. The problem is simple:
> 
> In kernel 4.19, BTRFS_IOC_SCRUB fills in the (final) progress values in
> the scrub args EVEN WHEN THE SCRUB IS CANCELLED! If the errno is 125
> (ECANCELED), and presumably for most other values, the output arguments
> are valid.
> 
> In kernel 5.3, THAT IS NO LONGER THE CASE! If the errno is 125, the
> progress values are all 0.
> 
> This ABI change breaks btrfs-scrub -- in particular the scrub
> cancel-resume handling. This relies on the scrub ioctl reporting the
> progress values when the scrub is cancelled: those values are written
> out to the file in /var/lib/btrfs and read back in for the resume.
> 
> I haven't attempted to look at the kernel code to see why the behaviour
> changed.

This regression in btrfs-scrub is a kernel problem: the scrub ioctl ABI
seems to have been broken some time between kernel 4.19 and kernel 5.3.
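
To make the dependency concrete, here is a minimal sketch of what a
resume-capable caller relies on. It is not the btrfs-progs code. The
names classify_scrub, scrub_from and enum scrub_outcome are made up for
illustration; only struct btrfs_ioctl_scrub_args, BTRFS_IOC_SCRUB and
ECANCELED come from the real headers. It assumes that last_physical
being zero after a cancel means the progress was not reported, which
could in principle also happen if the cancel arrives before any
progress is made.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <linux/btrfs.h>

/* Hypothetical classification of one BTRFS_IOC_SCRUB call. */
enum scrub_outcome {
    SCRUB_DONE,          /* ioctl returned 0: scrub ran to completion */
    SCRUB_RESUMABLE,     /* cancelled, progress filled in: can resume */
    SCRUB_PROGRESS_LOST, /* cancelled, last_physical == 0: resume restarts */
    SCRUB_FAILED         /* any other error */
};

/*
 * Decide what the caller can do next, given the ioctl return value,
 * the errno it left behind, and the (possibly updated) args.
 * ECANCELED is 125 on x86 and most other Linux architectures.
 */
enum scrub_outcome classify_scrub(int ret, int err,
                                  const struct btrfs_ioctl_scrub_args *args)
{
    assert(args != NULL);
    if (ret == 0)
        return SCRUB_DONE;
    if (err != ECANCELED)
        return SCRUB_FAILED;
    /* Heuristic (assumption): zero here means no progress came back. */
    return args->progress.last_physical != 0 ? SCRUB_RESUMABLE
                                             : SCRUB_PROGRESS_LOST;
}

/*
 * Scrub device `devid` of the filesystem open at `fd`, starting at
 * physical offset `start` (0 for a fresh scrub, the saved
 * last_physical for a resume). Blocks until the scrub ends.
 */
enum scrub_outcome scrub_from(int fd, uint64_t devid, uint64_t start,
                              struct btrfs_ioctl_scrub_args *args)
{
    memset(args, 0, sizeof(*args));
    args->devid = devid;
    args->start = start;
    args->end = UINT64_MAX;

    int ret = ioctl(fd, BTRFS_IOC_SCRUB, args);
    int err = (ret < 0) ? errno : 0;
    return classify_scrub(ret, err, args);
}
```

With the behaviour described above, a cancelled scrub would come back
as SCRUB_RESUMABLE on 4.19 and as SCRUB_PROGRESS_LOST on 5.3, which is
why the file in /var/lib/btrfs ends up with last_physical set to zero.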

Do we need to provide any more information? I am not in a position to do
a bisect at this point, but if it is not obvious what change has caused
the breakage I can try to do so later in the week.


