Re: btrfs scrub: cancel + resume not resuming?

On 09/01/2020 10:19, Graham Cobb wrote:
> On 09/01/2020 10:03, Sebastian Döring wrote:
>> Maybe I'm doing it entirely wrong, but I can't seem to get 'btrfs
>> scrub resume' to work properly. During a running scrub the resume
>> information (like data_bytes_scrubbed:1081454592) gets written to a
>> file in /var/lib/btrfs, but as soon as the scrub is cancelled all
>> relevant fields are zeroed. 'btrfs scrub resume' then seems to
>> re-start from the very beginning.
>>
>> This is on linux-5.5-rc5 and btrfs-progs 5.4, but I've been seeing
>> this for a while now.
>>
>> Is this intended/expected behavior? Am I using the btrfs-progs wrong?
>> How can I interrupt and resume a scrub?
> 
> Coincidentally, I noticed exactly the same thing yesterday!
> 
> I have just run a quick test. It works with kernel 4.19 but doesn't with
> kernel 5.3. This is using exactly the same version of btrfs-progs:
> v5.3.1 (I just rebooted the same system with an old kernel to check).
> 
> As Sebastian says, the symptom is that the file in /var/lib/btrfs shows
> all fields as zero after the cancel (although "cancelled" and "finished"
> are both 1). In particular, last_physical is zero so the scrub always
> resumes from the beginning.
> 
> With the old kernel, the file in /var/lib/btrfs correctly has all the
> values filled in after the cancel so the scrub can be resumed.

I have spent the last couple of hours instrumenting the code of scrub.c
to try to work out what is going on. The relationship between the main
thread, the thread where the scrub is running, and the thread receiving
the status updates from the kernel is quite horrible. Not to mention
that two of these three threads write out what could be the final
version of the progress file (and each uses a different data structure
as the source for that write!).

The basic problem is that the scrub program seems to assume it will have
seen the cancellation in the update stream *before* the ioctl completes
with the cancelled status, whereas on the 5.x kernel that seems to
happen the other way round (although I haven't done a direct comparison
with a 4.19 run to confirm this).

What I haven't checked yet is whether the 5.x kernel does actually send
the final data update if we stick around long enough to receive it.


