Re: btrfs-progs reports nonsense scrub status

On Wed, May 13, 2020 at 12:49 AM Andrew Pam <andrew@xxxxxxxxxxxxxx> wrote:
>
> On 11/5/20 2:46 pm, Chris Murphy wrote:
> > I also wonder whether the socket that Graham mentions, could get in
> > some kind of stuck or confused state due to sleep/wake cycle? My case,
> > NVMe, is maybe not the best example because that's just PCIe. In your
> > case it's real drives, so it's SCSI, block, and maybe libata and other
> > things.
>
> Could be.  When I start a new scrub and suspend the system, further
> attempts to run "btrfs scrub status -dR /home" after the resume result
> in the following:
>
> NOTE: Reading progress from status file
> UUID:             85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) status
> Scrub started:    Wed May 13 16:10:12 2020
> Status:           running
> Duration:         0:00:22
>         data_extents_scrubbed: 0
>         tree_extents_scrubbed: 29238
>         data_bytes_scrubbed: 0
>         tree_bytes_scrubbed: 479035392
>         read_errors: 0
>         csum_errors: 0
>         verify_errors: 0
>         no_csum: 0
>         csum_discards: 0
>         super_errors: 0
>         malloc_errors: 0
>         uncorrectable_errors: 0
>         unverified_errors: 0
>         corrected_errors: 0
>         last_physical: 0
> scrub device /dev/sdb (id 2) status
> Scrub started:    Wed May 13 16:10:12 2020
> Status:           running
> Duration:         0:00:23
>         data_extents_scrubbed: 0
>         tree_extents_scrubbed: 27936
>         data_bytes_scrubbed: 0
>         tree_bytes_scrubbed: 457703424
>         read_errors: 0
>         csum_errors: 0
>         verify_errors: 0
>         no_csum: 0
>         csum_discards: 0
>         super_errors: 0
>         malloc_errors: 0
>         uncorrectable_errors: 0
>         unverified_errors: 0
>         corrected_errors: 0
>         last_physical: 0
>
> So it appears that the socket connection to the kernel cannot be
> reestablished after the resume from suspend-to-RAM.  Interestingly, if
> I then manually run "btrfs scrub cancel /home" followed by "btrfs
> scrub resume -c3 /home", the status reports start working again.  It
> fails only when "btrfs scrub resume -c3 /home" is run from the script
> "/usr/lib/systemd/system-sleep/btrfs-scrub", as follows:
>
> #!/bin/sh
>
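> # systemd-sleep(8) hooks receive two arguments: $1 is "pre" or
> # "post", and $2 is the sleep verb (suspend, hibernate, ...).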
> case $1/$2 in
>   pre/*)
>     btrfs scrub cancel /home
>     ;;
>   post/*)
>     sleep 1
>     btrfs scrub resume -c3 /home
>     ;;
> esac
>
> Without the sleep, it does not resume the scrub.  A longer sleep (5
> seconds) does not resolve the issue with the status reports.
>
> Maybe this is some kind of systemd problem...  :(

Oof. So possibly two bugs.
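
For what it's worth, the "NOTE: Reading progress from status file" line
means the status command couldn't connect to the progress socket of a
running scrub and fell back to the last on-disk status file, which would
explain the stale counters. If I'm remembering the btrfs-progs layout
right, the socket lives under /var/lib/btrfs with the fsid appended, so
a quick check after resume would be:

ls -l /var/lib/btrfs/scrub.progress.85069ce9-be06-4c92-b8c1-8a0f685e43c6
ss -xlp | grep scrub.progress

If ss shows no listener, the background scrub process that serves the
socket is gone, not merely confused.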

Well, you could post an inquiry about it to systemd-devel@. And then
maybe try to reproduce it with something that has systemd-245, to see
whether it makes a difference. I ran into something that looked like
races in 244 with a Rust generator I was testing that didn't happen in
245. *shrug*

But I'm not sure, testing-wise, how to isolate the systemd questions
from the socket questions.
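
One way to split them might be to have the hook launch the resume as a
transient unit, so it escapes the sleep hook's cgroup (assuming the
backgrounded resume process is what serves the progress socket; the
unit name here is made up):

post/*)
  sleep 1
  systemd-run --unit=btrfs-scrub-resume btrfs scrub resume -c3 /home
  ;;

If scrub status behaves after resume with that change, it would point at
systemd cleaning up the sleep hook's cgroup, taking the background scrub
process with it, rather than at the socket code itself.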


-- 
Chris Murphy


