On Sun, May 10, 2020 at 7:39 PM Andrew Pam <andrew@xxxxxxxxxxxxxx> wrote:
>
> On 10/5/20 6:33 am, Chris Murphy wrote:
> > 2. That a scrub started, then cancelled, then resumed, also does
> > finish (or not).
>
> OK, I have now run a scrub with multiple cancels and resumes and that
> also proceeded and finished normally as expected:
>
> $ sudo ./btrfs scrub status -d /home
> NOTE: Reading progress from status file
> UUID: 85069ce9-be06-4c92-b8c1-8a0f685e43c6
> scrub device /dev/sda (id 1) history
>   Scrub resumed:  Mon May 11 06:06:37 2020
>   Status:         finished
>   Duration:       7:27:31
>   Total to scrub: 3.67TiB
>   Rate:           142.96MiB/s
>   Error summary:  no errors found
> scrub device /dev/sdb (id 2) history
>   Scrub resumed:  Mon May 11 06:06:37 2020
>   Status:         finished
>   Duration:       7:27:15
>   Total to scrub: 3.67TiB
>   Rate:           143.04MiB/s
>   Error summary:  no errors found
>
> [54472.936094] BTRFS info (device sda): scrub: started on devid 2
> [54472.936095] BTRFS info (device sda): scrub: started on devid 1
> [55224.956293] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125
> [55226.356563] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125
> [58775.602370] BTRFS info (device sda): scrub: started on devid 1
> [58775.602372] BTRFS info (device sda): scrub: started on devid 2
> [72393.296199] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125
> [72393.296215] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125
> [77731.999603] BTRFS info (device sda): scrub: started on devid 1
> [77731.999604] BTRFS info (device sda): scrub: started on devid 2
> [87727.510382] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125
> [87727.582401] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125
> [89358.196384] BTRFS info (device sda): scrub: started on devid 1
> [89358.196386] BTRFS info (device sda): scrub: started on devid 2
> [89830.639654] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125
> [89830.856232] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125
> [94486.300097] BTRFS info (device sda): scrub: started on devid 2
> [94486.300098] BTRFS info (device sda): scrub: started on devid 1
> [96223.185459] BTRFS info (device sda): scrub: not finished on devid 1 with status: -125
> [96223.227246] BTRFS info (device sda): scrub: not finished on devid 2 with status: -125
> [97810.489388] BTRFS info (device sda): scrub: started on devid 1
> [97810.540625] BTRFS info (device sda): scrub: started on devid 2
> [98068.987932] BTRFS info (device sda): scrub: finished on devid 2 with status: 0
> [98085.771626] BTRFS info (device sda): scrub: finished on devid 1 with status: 0
>
> So by elimination it's starting to look like suspend-to-RAM might be
> part of the problem. That's what I'll test next.

Power management is difficult. (I'm actually working on a git bisect right
now; an older laptop won't wake from suspend, a 5.7 regression.) Whether all
the devices wake up correctly isn't always an easy question to answer: they
might all have power, but did they really come back up in the correct state?
Thing is, you're reporting that iotop independently shows a transfer rate
consistent with getting data off the drives.

I also wonder whether the socket that Graham mentions could get into some
kind of stuck or confused state due to the sleep/wake cycle. My case, NVMe,
is maybe not the best example because that's just PCIe. In your case it's
real drives, so it's SCSI, block, and maybe libata and other things.

-- 
Chris Murphy
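P.S. For anyone decoding the dmesg lines above: the scrub "status" values are negated kernel errno codes, so "status: -125" is -ECANCELED, the expected result of each deliberate cancel, and "status: 0" means the final run completed. A quick way to confirm the mapping (just an illustration, assuming a Linux errno table):

```python
import errno
import os

# The kernel log reports the negated errno value; strip the sign to look it up.
status = -125
name = errno.errorcode.get(-status, "unknown")   # symbolic name, e.g. ECANCELED
message = os.strerror(-status)                   # human-readable description

print(name, "->", message)
```

On Linux this prints "ECANCELED -> Operation canceled", i.e. the deliberate scrub cancellations, not a real I/O error.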
