There seems to be a run-time dependency between scrub and replace
operations that I can't find any mention of in the documentation.
Steps to reproduce (choosing large-ish file[system] just to ensure the
operations don't finish immediately - I'm not familiar enough with
rate-limiting setup for a more elegant approach):
0. Log software information
# uname -r -m; btrfs version
5.4.1-gentoo x86_64
btrfs-progs v5.4
1. Set up a simple multi-device filesystem with one spare and some data
# for i in {1..3}; do truncate -s 512G loop$i; losetup /dev/loop$i loop$i; done
# mkfs.btrfs -m raid1 -d raid0 /dev/loop{1..2}
# mkdir /mnt/test && mount /dev/loop1 /mnt/test
# dd if=/dev/urandom of=/mnt/test/somedata bs=1M count=65536
2. Replace one device with the spare
# btrfs scrub status /mnt/test/; \
  btrfs replace start /dev/loop2 /dev/loop3 /mnt/test/; sleep 1; \
  btrfs scrub status /mnt/test/; \
  btrfs replace status -1 /mnt/test; \
  btrfs scrub cancel /mnt/test/; sleep 1; \
  btrfs replace status /mnt/test
Output from step 2:
UUID: eafe3cb7-7ea1-405d-98a9-9dfffee2ea9d
no stats available
Total to scrub: 64.15GiB
Rate: 0.00B/s
Error summary: no errors found
UUID: eafe3cb7-7ea1-405d-98a9-9dfffee2ea9d
no stats available
Time left: 0:00:00
ETA: Fri Dec 6 12:10:03 2019
Total to scrub: 64.15GiB
Bytes scrubbed: 0.00B
Rate: 0.00B/s
Error summary: no errors found
0.1% done, 0 write errs, 0 uncorr. read errs
scrub cancelled
Started on 6.Dec 12:10:02, canceled on 6.Dec 12:10:06 at 0.0%, 0 write errs, 0 uncorr. read errs
Observations: Prior to starting the replace, no scrub is running.
Immediately after the replace is started, btrfs scrub status reports
a running scrub operation. After btrfs scrub cancel is issued, the
replace operation is cancelled instead.
Expectation: As "btrfs scrub cancel" might be issued as part of other
maintenance jobs, it should not affect a replace operation in progress.
Would it be possible to separate the two operations w.r.t. userspace
tools? Alternatively, should this behaviour be documented?
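
Until this is resolved, a maintenance job could check for a running
replace before cancelling a scrub. The following is only a rough sketch,
not a tested fix: the function names replace_in_progress and
safe_scrub_cancel are made up for illustration, and the check assumes
that a running replace always reports "<N>% done" from
"btrfs replace status -1", as in the output from step 2. There is also a
window between the check and the cancel in which a replace could start.

```shell
#!/bin/sh

# Hypothetical helper: succeeds (exit 0) if the given output of
# "btrfs replace status -1" looks like a replace that is still running.
# Assumption: a running replace prints "<N>% done, ..." while a finished
# or cancelled one prints "Started on ...".
replace_in_progress() {
    case "$1" in
        *"% done"*) return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical wrapper: cancel a scrub on the given mount point only if
# no replace appears to be in progress there.
safe_scrub_cancel() {
    mnt="$1"
    status="$(btrfs replace status -1 "$mnt" 2>/dev/null)"
    if replace_in_progress "$status"; then
        echo "replace in progress on $mnt, not cancelling scrub" >&2
        return 1
    fi
    btrfs scrub cancel "$mnt"
}
```

A maintenance job would then call "safe_scrub_cancel /mnt/test" where it
currently calls "btrfs scrub cancel /mnt/test/".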
Regards
Bernhard Kühnel