On Wed, Sep 18, 2019 at 02:56:26PM +0800, Qu Wenruo wrote:
> [BUG]
> When btrfs/011 is executed on a fast enough system (fully memory backed
> VM, with test device has unsafe cache mode), the test can fail like
> this:
>
> btrfs/011 43s ... [failed, exit status 1]- output mismatch (see /home/adam/xfstests-dev/results//btrfs/011.out.bad)
> --- tests/btrfs/011.out 2019-07-22 14:13:44.643333326 +0800
> +++ /home/adam/xfstests-dev/results//btrfs/011.out.bad 2019-09-18 14:49:28.308798022 +0800
> @@ -1,3 +1,4 @@
> QA output created by 011
> *** test btrfs replace
> -*** done
> +failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
> +(see /home/adam/xfstests-dev/results//btrfs/011.full for details)
> ...
>
> [CAUSE]
> Looking into the full output, it shows:
> ...
> Replace from /dev/mapper/test-scratch1 to /dev/mapper/test-scratch2
>
> # /usr/bin/btrfs replace start -f /dev/mapper/test-scratch1 /dev/mapper/test-scratch2 /mnt/scratch
> # /usr/bin/btrfs replace cancel /mnt/scratch
> INFO: ioctl(DEV_REPLACE_CANCEL)"/mnt/scratch": not started
> failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
>
> So this means the replace is already finished before we cancel it.
> For fast system, it's very common.
Does generate heavier load & more data make replace operation last
longer? e.g. make more 'noise' by running fsstress instead of dumping
/dev/urandom before starting replace.
And does sleep shorter time (0.5s?) before cancel work?
Thanks,
Eryu
>
> [FIX]
> Instead of using _run_btrfs_util_prog which requires 0 as return value,
> we just call "$BTRFS_UTIL_PROG replace cancel" and ignore all its
> stderr/stdout, and completely rely on "$BTRFS_UTIL_PROG replace status"
> output to verify the work.
>
> Furthermore if we finished replac before cancelling it, we should
> replace again to switch the device back, or after the test case, btrfs
> check will fail as there is no valid btrfs on that replaced device.
>
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> ---
> tests/btrfs/011 | 16 ++++++++++++++--
> 1 file changed, 14 insertions(+), 2 deletions(-)
>
> diff --git a/tests/btrfs/011 b/tests/btrfs/011
> index 89bb4d11..858b00e8 100755
> --- a/tests/btrfs/011
> +++ b/tests/btrfs/011
> @@ -148,13 +148,25 @@ btrfs_replace_test()
> # background the replace operation (no '-B' option given)
> _run_btrfs_util_prog replace start -f $replace_options $source_dev $target_dev $SCRATCH_MNT
> sleep 1
> - _run_btrfs_util_prog replace cancel $SCRATCH_MNT
> + # 1s is enough for fast system to finish replace, so here we
> + # ignore all the output, completely rely on later status
> + # output to determine
> + $BTRFS_UTIL_PROG replace cancel $SCRATCH_MNT &> /dev/null
>
> # 'replace status' waits for the replace operation to finish
> # before the status is printed
> $BTRFS_UTIL_PROG replace status $SCRATCH_MNT > $tmp.tmp 2>&1
> cat $tmp.tmp >> $seqres.full
> - grep -q canceled $tmp.tmp || _fail "btrfs replace status (canceled) failed"
> + grep -q -e canceled -e finished $tmp.tmp ||\
> + _fail "btrfs replace status (canceled) failed"
> +
> + # If replace finished before cancel, replace them back or
> + # the final fsck after test case will fail as there is no btrfs
> + # on the $source_dev anymore
> + if grep -q -e finished $tmp.tmp ; then
> + $BTRFS_UTIL_PROG replace start -Bf $replace_options \
> + $target_dev $source_dev $SCRATCH_MNT
> + fi
> else
> if [ "${quick}Q" = "thoroughQ" ]; then
> # On current hardware, the thorough test runs
> --
> 2.22.0