On 2019/9/18 下午3:54, Anand Jain wrote:
> On 18/9/19 2:56 PM, Qu Wenruo wrote:
>> [BUG]
>> When btrfs/011 is executed on a fast enough system (fully memory backed
>> VM, with test device has unsafe cache mode), the test can fail like
>> this:
>>
>> btrfs/011 43s ... [failed, exit status 1]- output mismatch (see
>> /home/adam/xfstests-dev/results//btrfs/011.out.bad)
>> --- tests/btrfs/011.out 2019-07-22 14:13:44.643333326 +0800
>> +++ /home/adam/xfstests-dev/results//btrfs/011.out.bad
>> 2019-09-18 14:49:28.308798022 +0800
>> @@ -1,3 +1,4 @@
>> QA output created by 011
>> *** test btrfs replace
>> -*** done
>> +failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
>> +(see /home/adam/xfstests-dev/results//btrfs/011.full for details)
>> ...
>>
>> [CAUSE]
>> Looking into the full output, it shows:
>> ...
>> Replace from /dev/mapper/test-scratch1 to /dev/mapper/test-scratch2
>>
>> # /usr/bin/btrfs replace start -f /dev/mapper/test-scratch1
>> /dev/mapper/test-scratch2 /mnt/scratch
>> # /usr/bin/btrfs replace cancel /mnt/scratch
>> INFO: ioctl(DEV_REPLACE_CANCEL)"/mnt/scratch": not started
>> failed: '/usr/bin/btrfs replace cancel /mnt/scratch'
>>
>> So this means the replace is already finished before we cancel it.
>> For fast system, it's very common.
>>
>> [FIX]
>> Instead of using _run_btrfs_util_prog which requires 0 as return value,
>> we just call "$BTRFS_UTIL_PROG replace cancel" and ignore all its
>> stderr/stdout, and completely rely on "$BTRFS_UTIL_PROG replace status"
>> output to verify the work.
>>
>> Furthermore if we finished replac before cancelling it, we should
>> replace again to switch the device back, or after the test case, btrfs
>> check will fail as there is no valid btrfs on that replaced device.
>>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> ---
>> tests/btrfs/011 | 16 ++++++++++++++--
>> 1 file changed, 14 insertions(+), 2 deletions(-)
>>
>> diff --git a/tests/btrfs/011 b/tests/btrfs/011
>> index 89bb4d11..858b00e8 100755
>> --- a/tests/btrfs/011
>> +++ b/tests/btrfs/011
>> @@ -148,13 +148,25 @@ btrfs_replace_test()
>> # background the replace operation (no '-B' option given)
>> _run_btrfs_util_prog replace start -f $replace_options
>> $source_dev $target_dev $SCRATCH_MNT
>> sleep 1
>> - _run_btrfs_util_prog replace cancel $SCRATCH_MNT
>> + # 1s is enough for fast system to finish replace, so here we
>> + # ignore all the output, completely rely on later status
>> + # output to determine
>> + $BTRFS_UTIL_PROG replace cancel $SCRATCH_MNT &> /dev/null
>> # 'replace status' waits for the replace operation to finish
>> # before the status is printed
>> $BTRFS_UTIL_PROG replace status $SCRATCH_MNT > $tmp.tmp 2>&1
>> cat $tmp.tmp >> $seqres.full
>> - grep -q canceled $tmp.tmp || _fail "btrfs replace status
>> (canceled) failed"
>> + grep -q -e canceled -e finished $tmp.tmp ||\
>> + _fail "btrfs replace status (canceled) failed"
>> +
>> + # If replace finished before cancel, replace them back or
>> + # the final fsck after test case will fail as there is no btrfs
>> + # on the $source_dev anymore
>> + if grep -q -e finished $tmp.tmp ; then
>> + $BTRFS_UTIL_PROG replace start -Bf $replace_options \
>> + $target_dev $source_dev $SCRATCH_MNT
>> + fi
>> else
>> if [ "${quick}Q" = "thoroughQ" ]; then
>> # On current hardware, the thorough test runs
>>
>
> Faced the same problem before. But I don't have a good solution
> to fix. Because the idea of the test case appears to test the replace
> cancel. This change makes it not to error if replace is not running
> in the first place.
Then what about _notrun if we find that replace/scrub finished too early?
BTW, if we really want to test all regular scrub/replace with
cancel/finish combination, we should split this test into 4 test cases,
then we can completely afford to skip 2 while still allow the finish
cases to run...
Thanks,
Qu
>
> Thanks, Anand
>
Attachment:
signature.asc
Description: OpenPGP digital signature
