Re: [PATCH] btrfs: scrub: Require mandatory block group RO for dev-replace

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2020/1/23 下午8:06, Filipe Manana wrote:
> On Thu, Jan 23, 2020 at 7:38 AM Qu Wenruo <wqu@xxxxxxxx> wrote:
>>
>> [BUG]
>> For dev-replace test cases with fsstress, like btrfs/06[45] btrfs/071,
>> looped runs can lead to random failure, where scrub finds csum error.
>>
>> The possibility is not high, around 1/20 to 1/100, but it's causing data
>> corruption.
>>
>> The bug is observable after commit b12de52896c0 ("btrfs: scrub: Don't
>> check free space before marking a block group RO")
>>
>> [CAUSE]
>> Dev-replace has two source of writes:
>> - Write duplication
>>   All writes to source device will also be duplicated to target device.
>>
>>   Content:      Latest data/meta
> 
> I find the term "latest" a bit confusing, perhaps "not yet persisted
> data and metadata" is more clear.
> 
>>
>> - Scrub copy
>>   Dev-replace reused scrub code to iterate through existing extents, and
>>   copy the verified data to target device.
>>
>>   Content:      Data/meta in commit root
> 
> And so here "previously persisted data and metadata".
> 
>>
>> The difference in contents makes the following race possible:
>>         Regular Writer          |       Dev-replace
>> -----------------------------------------------------------------
>>   ^                             |
>>   | Preallocate one data extent |
>>   | at bytenr X, len 1M         |
>>   v                             |
>>   ^ Commit transaction          |
>>   | Now extent [X, X+1M) is in  |
>>   v commit root                 |
>>  ================== Dev replace starts =========================
>>                                 | ^
>>                                 | | Scrub extent [X, X+1M)
>>                                 | | Read [X, X+1M)
>>                                 | | (The content are mostly garbage
>>                                 | |  since it's preallocated)
>>   ^                             | v
>>   | Write back happens for      |
>>   | extent [X, X+512K)          |
>>   | New data writes to both     |
>>   | source and target dev.      |
>>   v                             |
>>                                 | ^
>>                                 | | Scrub writes back extent [X, X+1M)
>>                                 | | to target device.
>>                                 | | This will over write the new data in
>>                                 | | [X, X+512K)
>>                                 | v
>>
>> This race can only happen for nocow writes. Thus metadata and data cow
>> writes are safe, as COW will never overwrite extents of previous trans
>> (in commit root).
>>
>> This behavior can be confirmed by disabling all fallocate related calls
>> in fsstress (*), then all related tests can pass a 2000 run loop.
>>
>> *: FSSTRESS_AVOID="-f fallocate=0 -f allocsp=0 -f zero=0 -f insert=0 \
>>                    -f collapse=0 -f punch=0 -f resvsp=0"
>>    I didn't expect resvsp ioctl will fallback to fallocate in VFS...
>>
>> [FIX]
>> Make dev-replace to require mandatory block group RO, and wait for current
>> nocow writes before calling scrub_chunk().
>>
>> This patch will mostly revert commit 76a8efa171bf ("btrfs: Continue replace
>> when set_block_ro failed") for dev-replace path.
>>
>> ENOSPC for dev-replace is still much better than data corruption.
> 
> Technically if we flag the block group RO without being able to
> persist that due to ENOSPC, it's still ok.

Right, I will change the words, since it only slightly increase the
possibility of ENOSPC, not ensured to cause ENOSPC and abort replace.

> We just want to prevent nocow writes racing with scrub copying
> procedure. But that's something for some other time, and this is fine
> to me.
> 
>>
>> Reported-by: Filipe Manana <fdmanana@xxxxxxxx>
>> Fixes: 76a8efa171bf ("btrfs: Continue replace when set_block_ro failed")
>> Fixes: b12de52896c0 ("btrfs: scrub: Don't check free space before marking a block group RO")
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> ---
>> Changelog:
>> RFC->v1:
>> - Remove the RFC tag
>>   Since the cause is pinned and verified, no need for RFC.
>>
>> - Only wait for nocow writes for dev-replace
>>   CoW writes are safe as they will never overwrite extents in commit
>>   root.
>>
[...]
>> +               /*
>> +                * Now the target block is marked RO, wait for nocow writes to
>> +                * finish before dev-replace.
>> +                * COW is fine, as COW never overwrites extents in commit tree.
>> +                */
>> +               if (sctx->is_dev_replace)
>> +                       btrfs_wait_nocow_writers(cache);
> 
> So this only waits for nocow ordered extents to be created - it
> doesn't wait for them to complete.
> After that you still need to call:
> 
> btrfs_wait_ordered_roots(fs_info, U64_MAX, cache->start, cache->length);

Forgot that, although 1000 runs doesn't expose any problem you are
completely right.

I'll update the patch to address all mentioned comments.

Thanks,
Qu

> 
> Other than that, looks good to me.
> 
> Thanks.
> 
>> +
>> +               scrub_pause_off(fs_info);
>>                 down_write(&dev_replace->rwsem);
>>                 dev_replace->cursor_right = found_key.offset + length;
>>                 dev_replace->cursor_left = found_key.offset;
>> --
>> 2.25.0
>>
> 
> 




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux