On 2020/2/25 上午1:06, David Sterba wrote:
> On Fri, Feb 21, 2020 at 09:40:32PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2020/2/21 下午9:35, David Sterba wrote:
>>> On Wed, Feb 12, 2020 at 02:12:44PM +0800, Qu Wenruo wrote:
>>>> @@ -4036,7 +4037,39 @@ int btree_write_cache_pages(struct address_space *mapping,
>>>> end_write_bio(&epd, ret);
>>>> return ret;
>>>> }
>>>> - ret = flush_write_bio(&epd);
>>>> + /*
>>>> + * If something went wrong, don't allow any metadata write bio to be
>>>> + * submitted.
>>>> + *
>>>> + * This would prevent use-after-free if we had dirty pages not
>>>> + * cleaned up, which can still happen by fuzzed images.
>>>> + *
>>>> + * - Bad extent tree
>>>> + * Allowing existing tree block to be allocated for other trees.
>>>> + *
>>>> + * - Log tree operations
>>>> + * Exiting tree blocks get allocated to log tree, bumps its
>>>> + * generation, then get cleaned in tree re-balance.
>>>> + * Such tree block will not be written back, since it's clean,
>>>> + * thus no WRITTEN flag set.
>>>> + * And after log writes back, this tree block is not traced by
>>>> + * any dirty extent_io_tree.
>>>> + *
>>>> + * - Offending tree block gets re-dirtied from its original owner
>>>> + * Since it has bumped generation, no WRITTEN flag, it can be
>>>> + * reused without COWing. This tree block will not be traced
>>>> + * by btrfs_transaction::dirty_pages.
>>>> + *
>>>> + * Now such dirty tree block will not be cleaned by any dirty
>>>> + * extent io tree. Thus we don't want to submit such wild eb
>>>> + * if the fs already has error.
>>>> + */
>>>> + if (!test_bit(BTRFS_FS_STATE_ERROR, &fs_info->fs_state)) {
>>>> + ret = flush_write_bio(&epd);
>>>> + } else {
>>>> + ret = -EUCLEAN;
>>>> + end_write_bio(&epd, ret);
>>>> + }
>>>
>>> This replaces one instance of flush_write_bio, would it make sense to
>>> wrap it to flush_write_bio or some other helper? There might be places
>>> where not handling the fs error state would be acceptable, so eg.
>>>
>>> flush_write_bio = as it is now
>>>
>>> flush_write_bio_or_end = does the above
>>>
>>
>> I don't believe there are other call sites needs such special handling,
>> thus a wrapper only used once doesn't make much sense.
>>
>> Unless we're going to introduce more path for btree writeback, current
>> one would be good enough I guess.
>
> I see, thanks. The steps to reproduce are quite complicated already and
> expecting crafted data. There's probably more but would need a similarly
> convoluted way of hitting a missing error code fixup.
The reproducer is complex, mostly because we're catching the problem at
the final stage.
In theory we should catch it as early as possible, although we can't
catch it at tree-checker time, we should still be able to catch it at
tree block allocation time.
For create_tree_block() if we get an eb which has refs > 2, then it may
indicate corrupted extent tree.
IIRC I have submitted similar patch before, but not merged due to some
false alert I guess, maybe it's time to verify that patch.
Thanks,
Qu
>
> We could add more invariant checks that would catch that something is
> done at a wrong time, like here metadata writeback after everything has
> been shut down.
>
Attachment:
signature.asc
Description: OpenPGP digital signature
