On 2019/10/1 上午1:51, David Sterba wrote:
> On Thu, Sep 26, 2019 at 02:35:45PM +0800, Qu Wenruo wrote:
>> [BUG]
>> There is one BUG_ON() report where a transaction is aborted during
>> balance, then kernel BUG_ON() in merge_reloc_roots():
>
> Do you have details from the report, eg. what's the error code?
Nope, what I get is only kernel message.
Not enough to get the error code.
>
>> void merge_reloc_roots(struct reloc_control *rc)
>> {
>> ...
>> BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
>> }
>>
>> [CAUSE]
>> It's still uncertain why we can get to such situation.
>> As all __add_reloc_root() calls will also link that reloc root to
>> rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.
>>
>> So the root cause is still uncertain.
>>
>> [FIX]
>> But we can still hunt down all the BUG_ON() in merge_reloc_roots().
>>
>> There are 3 BUG_ON()s in it:
>> - BUG_ON() for read_fs_root() result
>> - BUG_ON() for root->reloc_root != reloc_root case
>> - BUG_ON() for the non-empty reloc_root_tree
>
> relocation.c is worst regarding number of BUG_ONs, some of them look
> like runtime assertions of the invariants but some of them are
> substituting for error handling.
Yeah, if we inject kmalloc error into btrfs_relocate_block_group(), it's
too easy to hit BUG_ON().
>
> The first one BUG_ON(IS_ERR(root)); is clearly the latter, the other are
> assertions
>
>>
>> For the first two, just grab the return value and goto out tag for
>> cleanup.
>>
>> For the last one, make it more graceful by:
>> - grab the lock before doing read/write
>> - warn instead of panic
>> - cleanup the nodes in rc->reloc_root_tree
>>
>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>> ---
>> Reason for RFC:
>> The root cause to leak nodes in reloc_root_tree is still unknown.
>> ---
>> fs/btrfs/relocation.c | 39 ++++++++++++++++++++++++++++++++++++---
>> 1 file changed, 36 insertions(+), 3 deletions(-)
>>
>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>> index 655f1d5a8c27..d562b5c52a40 100644
>> --- a/fs/btrfs/relocation.c
>> +++ b/fs/btrfs/relocation.c
>> @@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
>> if (btrfs_root_refs(&reloc_root->root_item) > 0) {
>> root = read_fs_root(fs_info,
>> reloc_root->root_key.offset);
>> - BUG_ON(IS_ERR(root));
>> - BUG_ON(root->reloc_root != reloc_root);
>> + if (IS_ERR(root)) {
>
> This is bug_on -> error handling, ok
>
>> + ret = PTR_ERR(root);
>> + btrfs_err(fs_info,
>> + "failed to read root %llu: %d",
>> + reloc_root->root_key.offset, ret);
>> + goto out;
>> + }
>> + if (root->reloc_root != reloc_root) {
>
> With this one I'm not sure it could happen but replacing the bug on is
> still good.
>
>> + btrfs_err(fs_info,
>> + "reloc root mismatch for root %llu",
>
> Would be good to print the number of the other root as well.
>
>> + root->root_key.objectid);
>> + ret = -EINVAL;
>> + goto out;
>> + }
>>
>> ret = merge_reloc_root(rc, root);
>> if (ret) {
>> + btrfs_err(fs_info,
>> + "failed to merge reloc tree for root %llu: %d",
>> + root->root_key.objectid, ret);
>> if (list_empty(&reloc_root->root_list))
>> list_add_tail(&reloc_root->root_list,
>> &reloc_roots);
>> @@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
>> free_reloc_roots(&reloc_roots);
>> }
>>
>> - BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
>
> This one looks more like the invariant, the tree should be really empty
> here. While the cleanup is trying to make things work despite there's a
> problem, I think the warning should not be debugging only.
Indeed, we should output the warning no matter whatever.
Thanks,
Qu
>
>> + spin_lock(&rc->reloc_root_tree.lock);
>> + /* Cleanup dirty reloc tree nodes */
>> + if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
>> + struct mapping_node *node;
>> + struct mapping_node *next;
>> +
>> + WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
>
> ...
>
Attachment:
signature.asc
Description: OpenPGP digital signature
