[PATCH RFC] btrfs: relocation: Hunt down BUG_ON() in merge_reloc_roots()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



[BUG]
There is one BUG_ON() report where a transaction is aborted during
balance, then kernel BUG_ON() in merge_reloc_roots():

  void merge_reloc_roots(struct reloc_control *rc)
  {
	...
	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)); <<<
  }

[CAUSE]
It's still uncertain why we can get to such situation.
As all __add_reloc_root() calls will also link that reloc root to
rc->reloc_roots, and in merge_reloc_roots() we cleanup rc->reloc_roots.

So the root cause is still uncertain.

[FIX]
But we can still hunt down all the BUG_ON() in merge_reloc_roots().

There are 3 BUG_ON()s in it:
- BUG_ON() for read_fs_root() result
- BUG_ON() for root->reloc_root != reloc_root case
- BUG_ON() for the non-empty reloc_root_tree

For the first two, just grab the return value and goto out tag for
cleanup.

For the last one, make it more graceful by:
- grab the lock before doing read/write
- warn instead of panic
- cleanup the nodes in rc->reloc_root_tree

Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
Reason for RFC:
The root cause to leak nodes in reloc_root_tree is still unknown.
---
 fs/btrfs/relocation.c | 39 ++++++++++++++++++++++++++++++++++++---
 1 file changed, 36 insertions(+), 3 deletions(-)

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 655f1d5a8c27..d562b5c52a40 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -2484,11 +2484,26 @@ void merge_reloc_roots(struct reloc_control *rc)
 		if (btrfs_root_refs(&reloc_root->root_item) > 0) {
 			root = read_fs_root(fs_info,
 					    reloc_root->root_key.offset);
-			BUG_ON(IS_ERR(root));
-			BUG_ON(root->reloc_root != reloc_root);
+			if (IS_ERR(root)) {
+				ret = PTR_ERR(root);
+				btrfs_err(fs_info,
+					  "failed to read root %llu: %d",
+					  reloc_root->root_key.offset, ret);
+				goto out;
+			}
+			if (root->reloc_root != reloc_root) {
+				btrfs_err(fs_info,
+					"reloc root mismatch for root %llu",
+					root->root_key.objectid);
+				ret = -EINVAL;
+				goto out;
+			}
 
 			ret = merge_reloc_root(rc, root);
 			if (ret) {
+				btrfs_err(fs_info,
+			"failed to merge reloc tree for root %llu: %d",
+					  root->root_key.objectid, ret);
 				if (list_empty(&reloc_root->root_list))
 					list_add_tail(&reloc_root->root_list,
 						      &reloc_roots);
@@ -2520,7 +2535,25 @@ void merge_reloc_roots(struct reloc_control *rc)
 			free_reloc_roots(&reloc_roots);
 	}
 
-	BUG_ON(!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root));
+	spin_lock(&rc->reloc_root_tree.lock);
+	/* Cleanup dirty reloc tree nodes */
+	if (!RB_EMPTY_ROOT(&rc->reloc_root_tree.rb_root)) {
+		struct mapping_node *node;
+		struct mapping_node *next;
+
+		WARN_ON(IS_ENABLED(CONFIG_BTRFS_DEBUG));
+		rbtree_postorder_for_each_entry_safe(node, next,
+				&rc->reloc_root_tree.rb_root, rb_node) {
+			struct btrfs_root *root = node->data;
+
+			if (root)
+				btrfs_debug(fs_info,
+				"root %llu still in rc->reloc_root_tree",
+					    root->root_key.offset);
+			kfree(node);
+		}
+	}
+	spin_unlock(&rc->reloc_root_tree.lock);
 }
 
 static void free_block_list(struct rb_root *blocks)
-- 
2.23.0




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux