On 28/6/19 10:44 AM, Qu Wenruo wrote:
On 2019/6/28 10:26 AM, Anand Jain wrote:
During mkfs.btrfs, the device ID and stripe index end up in reverse
order, as shown in [1]. This patch keeps them in order at mkfs.btrfs
time, which makes debugging easier.
Before:
Stripe 0 is on devid 2; Stripe 1 is on devid 1
./mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc && btrfs in dump-tree -d /dev/sdb | grep -A 10000 "chunk tree" | grep -B 10000 "device tree" | grep -A 13 "FIRST_CHUNK_TREE CHUNK_ITEM"
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15975 itemsize 112
length 8388608 owner 2 stripe_len 65536 type SYSTEM|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 2 offset 1048576
dev_uuid d9fe51c4-6e79-446d-87ee-5be3184798cd
stripe 1 devid 1 offset 22020096
dev_uuid 16f626ca-1a54-469b-ac7e-25623af884ab
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15863 itemsize 112
length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 2 offset 9437184
dev_uuid d9fe51c4-6e79-446d-87ee-5be3184798cd
stripe 1 devid 1 offset 30408704
dev_uuid 16f626ca-1a54-469b-ac7e-25623af884ab
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15751 itemsize 112
length 314572800 owner 2 stripe_len 65536 type DATA|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 2 offset 277872640
dev_uuid d9fe51c4-6e79-446d-87ee-5be3184798cd
stripe 1 devid 1 offset 298844160
dev_uuid 16f626ca-1a54-469b-ac7e-25623af884ab
After:
Stripe 0 is on devid 1; Stripe 1 is on devid 2
./mkfs.btrfs -fq -draid1 -mraid1 /dev/sdb /dev/sdc && btrfs in dump-tree -d /dev/sdb | grep -A 10000 "chunk tree" | grep -B 10000 "device tree" | grep -A 13 "FIRST_CHUNK_TREE CHUNK_ITEM"
/dev/sdb: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
/dev/sdc: 8 bytes were erased at offset 0x00010040 (btrfs): 5f 42 48 52 66 53 5f 4d
item 2 key (FIRST_CHUNK_TREE CHUNK_ITEM 22020096) itemoff 15975 itemsize 112
length 8388608 owner 2 stripe_len 65536 type SYSTEM|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 1 offset 22020096
dev_uuid 6abc88fa-f42e-4f0c-9bc3-2225735e51d1
stripe 1 devid 2 offset 1048576
dev_uuid 73746d27-13a6-4d58-ac6b-48c90c31d94d
item 3 key (FIRST_CHUNK_TREE CHUNK_ITEM 30408704) itemoff 15863 itemsize 112
length 268435456 owner 2 stripe_len 65536 type METADATA|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 1 offset 30408704
dev_uuid 6abc88fa-f42e-4f0c-9bc3-2225735e51d1
stripe 1 devid 2 offset 9437184
dev_uuid 73746d27-13a6-4d58-ac6b-48c90c31d94d
item 4 key (FIRST_CHUNK_TREE CHUNK_ITEM 298844160) itemoff 15751 itemsize 112
length 314572800 owner 2 stripe_len 65536 type DATA|RAID1
io_align 65536 io_width 65536 sector_size 4096
num_stripes 2 sub_stripes 0
stripe 0 devid 1 offset 298844160
dev_uuid 6abc88fa-f42e-4f0c-9bc3-2225735e51d1
stripe 1 devid 2 offset 277872640
dev_uuid 73746d27-13a6-4d58-ac6b-48c90c31d94d
Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx>
Reviewed-by: Qu Wenruo <wqu@xxxxxxxx>
But please also check the comment inlined below.
---
volumes.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/volumes.c b/volumes.c
index 79d1d6a07fb7..8c8b17e814b8 100644
--- a/volumes.c
+++ b/volumes.c
@@ -1109,7 +1109,7 @@ again:
return ret;
cur = cur->next;
if (avail >= min_free) {
- list_move_tail(&device->dev_list, &private_devs);
+ list_move(&device->dev_list, &private_devs);
This is OK, since the current btrfs-progs chunk allocator doesn't follow
the kernel behavior of sorting devices by their unallocated space.
So it's completely devid-based.
But please keep in mind that if we unify the chunk allocator behavior of
the kernel and btrfs-progs, this behavior will change.
The initial temporary chunk is always allocated on devid 1, which reduces
its unallocated space and thus its priority in the chunk allocator,
making the devid sequence less reliable.
Right. For debugging this, I have experimental code which disables
the unallocated space sort in the kernel. I don't have a strong reason
to disable the sort in the kernel, so I didn't send the patch.
Thanks, Anand
Thanks,
Qu
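To make the point about sorting by unallocated space concrete, here is a
minimal sketch. It is not the kernel's actual comparator: the struct, the
field names, the helper names and the devid tie-break are all assumptions
made for illustration. It only shows that once a chunk has been allocated
on devid 1, a sort by unallocated space puts devid 2 first.

```c
#include <stdlib.h>

/* Hypothetical per-device bookkeeping; not the kernel's struct. */
struct dev_info {
	int devid;
	unsigned long long total_bytes;
	unsigned long long bytes_used;
};

static unsigned long long unallocated(const struct dev_info *d)
{
	return d->total_bytes - d->bytes_used;
}

/*
 * Sort by unallocated space, largest first.
 * Assumption: ties are broken by ascending devid.
 */
static int cmp_unallocated_desc(const void *a, const void *b)
{
	const struct dev_info *da = a;
	const struct dev_info *db = b;
	unsigned long long ua = unallocated(da);
	unsigned long long ub = unallocated(db);

	if (ua != ub)
		return ua > ub ? -1 : 1;
	return (da->devid > db->devid) - (da->devid < db->devid);
}

/* Sort the array in place and return the devid that would be tried first. */
static int first_devid_after_sort(struct dev_info *devs, size_t n)
{
	qsort(devs, n, sizeof(*devs), cmp_unallocated_desc);
	return devs[0].devid;
}
```

With two equally sized, empty devices this picks devid 1. After marking
8388608 bytes as used on devid 1 (the size of the SYSTEM chunk in the
dumps above), it picks devid 2.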
index++;
if (type & BTRFS_BLOCK_GROUP_DUP)
index++;
@@ -1166,7 +1166,7 @@ again:
/* loop over this device again if we're doing a dup group */
if (!(type & BTRFS_BLOCK_GROUP_DUP) ||
(index == num_stripes - 1))
- list_move_tail(&device->dev_list, dev_list);
+ list_move(&device->dev_list, dev_list);
ret = btrfs_alloc_dev_extent(trans, device, key.offset,
calc_size, &dev_offset);
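
For readers less familiar with the list helpers, the sketch below shows
why swapping list_move_tail() for list_move() changes the order.
It uses a simplified stand-in for the kernel-style list, not the
btrfs-progs list.h, and struct device, MAX_DEVS and order_after_drain()
are hypothetical names. Draining a list front to back with
list_move_tail() preserves the order, while list_move() reverses it. How
that maps onto the final stripe order depends on the initial order of the
device list, which this thread does not spell out.

```c
#include <stddef.h>

/* Simplified stand-in for the kernel-style circular doubly linked list. */
struct list_head {
	struct list_head *next, *prev;
};

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_unlink(struct list_head *e)
{
	e->prev->next = e->next;
	e->next->prev = e->prev;
}

/* Insert e between two adjacent entries prev and next. */
static void list_insert(struct list_head *e, struct list_head *prev,
			struct list_head *next)
{
	next->prev = e;
	e->next = next;
	e->prev = prev;
	prev->next = e;
}

/* Move e to the front of the list headed by head. */
static void list_move(struct list_head *e, struct list_head *head)
{
	list_unlink(e);
	list_insert(e, head, head->next);
}

/* Move e to the back of the list headed by head. */
static void list_move_tail(struct list_head *e, struct list_head *head)
{
	list_unlink(e);
	list_insert(e, head->prev, head);
}

/* Hypothetical device record, reduced to what the example needs. */
struct device {
	int devid;
	struct list_head dev_list;
};

#define MAX_DEVS 8
#define to_device(p) \
	((struct device *)((char *)(p) - offsetof(struct device, dev_list)))

/*
 * Build a list holding devid 1..n in that order, move every entry to a
 * second list (front to back), and store the resulting devid order in out.
 * Returns the number of devices written to out.
 */
static int order_after_drain(int n, int use_tail, int *out)
{
	struct device devs[MAX_DEVS];
	struct list_head src, dst, *p;
	int i, count = 0;

	if (n > MAX_DEVS)
		n = MAX_DEVS;
	list_init(&src);
	list_init(&dst);
	for (i = 0; i < n; i++) {
		devs[i].devid = i + 1;
		list_init(&devs[i].dev_list);
		list_move_tail(&devs[i].dev_list, &src);
	}
	while (src.next != &src) {
		if (use_tail)
			list_move_tail(src.next, &dst);
		else
			list_move(src.next, &dst);
	}
	for (p = dst.next; p != &dst; p = p->next)
		out[count++] = to_device(p)->devid;
	return count;
}
```

Calling order_after_drain(2, 1, out) leaves out as {1, 2}, while
order_after_drain(2, 0, out) leaves it as {2, 1}.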