On 2019/12/13 1:36 PM, Anand Jain wrote:
metadata_uuid code is too confusing, a lot of if and if-nots
it should be have been better.
Agreed. It costed much brain power of mine.
more below.
Willing to answer from my understanding.
If something wrong, please point at.
On 13/12/19 10:46 AM, Su Yue wrote:
On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
On 12.12.19 г. 13:01 ч., damenly.su@xxxxxxxxx wrote:
<snip>
Acutally, there are two devices in the fs. Device 2 with
FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
fs_devices but failed to be added into since fs_devices->opened (
It's not clear why device 1 wasn't able to be added to the fs_devices
allocated by dev 2. Please elaborate?
Sure, of course.
For example.
$cat test.sh
====================================================================
img1="/tmp/test1.img"
img2="/tmp/test2.img"
[ -f "$img1" ] || fallocate -l 300M "$img1"
[ -f "$img2" ] || fallocate -l 300M "$img2"
mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
losetup -D
dmesg -C
rmmod btrfs
modprobe btrfs
loop1=$(losetup --find --show "$img1")
loop2=$(losetup --find --show "$img2")
Can you explicitly show what devices should be scanned to make the
device mount (below) successful. Fist you can cleanup the
device list using
btrfs device --forget
Thanks for the tip.
The purpose of simple script is to show that there
may be uncompleted/unsuccessful device(s) scanning due to
fs_devices->opened. Is the issue already known?
mount $loop1 /mnt || exit 1
umount /mnt
====================================================================
$dmesg
====================================================================
[ 395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
[ 395.210773] !!!!!!!!fs_device opened
[ 395.213875] BTRFS info (device loop0): disk space caching is enabled
[ 395.214994] BTRFS info (device loop0): has skinny extents
[ 395.215891] BTRFS info (device loop0): flagging fs with big metadata
feature
[ 395.222639] BTRFS error (device loop0): devid 2 uuid
adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
[ 395.224147] BTRFS error (device loop0): failed to read the system
array: -2
[ 395.246163] !!!!!!!!fs_device opened
[ 395.338219] BTRFS error (device loop0): open_ctree failed
=====================================================================
The line "!!!!!!!!fs_device opened" is handy added by me in debug
purpose.
=====================================================================
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -794,6 +794,7 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
if (!device) {
if (fs_devices->opened) {
+ pr_info("!!!!!!!!fs_device opened\n");
mutex_unlock(&fs_devices->device_list_mutex);
return ERR_PTR(-EBUSY);
}
=====================================================================
To make it more clear. The following is in metadata_uuid situation.
Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
(newer transid).
Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
transid).
How were you able to set FSID_CHANGING_V2
and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
The device2 is simulated to be the device failed to sync due
to some expected reason (power loss).
mkfs on two devices, use v5.4 progs/btrfstune -m $device.
Then both two devices both have the BTRFS_FEATURE_INCOMPAT_METADATA_UUID.
Play some tricks in btrfstune.c to avoid final super block
write on one deivce. Like the ugly code to delete the device:
========================================================================
diff --git a/btrfstune.c b/btrfstune.c
index afa3aae3..f678b978 100644
--- a/btrfstune.c
+++ b/btrfstune.c
@@ -101,12 +101,14 @@ static int set_metadata_uuid(struct btrfs_root
*root, const char *uuid_string)
struct btrfs_super_block *disk_super;
uuid_t new_fsid, unused1, unused2;
struct btrfs_trans_handle *trans;
+ struct btrfs_device *dev, *next;
bool new_uuid = true;
u64 incompat_flags;
bool uuid_changed;
u64 super_flags;
int ret;
disk_super = root->fs_info->super_copy;
super_flags = btrfs_super_flags(disk_super);
incompat_flags = btrfs_super_incompat_flags(disk_super);
@@ -170,6 +172,14 @@ static int set_metadata_uuid(struct btrfs_root
*root, const char *uuid_string)
return 0;
}
+ list_for_each_entry_safe(dev, next,
&root->fs_info->fs_devices->devices,
+ dev_list) {
+ if (dev->devid == 2) {
+ fsync(dev->fd);
+ list_del_init(&dev->dev_list);
+ }
+ }
+==================================================================
Compile again. call btrfstune -m again.
Then we get a device with
BTRFS_FEATURE_INCOMPAT_METADATA_UUID and FSID_CHANGING_V2.
The workflow in misc-tests/034 is
loop1=$(losetup --find --show "$device2")
loop2=$(losetup --find --show "$device1")
mount $loop1 /mnt ---> fails here
Assume the fs_devices was allocated by systemd-udevd through
btrfs_control_ioctl() path after finish of scanning of device2.
Then:
In the two threads which are in race (below), the mount thread can't be
successful unless -o degraded is used, if it does it means the devid 1
Right.. The dmesg reports the device 1 is missing.
is already scanned and for that btrfs_device to be in the
btrfs_fs_devices list the fsid has to match (does not matter
metadata_uuid).
Sorry, I doesn't make much clear what you mean. In similar but no
metadata_uuid situation, mount will fail too but the assertion won't
fail of course. The device1 was scanned but not added into the
fs_devices(already found) since the fs_devices was opened by the
mounting thread.
Thread *mounting device2* Thread *scanning device1*
btrfs_mount_root btrfs_control_ioctl
mutex_lock(&uuid_mutex);
btrfs_read_disk_super
btrfs_scan_one_device
--> there is only device2
in the fs_devices
btrfs_open_devices
fs_devices->opened = 1
fs_devices->latest_bdev = device2
mutex_unlock(&uuid_mutex);
mutex_lock(&uuid_mutex);
btrfs_scan_one_device
btrfs_read_disk_super
device_list_add
found fs_devices
device = btrfs_find_device
rewrite fs_deivces->fsid if
scanned device1 is newer
--> Change fs_devices->fsi
d to device1->fsid
if (!device)
if(fs_devices->opened)
return -EBUSY
--> the device1 adding
aborts since
fs_devices was opened
mutex_unlock(&uuid_mutex);
btrfs_fill_super
open_ctree
btrfs_read_dev_super(
fs_devices->latest_bdev)
--> the latest_bdev is device2
assert fs_devices->fsid equals
device2's fsid.
--> fs_device->fsid was rewritten by
the scanning thread
The result is fs_device->fsid is from device1 but super->fsid is from
the lastest device2.
Oops that's not good. However still not able to image various devices
and its fsid to achieve that condition. Is it possible to write a test
case? It would help.
It did happened in my test environment.. You can try misc-tests/034
about 20 times on v5.4 progs and v5.5-rc1 kernel. As for the test case,
will give a try.
Thanks
Thanks, Anand
the thread is doing mount device 1). But device 1's fsid was copied
to fs_devices->fsid then the assertion failed.
dev 1 fsid should be copied iff its transid is newer.
Even it was failed to be added into the fs_devices?
The solution is that only if a new device was added into a existing
fs_device, then the fs_devices->fsid is allowed to be rewritten.
fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
to the transid.
Then the assertion failed in above scenario. Just do not update the
fs_devices->fsid, let later btrfs_read_sys_array() report the device
missing then reject to mount.
Thanks
Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario
during fsid change")
Signed-off-by: Su Yue <Damenly_Su@xxxxxxx>
---
fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8e5560db285..9efa4123c335 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,9 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
+ bool fs_devices_found = false;
+
+ *new_device_added = false;
if (fsid_change_in_progress) {
if (!has_metadata_uuid) {
@@ -772,24 +775,11 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
device = NULL;
} else {
+ fs_devices_found = true;
+
mutex_lock(&fs_devices->device_list_mutex);
device = btrfs_find_device(fs_devices, devid,
disk_super->dev_item.uuid, NULL, false);
-
- /*
- * If this disk has been pulled into an fs devices created by
- * a device which had the CHANGING_FSID_V2 flag then
replace the
- * metadata_uuid/fsid values of the fs_devices.
- */
- if (has_metadata_uuid && fs_devices->fsid_change &&
- found_transid > fs_devices->latest_generation) {
- memcpy(fs_devices->fsid, disk_super->fsid,
- BTRFS_FSID_SIZE);
- memcpy(fs_devices->metadata_uuid,
- disk_super->metadata_uuid, BTRFS_FSID_SIZE);
-
- fs_devices->fsid_change = false;
- }
}
if (!device) {
@@ -912,6 +902,22 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
}
}
+ /*
+ * If the new added disk has been pulled into an fs devices
created by
+ * a device which had the CHANGING_FSID_V2 flag then replace the
+ * metadata_uuid/fsid values of the fs_devices.
+ */
+ if (*new_device_added && fs_devices_found &&
+ has_metadata_uuid && fs_devices->fsid_change &&
+ found_transid > fs_devices->latest_generation) {
+ memcpy(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid,
+ disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+ fs_devices->fsid_change = false;
+ }
+
/*
* Unmount does not free the btrfs_device struct but would zero
* generation along with most of the other members. So just
update