metadata_uuid code is too confusing, a lot of if and if-nots
it should be have been better.
more below.
On 13/12/19 10:46 AM, Su Yue wrote:
On 2019/12/12 10:15 PM, Nikolay Borisov wrote:
On 12.12.19 г. 13:01 ч., damenly.su@xxxxxxxxx wrote:
<snip>
Acutally, there are two devices in the fs. Device 2 with
FSID_CHANGING_V2 allocated a fs_devices. But, device 1 found the
fs_devices but failed to be added into since fs_devices->opened (
It's not clear why device 1 wasn't able to be added to the fs_devices
allocated by dev 2. Please elaborate?
Sure, of course.
For example.
$cat test.sh
====================================================================
img1="/tmp/test1.img"
img2="/tmp/test2.img"
[ -f "$img1" ] || fallocate -l 300M "$img1"
[ -f "$img2" ] || fallocate -l 300M "$img2"
mkfs.btrfs -f $img1 $img2 2>&1 >/dev/null|| exit 1
losetup -D
dmesg -C
rmmod btrfs
modprobe btrfs
loop1=$(losetup --find --show "$img1")
loop2=$(losetup --find --show "$img2")
Can you explicitly show what devices should be scanned to make the
device mount (below) successful. Fist you can cleanup the
device list using
btrfs device --forget
mount $loop1 /mnt || exit 1
umount /mnt
====================================================================
$dmesg
====================================================================
[ 395.205221] BTRFS: device fsid 5090db22-5e48-4767-8fb7-d037c619c1ee
devid 1 transid 5 /dev/loop0 scanned by systemd-udevd (13620)
[ 395.210773] !!!!!!!!fs_device opened
[ 395.213875] BTRFS info (device loop0): disk space caching is enabled
[ 395.214994] BTRFS info (device loop0): has skinny extents
[ 395.215891] BTRFS info (device loop0): flagging fs with big metadata
feature
[ 395.222639] BTRFS error (device loop0): devid 2 uuid
adcc8454-695f-4e1d-bde8-94041b7bf761 is missing
[ 395.224147] BTRFS error (device loop0): failed to read the system
array: -2
[ 395.246163] !!!!!!!!fs_device opened
[ 395.338219] BTRFS error (device loop0): open_ctree failed
=====================================================================
The line "!!!!!!!!fs_device opened" is handy added by me in debug purpose.
=====================================================================
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -794,6 +794,7 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
if (!device) {
if (fs_devices->opened) {
+ pr_info("!!!!!!!!fs_device opened\n");
mutex_unlock(&fs_devices->device_list_mutex);
return ERR_PTR(-EBUSY);
}
=====================================================================
To make it more clear. The following is in metadata_uuid situation.
Device 1 is without FSID_CHANGING_V2 but has IMCOMPAT_METADATA_UUID.
(newer transid).
Device 2 is with FSID_CHANGING_V2 and IMCOMPAT_METADATA_UUID.(Older
transid).
How were you able to set both BTRFS_SUPER_FLAG_CHANGING_FSID_V2
and BTRFS_FEATURE_INCOMPAT_METADATA_UUID on only devid 2 ?
The workflow in misc-tests/034 is
loop1=$(losetup --find --show "$device2")
loop2=$(losetup --find --show "$device1")
mount $loop1 /mnt ---> fails here
Assume the fs_devices was allocated by systemd-udevd through
btrfs_control_ioctl() path after finish of scanning of device2.
Then:
In the two threads which are in race (below), the mount thread can't be
successful unless -o degraded is used, if it does it means the devid 1
is already scanned and for that btrfs_device to be in the
btrfs_fs_devices list the fsid has to match (does not matter metadata_uuid).
Thread *mounting device2* Thread *scanning device1*
btrfs_mount_root btrfs_control_ioctl
mutex_lock(&uuid_mutex);
btrfs_read_disk_super
btrfs_scan_one_device
--> there is only device2
in the fs_devices
btrfs_open_devices
fs_devices->opened = 1
fs_devices->latest_bdev = device2
mutex_unlock(&uuid_mutex);
mutex_lock(&uuid_mutex);
btrfs_scan_one_device
btrfs_read_disk_super
device_list_add
found fs_devices
device = btrfs_find_device
rewrite fs_deivces->fsid if
scanned device1 is newer
--> Change fs_devices->fsi
d to device1->fsid
if (!device)
if(fs_devices->opened)
return -EBUSY
--> the device1 adding
aborts since
fs_devices was opened
mutex_unlock(&uuid_mutex);
btrfs_fill_super
open_ctree
btrfs_read_dev_super(
fs_devices->latest_bdev)
--> the latest_bdev is device2
assert fs_devices->fsid equals
device2's fsid.
--> fs_device->fsid was rewritten by
the scanning thread
The result is fs_device->fsid is from device1 but super->fsid is from
the lastest device2.
Oops that's not good. However still not able to image various devices
and its fsid to achieve that condition. Is it possible to write a test
case? It would help.
Thanks, Anand
the thread is doing mount device 1). But device 1's fsid was copied
to fs_devices->fsid then the assertion failed.
dev 1 fsid should be copied iff its transid is newer.
Even it was failed to be added into the fs_devices?
The solution is that only if a new device was added into a existing
fs_device, then the fs_devices->fsid is allowed to be rewritten.
fs_devices->fsid must be re-written by any device which is _newer_ w.r.t
to the transid.
Then the assertion failed in above scenario. Just do not update the
fs_devices->fsid, let later btrfs_read_sys_array() report the device
missing then reject to mount.
Thanks
Fixes: 7a62d0f07377 ("btrfs: Handle one more split-brain scenario
during fsid change")
Signed-off-by: Su Yue <Damenly_Su@xxxxxxx>
---
fs/btrfs/volumes.c | 36 +++++++++++++++++++++---------------
1 file changed, 21 insertions(+), 15 deletions(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index d8e5560db285..9efa4123c335 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -732,6 +732,9 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
BTRFS_FEATURE_INCOMPAT_METADATA_UUID);
bool fsid_change_in_progress = (btrfs_super_flags(disk_super) &
BTRFS_SUPER_FLAG_CHANGING_FSID_V2);
+ bool fs_devices_found = false;
+
+ *new_device_added = false;
if (fsid_change_in_progress) {
if (!has_metadata_uuid) {
@@ -772,24 +775,11 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
device = NULL;
} else {
+ fs_devices_found = true;
+
mutex_lock(&fs_devices->device_list_mutex);
device = btrfs_find_device(fs_devices, devid,
disk_super->dev_item.uuid, NULL, false);
-
- /*
- * If this disk has been pulled into an fs devices created by
- * a device which had the CHANGING_FSID_V2 flag then replace
the
- * metadata_uuid/fsid values of the fs_devices.
- */
- if (has_metadata_uuid && fs_devices->fsid_change &&
- found_transid > fs_devices->latest_generation) {
- memcpy(fs_devices->fsid, disk_super->fsid,
- BTRFS_FSID_SIZE);
- memcpy(fs_devices->metadata_uuid,
- disk_super->metadata_uuid, BTRFS_FSID_SIZE);
-
- fs_devices->fsid_change = false;
- }
}
if (!device) {
@@ -912,6 +902,22 @@ static noinline struct btrfs_device
*device_list_add(const char *path,
}
}
+ /*
+ * If the new added disk has been pulled into an fs devices
created by
+ * a device which had the CHANGING_FSID_V2 flag then replace the
+ * metadata_uuid/fsid values of the fs_devices.
+ */
+ if (*new_device_added && fs_devices_found &&
+ has_metadata_uuid && fs_devices->fsid_change &&
+ found_transid > fs_devices->latest_generation) {
+ memcpy(fs_devices->fsid, disk_super->fsid,
+ BTRFS_FSID_SIZE);
+ memcpy(fs_devices->metadata_uuid,
+ disk_super->metadata_uuid, BTRFS_FSID_SIZE);
+
+ fs_devices->fsid_change = false;
+ }
+
/*
* Unmount does not free the btrfs_device struct but would zero
* generation along with most of the other members. So just update