Marc,
Could you please apply the kernel patch (sent to the list, or available at
git@xxxxxxxxxx:asj/btrfs-boilerplate.git boilerplate-v5.6)? It dumps the
btrfs kernel device_list to user space through procfs. (This patch is
for debugging only.)
I tested (steps below) whether this leads to an availability issue,
that is, one that requires a reboot, and I was unable to reproduce it.
When it happens again at your end, this insight into the kernel might
shed some more light on the issue.
--------------------------------
$ fillfs /btrfs 10000
$ devmgt detach /dev/sda
[65985.636630] BTRFS: error (device sda) in
btrfs_commit_transaction:2345: errno=-5 IO failure (Error while writing
out transaction)
[65985.636631] BTRFS info (device sda): forced readonly
[65985.636633] BTRFS warning (device sda): Skipping commit of aborted
transaction.
[65985.636634] BTRFS: error (device sda) in cleanup_transaction:1894:
errno=-5 IO failure
[65985.636636] BTRFS info (device sda): delayed_refs has NO entry
$ devmgt attach host0
[66501.910237] BTRFS warning (device sda): duplicate device fsid:devid
for 8cc98c45-1a11-4a30-bca8-9760c246ccb4:1 old:/dev/sda new:/dev/sdb
$ btrfs fi show -m
Label: none uuid: 8cc98c45-1a11-4a30-bca8-9760c246ccb4
Total devices 1 FS bytes used 16.06MiB
*** Some devices missing
The -m option above reads the device path from the kernel, which still
reports /dev/sda. User space then checks whether that path is accessible,
and since it is not, the device is reported as missing.
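A rough sketch of that kind of user-space check, for illustration only.
The helper name check_paths is hypothetical, and the plain readability
test is my assumption about what "accessible" means here; it is not the
actual btrfs-progs logic.

```shell
# check_paths (hypothetical helper): read one device path per line on
# stdin and report whether each path can currently be read from user
# space. A path the kernel still remembers but whose node is gone is
# printed as "missing".
check_paths() {
    while IFS= read -r path; do
        if [ -r "$path" ]; then
            printf '%s accessible\n' "$path"
        else
            printf '%s missing\n' "$path"
        fi
    done
}
```

For example, `echo /dev/sda | check_paths` could be run right after the
detach step.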
$ cat /proc/fs/btrfs/devlist
::
device: /dev/sda
::
generation: 10
::
dev_state: |WRITEABLE|IN_FS_METADATA|dev_stats_valid
bdev: not_null
$ mount /dev/sdb /btrfs1
mount: /btrfs1: mount(2) system call failed: File exists.
The above mount fails because we find the same fs signature on both
/dev/sda (stale) and /dev/sdb, and furthermore the generation number is
the same on both devices.
$ btrfs in dump-super /dev/sdb | grep ^generation
generation 10
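The generation comparison can be scripted roughly as below. This is only
a sketch: the function names are hypothetical, and the "generation:" line
format of the debug devlist is assumed from the excerpt shown above.

```shell
# gen_from_dump_super (hypothetical): read `btrfs in dump-super` output
# on stdin and print the value of the line whose first field is exactly
# "generation" (so chunk_root_generation and friends are skipped).
gen_from_dump_super() {
    awk '$1 == "generation" { print $2; exit }'
}

# gen_from_devlist (hypothetical): read the debug devlist text on stdin
# and print the value of the first "generation:" line.
# Assumption: the field is printed as "generation: <number>".
gen_from_devlist() {
    awk '$1 == "generation:" { print $2; exit }'
}
```

Feeding `btrfs in dump-super /dev/sdb` into the first function and
`cat /proc/fs/btrfs/devlist` into the second gives two numbers that can
be compared with a plain `[ "$a" = "$b" ]`.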
$ btrfs dev scan --forget
$ cat /proc/fs/btrfs/devlist
::
device: /dev/sda
The --forget option can't clean up the device because it is still mounted.
$ umount /btrfs
$ cat /proc/fs/btrfs/devlist | egrep 'device:|bdev'
device: /dev/sda
bdev: null
The unmount is successful and bdev is now null, so --forget should work.
$ btrfs dev scan --forget
$ cat /proc/fs/btrfs/devlist | egrep 'device:|bdev'
$
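The bdev check done by eye above can be scripted along these lines. It
is a sketch only: list_stale_devices is a hypothetical name, and it
assumes the "device:" and "bdev:" line format shown in the devlist
output of the debug patch.

```shell
# list_stale_devices (hypothetical): read the debug devlist text on
# stdin and print every device path whose following "bdev:" line says
# "null". Per the steps above, these are the entries that
# `btrfs dev scan --forget` should be able to drop.
list_stale_devices() {
    awk '
        $1 == "device:" { dev = $2 }
        $1 == "bdev:" && $2 == "null" { print dev }
    '
}
```

Usage would be `list_stale_devices < /proc/fs/btrfs/devlist`; empty
output means the kernel holds no such stale entry.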
Now that there is no stale device left in the kernel, the mount will
succeed.
$ mount /dev/sdb /btrfs
$ cat /proc/fs/btrfs/devlist | egrep 'device:|bdev'
device: /dev/sdb
bdev: not_null
So a reboot was not required.
---------------------
Thanks, Anand
On 4/20/20 10:56 PM, Marc MERLIN wrote:
On Mon, Apr 20, 2020 at 07:10:24PM +0800, Anand Jain wrote:
Are the steps below in chronological order?
That is my recollection, yes.
Before and after --forget command
btrfs fi show -m
could have told us what devices are still mounted.
Oh, I didn't know about this. If/when it happens next, I'll
run this to show btrfs' understanding of what's mounted instead of
the kernel's understanding (/proc/self/mounts)
I will send boilerplate code to dump the device list from the kernel; it
will help to debug. As of now, the boilerplate code I have been using is
too localized and needs a lot of cleanup, so it will take some time.
Sounds good.
Thanks,
Marc