Re: interest in post-mortem examination of a BTRFS system and improving the btrfs-code?


 





2019-04-07 01:18, Qu Wenruo:


On 2019/4/6 10:19 PM, Nik. wrote:


2019-04-06 15:22, Qu Wenruo:


On 2019/4/6 9:20 PM, Nik. wrote:


2019-04-06 11:06, Qu Wenruo:

Please try again, and sorry for the inconvenience. Hopefully this is the
last try.

#sudo ./btrfs-corrupt-block -X /dev/md0
old offset=131072 len=0
new offset=0 len=0

My bad: the first fix was wrong, which led to the bad result.

(And that's why we need to review patches)

Fortunately we have everything we need to manually set the value, no
magic any more.

So I guess the next steps are git fetch, make, and running the above
two commands again:

#git fetch
  From https://github.com/adam900710/btrfs-progs
   + c7bfe8cc...a8c26abd dirty_fix_for_nik -> origin/dirty_fix_for_nik (forced update)

It looks like you haven't checked out the correct branch.

You can use the command 'git checkout origin/dirty_fix_for_nik' to switch
to the latest branch.

Sorry about this. Once again:

#git checkout origin/dirty_fix_for_nik
HEAD is now at a8c26abd btrfs-progs: corrupt-block: Manually fix bit
flip for Nik.
# make
     [PY]     libbtrfsutil

#./btrfs-corrupt-block -X /dev/md0
old offset=0 len=0
new offset=14966 len=37
Successfully repair tree block at 1894009225216

# mount -t btrfs -o ro /dev/md0 /mnt/md0/
mount: /mnt/md0: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error.

root@bach:~# dmesg|tail
...
[59138.540585] BTRFS info (device md0): disk space caching is enabled
[59138.697727] BTRFS info (device md0): bdev /dev/md0 errs: wr 0, rd 0,
flush 0, corrupt 2181, gen 0
[59139.944682] BTRFS critical (device md0): corrupt leaf: root=1
block=1894009225216 slot=83, bad key order, prev (564984271564800 168
962560) current (2034319192064 168 262144)

Now it's a different problem at a different slot.

slot 82 has key (0x201d9a6cf7000, 168, 962560)
slot 83 has key (0x001d9a6df7000, 168, 262144)

You have 2 bits flipped in just one tree block!
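
To make the slot 82 diagnosis concrete, here is a minimal Python sketch (not part of btrfs-progs; the helper name flipped_bits is made up for illustration). It XORs the corrupted objectid of slot 82 with the value that btrfs-corrupt-block reports as the "new key" later in this thread. If only one bit was flipped, the XOR should have exactly one bit set.

```python
def flipped_bits(bad: int, good: int) -> list[int]:
    """Return the positions (0 = least significant) of the bits that differ."""
    diff = bad ^ good
    return [i for i in range(diff.bit_length()) if (diff >> i) & 1]


# objectid of slot 82 as reported by dmesg, and the "new key" value
# printed by btrfs-corrupt-block later in this thread.
bad_objectid = 564984271564800   # 0x201d9a6cf7000
good_objectid = 2034318143488    # 0x001d9a6cf7000

print(hex(bad_objectid), hex(good_objectid))
print(flipped_bits(bad_objectid, good_objectid))  # [49]
```

The two values differ only in bit 49, which is consistent with a single flipped bit in that key.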

Anyway, I have updated the branch; please try again.

Thanks,
Qu

#./btrfs-corrupt-block -X /dev/md0
old key = 564984271564800, 168, 962560
new key = 2034318143488, 168, 962560
Successfully repair tree block at 1894009225216

# mount -t btrfs -o ro /dev/md0 /mnt/md0/
mount: /mnt/md0: wrong fs type, bad option, bad superblock on /dev/md0, missing codepage or helper program, or other error.

# dmesg|tail
...
[111221.376675] md: md0: data-check done.
[122291.559537] BTRFS info (device md0): disk space caching is enabled
[122291.704292] BTRFS info (device md0): bdev /dev/md0 errs: wr 0, rd 0, flush 0, corrupt 2181, gen 0
[122293.101782] BTRFS critical (device md0): corrupt leaf: root=1 block=1894009225216 slot=82, bad key order, prev (2034321682432 168 262144) current (2034318143488 168 962560)
[122293.102334] BTRFS error (device md0): failed to read block groups: -5
[122293.156546] BTRFS error (device md0): open_ctree failed
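
The "bad key order" complaint can be checked by hand. btrfs keys compare as (objectid, type, offset) tuples, and the keys within a leaf must be strictly increasing. Below is a rough Python sketch that uses only the two keys quoted in the dmesg output above. The function name first_bad_slot is hypothetical, and placing the "prev" key in slot 81 is an assumption based on the reported slot=82.

```python
Key = tuple[int, int, int]  # (objectid, type, offset)


def first_bad_slot(keys: dict[int, Key]):
    """Return the first slot whose key is not strictly greater than the
    key in the preceding slot, or None if the order is fine."""
    slots = sorted(keys)
    for prev, cur in zip(slots, slots[1:]):
        if keys[cur] <= keys[prev]:
            return cur
    return None


leaf = {
    81: (2034321682432, 168, 262144),  # "prev" in the dmesg line (slot assumed)
    82: (2034318143488, 168, 962560),  # "current", after the repair
}
print(first_bad_slot(leaf))  # 82
```

Python compares tuples element by element, which mirrors the objectid, then type, then offset comparison; slot 82 is flagged because its objectid is smaller than that of the preceding key.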

If the data-tree structures alone have so many bits flipped, how many flipped bits are to be expected in the data itself? What should a normal btrfs user do in order to prevent such disasters? And another thing: if I understand correctly, it would have been more reliable/appropriate to let btrfs manage the five disks behind md0 with a raid1 profile, instead of binding them into a RAID5 and "giving" just a single device to btrfs.

Kind regards,
Nik.
--



[59139.945109] BTRFS error (device md0): failed to read block groups: -5
[59139.984122] BTRFS error (device md0): open_ctree failed

Kind regards,
Nik.
--

Thanks,
Qu

#make
      [PY]     libbtrfsutil

#./btrfs-corrupt-block -X /dev/md0
old offset=0 len=0
new offset=0 len=0
Successfully repair tree block at 1894009225216

# mount -t btrfs -o ro /dev/md0 /mnt/md0/
mount: /mnt/md0: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error.

# dmesg|tail
...
[56146.672395] BTRFS info (device md0): disk space caching is enabled
[56146.841632] BTRFS info (device md0): bdev /dev/md0 errs: wr 0, rd 0,
flush 0, corrupt 2181, gen 0
[56148.097242] BTRFS critical (device md0): corrupt leaf: root=1
block=1894009225216 slot=30, unexpected item end, have 0 expect 15003
[56148.097583] BTRFS error (device md0): failed to read block groups: -5
[56148.140137] BTRFS error (device md0): open_ctree failed

If the above steps were wrong, please correct me!

The only uncertain part is the size.
If mount still fails, dmesg will tell me the size I need.
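
As a small worked example of how dmesg supplies the missing size: the "unexpected item end" message compares an item's end with an expected value. The Python sketch below assumes that "item end" means data offset plus size, and uses only numbers that appear in this thread (15003 from dmesg; offset 14966 and len 37 from the btrfs-corrupt-block run quoted near the top of this page). The function name is illustrative only.

```python
def item_end(offset: int, size: int) -> int:
    """End of an item's data inside the leaf, assumed to be offset + size."""
    return offset + size


expected_end = 15003  # "expect 15003" from dmesg

# The zeroed item (offset=0 len=0) does not match the expected end.
print(item_end(0, 0) == expected_end)        # False
# The values written by the later fix add up to the expected end.
print(item_end(14966, 37) == expected_end)   # True
```

Under that assumption, once the offset is known, the size follows directly from the "expect" value in the kernel message.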


Successfully repair tree block at 1894009225216
# mount -t btrfs -o ro /dev/md0 /mnt/md0/
mount: /mnt/md0: wrong fs type, bad option, bad superblock on /dev/md0,
missing codepage or helper program, or other error.
root@bach:~# dmesg|tail
...
[39342.860715] BTRFS info (device md0): disk space caching is enabled
[39342.933380] BTRFS info (device md0): bdev /dev/md0 errs: wr 0, rd 0,
flush 0, corrupt 2181, gen 0
[39344.197411] BTRFS critical (device md0): corrupt leaf: root=1
block=1894009225216 slot=30, unexpected item end, have 0 expect 15003
[39344.197915] BTRFS error (device md0): failed to read block groups: -5
[39344.248137] BTRFS error (device md0): open_ctree failed

Sorry, I forgot to mention: this and the previous attempt were with kernel
4.15.0-47-generic.

As long as it can output the above message, the kernel version doesn't make
much difference.


My Ubuntu 18.04 LTS is having enormous problems with kernel 5.0.2: very
long boot times, with network, login and other services cycling through
"start, timeout, fail, stop" again and again, etc. If kernel 5 is
important, I will need time to get it right (maybe even assistance from
another(?) developer group).
Actually, with 5.0.2 each boot sends me an email about an empty and not
automatically mounted btrfs filesystem with raid1 profile, consisting
of two devices (sdb and sdi):

kernel: [    9.625619] BTRFS: device fsid
05bd214a-8961-4165-9205-a5089a65b59b devid 2 transid 832 /dev/sdi

Scrubbing it finishes almost immediately (see below), but during next
boot the email comes again:

#btrfs scrub status /mnt/b
scrub status for 05bd214a-8961-4165-9205-a5089a65b59b
           scrub started at Sat Apr  6 10:42:15 2019 and finished after
00:00:00
           total bytes scrubbed: 1.51MiB with 0 errors

Should I be worried about it?

You could try btrfs check --readonly and see what's going on.
If btrfs check --readonly is OK, then it should be mostly OK.

Then it seems to be ok, thank you!


Thanks,
Qu



Kind regards,
Nik.
--

Thanks,
Qu

Thank you.
Nik.
--

Thanks,
Qu


Actually, there was one warning during make; I don't know if it is
relevant:
         [CC]     check/main.o
check/main.c: In function ‘try_repair_inode’:
check/main.c:2688:5: warning: ‘ret’ may be used uninitialized in this
function [-Wmaybe-uninitialized]
       if (!ret) {
          ^
check/main.c:2666:6: note: ‘ret’ was declared here
       int ret;
           ^~~

The previous steps were as follows (output omitted, since nothing
unexpected happened):
#git clone --single-branch -v -b dirty_fix_for_nik https://github.com/adam900710/btrfs-progs.git
#cd btrfs-progs/
#./autogen.sh
#./configure --disable-documentation --disable-convert
#make

Did I get the right branch? Or did I miss any step?

Kind regards,
Nik.
--

If everything goes correctly, it should output something like:
        Successfully repaired tree block at 1894009225216
(And please ignore any grammar errors in my code)

After that, please run "btrfs check --readonly" to ensure there are no
other bit flips in your fs.

Thanks,
Qu




Hope this is ok.

Regards,
Nik.
-







