Re: kernel BUG at fs/btrfs/volumes.c:3707 still not fixed in 3.7.1 (btrfs-zero-log required) but shown as "RIP btrfs_num_copies"

On Tue, Jan 08, 2013 at 08:49:58AM -0800, Marc MERLIN wrote:
> Unfortunately my laptop deadlocks from time to time, and too often
> it triggers this bug in btrfs which is quite hard to recover from.
> 
> The bigger problem is that all the user sees (if anything) is seemingly 
> unrelated info, namely, "RIP: btrfs_num_copies+0x42/0x0b" or somesuch
> http://marc.merlins.org/tmp/btrfs_num_copies.jpg
> 
> It's only if you have serial console, or netconsole, which we can't really
> assume the average users to have, that you can get the correct oops and bug
> info.
> I lost another 3 hours with many reboots and a recovery drive to recover my
> root drive.
> 
> Question #1:
> I have hourly snapshots of my root filesystem, and I wasn't able to mount
> any of them. I got the BUG at fs/btrfs/volumes.c:3707 each time.
> gandalfthegreat:~# mount -o ro,recovery /dev/mapper/root -o 'subvol=root_daily_20130108_00:01:02,defaults,compress=lzo,discard,nossd,space_cache,noatime'
> 
> If my log is damaged, why are all other snapshots also broken?

   Snapshots are not independent of each other. The filesystem as a
whole is damaged -- if you can't mount it, it won't make a difference
which subvolume you try to mount. A snapshot is not a backup; it won't
save you from a broken filesystem or dead hardware. At best, it'll
save you from accidental deletion of files.

> Question #2:
> This btrfs-zero-log business, which in the end fixed my problem, should 
> not be a routine recovery method, especially because the oops you get on
> your screen doesn't have the proper info that tells you that it's actually
> the right bug as described on
> https://btrfs.wiki.kernel.org/index.php/Problem_FAQ#I_can.27t_mount_my_filesystem.2C_and_I_get_a_kernel_oops.21
> 
> Could mainline kernels be fixed not to oops so badly and in a hard to debug 
> way when this problem which happens too often (at least for me), is hit?

   Oopses in log playback are a bug. The last time we had such a bug
which was identifiable and traceable (back in the 3.1-3.2 era, IIRC),
it got fixed, eventually. So yes, this is a bug, it should be fixed,
and you're not the only person to have seen log tree replays fail in
3.6 and 3.7 kernels.

   Since you seem to be hitting the problem frequently and repeatably,
could you help? Josef has said he'd like a copy of the filesystem
image that btrfs-image produces when run against the broken FS (i.e.
while the FS can't mount) -- that would help track down the corruption
problem, and make the kernel more robust in this area. Just as a
warning, the output may be quite large: it contains all of your FS's
metadata.
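
   As a rough sketch of capturing that image: the device name is taken
from the mount command quoted above, while the output path and the
-c9/-t4 settings (compression level and thread count) are my
assumptions, not something Josef asked for. It has to be run before
btrfs-zero-log, while the FS is still unmountable, and the output
should go to a different filesystem. By default the script only prints
the command; set RUN=1 to actually execute it.

```shell
#!/bin/sh
# Sketch only: dump the metadata of an unmountable btrfs filesystem.
# DEV is the device from the mount command earlier in this thread.
# OUT is a hypothetical path on a *different* filesystem with enough space.
DEV=${DEV:-/dev/mapper/root}
OUT=${OUT:-/mnt/usb/root-metadata.img}

# Build the btrfs-image command line for a source device and output file.
image_cmd() {
    # -c9: highest compression level; -t4: four threads (both optional)
    echo "btrfs-image -c9 -t4 $1 $2"
}

if [ "${RUN:-0}" = "1" ]; then
    # Really run it (needs root and an unmounted filesystem).
    btrfs-image -c9 -t4 "$DEV" "$OUT"
else
    # Dry run: just show what would be executed.
    image_cmd "$DEV" "$OUT"
fi
```

   Only after the image has been written would you go on to run
btrfs-zero-log to get the filesystem mounting again.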

> If that helps, here's what I got after the fact when trying to mount the
> broken filesystem before zero'ing logs
[snip]

   That information may also be helpful in conjunction with the
btrfs-image dump of a broken FS. I'm not sure how much help it is on
its own (but thanks for providing it anyway).

   Hugo.

-- 
=== Hugo Mills: hugo@... carfax.org.uk | darksatanic.net | lug.org.uk ===
  PGP key: 515C238D from wwwkeys.eu.pgp.net or http://www.carfax.org.uk
       --- Great oxymorons of the world, no. 4: Future Perfect ---       
