Re: "Some devices missing" only while not mounted

On Thu, Jan 21, 2016 at 12:28 PM, Konstantin Svist <fry.kun@xxxxxxxxx> wrote:

> 1 of the drives failed (/dev/sdb; command timeouts, link reset
> messages), causing a kernel panic by btrfs getting really confused.
> After reboot, I got "parent transid verify failed" while trying to mount.

For each drive:
# smartctl -l scterc /dev/sdX
# cat /sys/block/sdX/device/timeout

The first value must be less than the second once both are expressed
in the same unit: the scterc value is in deciseconds, the timeout is in
seconds. If scterc is not supported or is disabled, the equivalent
value depends on how the firmware does ECC and on the maximum time it
will spend attempting recovery on reads, and this can be 120+ seconds.
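
To make the unit conversion concrete, here is a minimal sketch of the
comparison. The function name erc_ok and the example values (70
deciseconds, 30 seconds) are assumptions for illustration. Substitute
the numbers the two commands above print for each drive.

```shell
# erc_ok ERC_DECISECONDS SCSI_TIMEOUT_SECONDS
# Exit status 0 when the drive stops its error recovery before the
# kernel's SCSI command timer expires. Hypothetical helper name.
erc_ok() {
    erc_ds=$1       # value from 'smartctl -l scterc', in deciseconds
    timeout_s=$2    # value from /sys/block/sdX/device/timeout, in seconds
    # Convert seconds to deciseconds so both sides use the same unit.
    [ "$erc_ds" -lt $((timeout_s * 10)) ]
}

# Example values only: 70 deciseconds (7.0 s) against a 30 s timer.
if erc_ok 70 30; then
    echo "ok: drive reports the error before the command timer expires"
else
    echo "bad: command timer can expire during drive error recovery"
fi
```

A drive that may spend 120 seconds on recovery (1200 deciseconds)
fails this check against a 30 second timer.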

Chances are this setup is misconfigured: bad sectors send the drive
into error recovery, the SCSI command timer expires before the drive
can report a read error, and the result is link resets and an
accumulation of bad sectors. That often leads to data loss eventually.
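
As a sketch of how such a mismatch is commonly corrected (this message
does not prescribe a fix, so treat it as an assumption): either cap the
drive's recovery time with scterc, or, where the drive does not support
it, raise the kernel's command timer. The helper name suggest_fix and
the values 70 and 180 are illustrative. The function only prints the
command, it does not run it.

```shell
# suggest_fix DEVICE SCTERC_SUPPORTED   (second argument: yes or no)
# Prints, without executing, a command that restores the ordering
# "drive recovery time < SCSI command timer". Hypothetical helper.
suggest_fix() {
    dev=$1
    supported=$2
    if [ "$supported" = yes ]; then
        # Cap read and write recovery at 70 deciseconds (7.0 s).
        echo "smartctl -l scterc,70,70 /dev/$dev"
    else
        # The drive cannot be capped, so lengthen the kernel timer
        # (in seconds) beyond the drive's worst-case recovery time.
        echo "echo 180 > /sys/block/$dev/device/timeout"
    fi
}

suggest_fix sdX yes
suggest_fix sdX no
```

The sysfs timeout is reset at boot, and the scterc setting is typically
lost on a power cycle, so either one usually has to be reapplied at
startup.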


> Booted into USB stick (fedora 23 lxde live), found /dev/sdb2 by SMART
> errors, saw that I can mount degraded (without /dev/sdb2) without any
> errors.
> Replaced the bad drive with a new one, ran "btrfs dev add", "btrfs del
> missing" using btrfs-progs v4.2.2 -- this returned an error saying no
> "missing" device or something.
> Upgraded to btrfs-progs 4.3.1, this time it went fine.
> Reboot to main system got stuck on systemd waiting for btrfs device.



>
> After some back and forth, I found that "ready" returns an error and "fi
> show" is inconsistent.
> /dev/sda2 was showing up as dev id 5 (2 missing)

Two missing devices with raid10 is not OK; in my experience the
filesystem is probably not repairable.

>
> Tried removing /dev/sdb2 again and "btrfs replace"ing the now-missing
> /dev/sdb2 with the fresh instance of /dev/sdb2.
> Now /dev/sdb2 shows up as device 6 (2 and 5 not listed).

Well, the problem is already that you have 2 missing, and trying to do
a replace just makes things worse, near as I can tell. While you might
have found a bug here, you've made it a lot worse by trying something
different (dev replace), beating Btrfs over the head with a hammer
rather than trying to solve the mysterious missing device problem. If
Btrfs really thinks there are two missing devices on raid10, then it's
probably a hosed file system at this point.


> "fi show" on mounted /dev/sda2 looks normal; on unmounted /dev/sda2
> shows "Total devices 5" and "Some devices missing"

This is a confusing interpretation because it has nothing to do with
mounted vs unmounted. I'm looking at your attachment, and it only
shows "some devices missing" when you use the -d flag. It doesn't
matter whether the fs is mounted or not, -d always produces "some
devices missing" and without -d it doesn't. And I don't have an
explanation for that.

I suggest you unmount the file system, run 'btrfs check' without
--repair, and report the results. Let's see if it tells us which
devices it still thinks are missing.




-- 
Chris Murphy