Re: btrfs problems

On Thu, Sep 20, 2018 at 3:36 PM Adrian Bastholm <adrian@xxxxxxxxxxxx> wrote:
>
> Thanks a lot for the detailed explanation.
> About "stable hardware/no lying hardware": I'm not running any RAID
> hardware, I was planning on just software RAID.

Yep. I'm referring to the drives, their firmware, the cables, the
logic board and its firmware, the power supply, power, etc. Btrfs is
by nature intolerant of corruption. Other file systems appear more
tolerant only because they can't detect it (although recent versions
of XFS and ext4 now default to checksummed metadata and journals).
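
If you're curious whether an existing XFS or ext4 filesystem has those
features enabled, something like this should show it (the device and
mount point names here are just placeholders):

  # XFS: crc=1 in the meta-data line means metadata checksums are on
  xfs_info /mnt/point

  # ext4: look for metadata_csum in the feature list
  dumpe2fs -h /dev/sdX | grep -i 'features'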


> three drives glued
> together with "mkfs.btrfs -d raid5 /dev/sdb /dev/sdc /dev/sdd". Would
> this be a safer bet, or would you recommend running the sausage method
> instead, with "-d single" for safety? I'm guessing that if one of the
> drives dies, the data is completely lost.
> Another variant I was considering is running a raid1 mirror on two of
> the drives and maybe a subvolume on the third, for less important
> stuff.

RAID does not substantially reduce the chances of data loss. It's
nothing like a backup; it's an uptime enhancer. If you have backups
and your primary storage dies, you can of course restore from backup,
but that takes time, and while the restore is happening you're not
online - uptime is killed. If that's a problem, you might want to run
RAID so you can keep working during the degraded period, doing a
rebuild instead of a restore. But of course there's a chance of
failure during the degraded period, so you have to have a backup
anyway. With Btrfs/ZFS there is at least one more reason to run with
some replication like raid1 or raid5: if there's corruption or a bad
sector, Btrfs doesn't just detect it, it can fix it up with the good
copy.
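
As a rough sketch (device names and mount point are placeholders),
raid1 for both data and metadata plus an occasional scrub gives Btrfs
a second copy to repair from:

  mkfs.btrfs -d raid1 -m raid1 /dev/sdb /dev/sdc
  mount /dev/sdb /mnt/data
  btrfs scrub start /mnt/data    # verify checksums, rewrite bad copies from the good one
  btrfs scrub status /mnt/data   # report corrected and uncorrectable errors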

For what it's worth, make sure the drives have a lower SCT ERC time
than the SCSI command timer. This is the same for Btrfs as it is for
md and LVM RAID. The command timer defaults to 30 seconds, and most
drives ship with SCT ERC disabled and recovery times well over 30
seconds. So either set SCT ERC to something like 70 deciseconds, or
increase the command timer to something like 120 or 180 seconds
(either one is absurdly high, but what you want is for the drive to
eventually give up and report a discrete error that Btrfs can do
something about, rather than the kernel doing a SATA link reset, in
which case Btrfs can't do anything about it).
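
Roughly like this (sdX is a placeholder; not every drive supports SCT
ERC, and neither setting survives a reboot without a udev rule or
startup script):

  # set the drive's error recovery timeout to 7 seconds (70 deciseconds)
  smartctl -l scterc,70,70 /dev/sdX

  # or, if the drive doesn't support SCT ERC, raise the kernel's
  # command timer for that device (value is in seconds)
  echo 180 > /sys/block/sdX/device/timeout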




-- 
Chris Murphy


