Re: btrfs and ECC RAM

On Saturday, 18 January 2014, 07:16:42, Duncan wrote:
> Ian Hinder posted on Sat, 18 Jan 2014 01:23:41 +0100 as excerpted:
> > I have been reading a lot of articles online about the dangers of using
> > ZFS with non-ECC RAM.  Specifically, the fact that when good data is
> > read from disk and compared with its checksum, a RAM error can cause the
> > read data to be incorrect, causing a checksum failure, and the bad data
> > might now be written back to the disk in an attempt to correct it,
> > corrupting it in the process.  This would be exacerbated by a scrub,
> > which could run through all your data and potentially corrupt it.  There
> > is a strong current of opinion that using ZFS without ECC RAM is
> > "suicide for your data".
> > 
> > I have been unable to find any discussion of the extent to which this is
> > true for btrfs.  Does btrfs handle checksum errors in the same way as
> > ZFS, or does it perform additional checks before writing "corrected"
> > data back to disk?  For example, if it detects a checksum error, it
> > could read the data again to a different memory location to determine
> > if the error existed in the disk copy or the memory.
> 
> Given the license issues around zfs and linux, zfs is a non-starter for
> me here, and as a result I've never looked particularly closely at how it
> works, so I can't really say what it does with checksums or how that
> compares to btrfs.
> 
> I /can/ however say that btrfs does /not/ work the way described above.
> 
> When reading data from disk, btrfs will check the checksum.  If it shows
> up as bad and btrfs has another copy of the data available (as it will in
> dup, raid1 or raid10 mode, but not in single or raid0 mode, I'm not
> actually sure how the newer and still not fully complete raid5 and raid6
> modes work in that regard), btrfs will read the other copy and see if
> that matches the checksum.  If it does, the good copy is used and the bad
> copy is rewritten.  If no good copy exists, btrfs fails the read.
> 
> So while I don't know how zfs works and whether your scenario of
> rewriting bad data due to checksum failure could happen there or not, it
> can't happen with btrfs, because btrfs will only rewrite the data if it
> has another copy that matches the checksum.  Otherwise it (normally)
> fails the read entirely.
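
The read path Duncan describes can be sketched roughly as below. This is only
a minimal illustration in Python, not the kernel code: read_block, the list of
in-memory "copies" and ChecksumError are hypothetical names, and zlib.crc32
merely stands in for the crc32c checksum that btrfs actually uses.

```python
import zlib


class ChecksumError(IOError):
    """Raised when no copy of a block matches its stored checksum."""


def read_block(copies, stored_csum):
    """Return verified data, repairing bad copies from a good one.

    copies:      list of bytearrays, one per mirror (a hypothetical
                 stand-in for the dup/raid1 copies on disk).
    stored_csum: checksum recorded when the block was written.
    """
    bad = []
    for i, data in enumerate(copies):
        if zlib.crc32(data) == stored_csum:
            # Good copy found: overwrite every copy that failed so far.
            for j in bad:
                copies[j][:] = data
            return bytes(data)
        bad.append(i)
    # No copy matches: nothing is rewritten, the read simply fails.
    raise ChecksumError("no copy matches the stored checksum")
```

The point of the sketch is that a rewrite only ever happens with data that
has itself passed the checksum. If every copy fails, nothing is written back.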

I think Ian refers to the slight chance that BTRFS judges the copy on one
disk to be incorrect due to a memory error *and* the copy on another disk to
be correct due to another memory error, *and* then silently overwrites the
correct data with the incorrect data.

AFAIK BTRFS still does not correct such errors automatically, only during a
scrub. There this *could* theoretically happen.

My gut feeling is that this is highly, highly unlikely.

At least it is no more likely than a controller writing out garbage or other
such hardware issues.

And for hardware issues there are backups.

I'd probably like it if all computers had ECC RAM, but then I have heard more
than once that ECC doesn't even detect all possible memory errors.

Maybe at one point the kernel will be able to checksum memory pages itself?

Actually, I only once had a memory error, in a machine where it went
completely undetected under Windows XP but made the Debian and Ubuntu
installers segfault at random places. This was years ago. I have never noticed
a memory error since then, and I am not aware of any co-workers having had
memory errors on their laptops. But then… those are usually enterprise-grade
laptops, which to my knowledge nonetheless just use RAM without ECC. I don't
think that this ThinkPad T520 uses ECC RAM.

-- 
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA  B82F 991B EAAC A599 84C7