Re: btrfs and ECC RAM

Ian Hinder posted on Sat, 18 Jan 2014 01:23:41 +0100 as excerpted:

> I have been reading a lot of articles online about the dangers of using
> ZFS with non-ECC RAM.  Specifically, the fact that when good data is
> read from disk and compared with its checksum, a RAM error can cause the
> read data to be incorrect, causing a checksum failure, and the bad data
> might now be written back to the disk in an attempt to correct it,
> corrupting it in the process.  This would be exacerbated by a scrub,
> which could run through all your data and potentially corrupt it.  There
> is a strong current of opinion that using ZFS without ECC RAM is
> "suicide for your data".
> 
> I have been unable to find any discussion of the extent to which this is
> true for btrfs.  Does btrfs handle checksum errors in the same way as
> ZFS, or does it perform additional checks before writing "corrected"
> data back to disk?  For example, if it detects a checksum error, it
> could read the data again to a different memory location to determine
> if the error existed in the disk copy or the memory.

Given the license issues around zfs and linux, zfs is a non-starter for 
me here, and as a result I've never looked particularly closely at how it 
works, so I can't really say what it does with checksums or how that 
compares to btrfs.

I /can/ however say that btrfs does /not/ work the way described above.

When reading data from disk, btrfs checks the data against its 
checksum.  If the checksum doesn't match and btrfs has another copy of 
that data available (as it does in dup, raid1 or raid10 mode, but not 
in single or raid0 mode; I'm not actually sure how the newer and still 
not fully complete raid5 and raid6 modes handle this), btrfs reads the 
other copy and checks whether /it/ matches the checksum.  If it does, 
the good copy is used and the bad copy is rewritten from it.  If no 
good copy exists, btrfs fails the read.
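
To put that decision logic in rough code terms, here's a toy C 
simulation -- NOT actual btrfs code, just an illustration of the 
"only repair from a copy that verifies, otherwise fail the read" 
behavior.  The two-copy layout and the checksum function are invented 
for the example (btrfs really uses crc32c):

/* Toy simulation of the read/repair logic described above.  This is
 * NOT btrfs code; the block layout and checksum are invented purely
 * for illustration. */
#include <stdio.h>
#include <string.h>
#include <stdint.h>

#define NCOPIES 2     /* dup/raid1-style: two copies of each block */
#define BLKSIZE 16

/* Stand-in checksum, not what btrfs uses. */
static uint32_t csum(const unsigned char *data)
{
    uint32_t sum = 0;
    for (int i = 0; i < BLKSIZE; i++)
        sum = sum * 31 + data[i];
    return sum;
}

/* Return the index of the first copy matching the stored checksum, or -1. */
static int find_good_copy(unsigned char copies[NCOPIES][BLKSIZE],
                          uint32_t stored)
{
    for (int i = 0; i < NCOPIES; i++)
        if (csum(copies[i]) == stored)
            return i;
    return -1;
}

int main(void)
{
    unsigned char copies[NCOPIES][BLKSIZE] = {
        "some file data", "some file data"
    };
    uint32_t stored = csum(copies[0]);  /* checksum stored at write time */

    copies[0][0] ^= 0x40;               /* simulate on-disk corruption */

    int good = find_good_copy(copies, stored);
    if (good < 0) {
        /* No copy verifies: fail the read; nothing on disk gets
         * "corrected" from unverified data. */
        puts("read fails, no repair attempted");
    } else {
        /* A verified copy exists: return it and rewrite the bad
         * copies from it. */
        for (int i = 0; i < NCOPIES; i++)
            if (i != good)
                memcpy(copies[i], copies[good], BLKSIZE);
        printf("read ok from copy %d, bad copy rewritten\n", good);
    }
    return 0;
}

The key point is that the only thing ever used as a repair source is 
a copy that already verified against the stored checksum; a copy that 
fails verification is never written anywhere.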

So while I don't know how zfs works and whether your scenario of 
rewriting bad data due to checksum failure could happen there or not, it 
can't happen with btrfs, because btrfs will only rewrite the data if it 
has another copy that matches the checksum.  Otherwise it (normally) 
fails the read entirely.  

It is possible to turn off btrfs checksumming entirely with a mount 
option, or to turn off both COW and checksumming on an individual file 
using the NOCOW file attribute (chattr +C).  That's definitely not 
recommended in general, tho it is for specific types of files, 
generally large internally-rewritten files such as VM images or 
database files that otherwise end up hugely fragmented due to COW.
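
For reference, the mount options in question are nodatasum and 
nodatacow, and the per-file switch is the NOCOW inode flag, i.e. what 
chattr +C sets.  Note that as I understand it the flag only sticks on 
an empty file, or is inherited by new files from a directory that has 
it set.  For the curious, here's a minimal sketch of roughly what 
chattr +C does under the hood, using the generic inode-flags ioctls 
-- illustration only, just use chattr in practice:

/* Minimal sketch of what "chattr +C" does: set the NOCOW inode flag
 * via the generic inode-flags ioctls.  Illustration only. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>   /* FS_IOC_GETFLAGS, FS_IOC_SETFLAGS, FS_NOCOW_FL */

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <file>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    int flags = 0;
    if (ioctl(fd, FS_IOC_GETFLAGS, &flags) < 0) {
        perror("FS_IOC_GETFLAGS");
        close(fd);
        return 1;
    }

    flags |= FS_NOCOW_FL;   /* the bit chattr +C sets */
    if (ioctl(fd, FS_IOC_SETFLAGS, &flags) < 0) {
        perror("FS_IOC_SETFLAGS");
        close(fd);
        return 1;
    }

    close(fd);
    return 0;
}

Again, a file marked this way gets no checksum protection at all, so 
it's a deliberate per-file tradeoff, not something to do blindly.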

As George Mitchell mentions in his followup, there's another thread 
already discussing ECC memory and btrfs.  However, the OP in that 
thread didn't explain the alleged problem with zfs (which, again, I've 
no idea whether it's true or not, since the licensing issues make zfs 
a flat non-starter for me and I've never looked into it that closely), 
so all we could say there was that ECC and btrfs aren't related in 
that way.  Here you've explained a bit about the alleged problem, so 
we can say for sure that btrfs doesn't work that way.

-- 
Duncan - List replies preferred.   No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master."  Richard Stallman
