On Fri, Dec 12, 2014 at 11:17:58AM +0200, Erkki Seppala wrote:
> That may be sort of true, but I think even SMART is helped by the fact
> that the media is read through from the beginning to the end*, so it can
> detect even the errors that don't bubble through the IO layer. And BTRFS
> can indeed note errors that the media doesn't - two checksums is better
> than one checksum, assuming they aren't exactly the same algorithm ;).
>
> Do you alternatively execute SMART self tests?
>
> * scrub doesn't do this, it reads only through used data

I do both. They operate at different layers of the storage stack, and
have access to different information. They also have different (and
hopefully non-overlapping) bugs.

scrub pros:

+ can compare data with the other copies in RAID1 or DUP mode
+ can fix bad data when good copies are available
+ slows down when other processes want to use the disk
+ can be suspended and resumed at will by software
+ error data is impervious to drive firmware bugs
+ straightforward error reports
+ only scans allocated data

scrub cons:

- only scans allocated data
- btrfs filesystems only
- CPU and I/O burden
- error sources are not localized: scrub errors could be software bugs,
  bad RAM, bad CPU cooling, bad cabling, bad power supply, or bad hard
  drive

smart pros:

+ runs in the background
+ no CPU or I/O required: just read the results from the previous run
  and launch a new test daily
+ access to electrical and mechanical data from the drive that are
  otherwise unavailable to the host
+ 100% surface scan (including bad sector count)
+ logs host I/O errors that the OS might miss (e.g. because they occur
  during BIOS booting)
+ works with any filesystem, partition, swap space, etc.
+ error sources are localized to the drive under test

smart cons:

- buggy firmware does not detect or report error events when significant
  failures occur
- buggy firmware does detect and report error events when significant
  failures do not occur
- buggy firmware will make host accesses painfully slow during the scan
  (WD Green is very bad for this)
- firmware does not implement a useful subset of the SMART command set
- the SMART command set can be inaccessible through some SATA bridge chips
  (especially USB)
- cannot fix anything, only report quantities of data already lost
- cannot reliably detect RAM or CPU failure (on the host or the drive)
- requires the drive to spin for 1-2 continuous hours during the test
- interpreting the raw data is a black art
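For anyone wanting to follow the same "do both" routine, here is a minimal Python sketch of one daily cycle. It assumes smartmontools (smartctl) and btrfs-progs are installed. The function names, /dev/sda and /mnt/data are hypothetical examples, not anything from the setup described above. It reads the previous SMART self-test log, launches a new long test, and builds the scrub start/pause/resume commands. By default it only prints the commands (dry run).

```python
import shlex
import subprocess


def smart_daily_commands(device):
    """Hypothetical helper: commands for one daily SMART cycle on one drive."""
    return [
        # Read the log left behind by the previous self-test run.
        ["smartctl", "-l", "selftest", device],
        # Dump the vendor attribute table (raw values are vendor-specific).
        ["smartctl", "-A", device],
        # Start a new extended self-test. The drive runs it internally and
        # smartctl returns immediately.
        ["smartctl", "-t", "long", device],
    ]


def scrub_commands(mountpoint, action="start"):
    """Hypothetical helper: commands to start, pause, or resume a btrfs scrub."""
    if action == "start":
        # -B: stay in the foreground and print statistics when finished.
        return [["btrfs", "scrub", "start", "-B", mountpoint]]
    if action == "pause":
        # "cancel" saves progress, so a later "resume" continues from there.
        return [["btrfs", "scrub", "cancel", mountpoint]]
    if action == "resume":
        return [["btrfs", "scrub", "resume", mountpoint]]
    raise ValueError(f"unknown scrub action: {action!r}")


def run(commands, dry_run=True):
    """Print each command, and execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            # check=False: smartctl uses a bitmask exit status, so a real
            # job should inspect returncode rather than raise on nonzero.
            subprocess.run(cmd, check=False)


if __name__ == "__main__":
    # Example names only; adjust for the local system.
    plan = smart_daily_commands("/dev/sda") + scrub_commands("/mnt/data")
    run(plan)  # dry run: prints the commands without executing them
```

Run from cron once a day with dry_run=False, this reads the previous run's SMART results before starting the next test. The scrub pause/resume actions reflect the "can be suspended and resumed at will" point above.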
