Re: Balance & scrub & defrag

On Fri, Dec 12, 2014 at 11:17:58AM +0200, Erkki Seppala wrote:
> That may be sort of true, but I think even SMART is helped by the fact
> that the media is read through from the beginning to the end*, so it can
> detect even the errors that don't bubble through the IO layer. And BTRFS
> can indeed note errors that the media doesn't - two checksums is better
> than one checksum, assuming they aren't exactly the same algorithm ;).
> 
> Do you alternatively execute SMART self tests?
> 
> * scrub doesn't do this, it reads only through used data

I do both.  They operate at different layers of the storage stack, and have
access to different information.  They also have different (and hopefully
non-overlapping) bugs.

scrub pros:

	+ can compare data with the other copies in RAID1 or DUP mode

	+ can fix bad data when good copies available

	+ slows down when other processes want to use the disk

	+ can be suspended and resumed at will by software

	+ error data is impervious to drive firmware bugs

	+ straightforward error reports

	+ only scans allocated data

scrub cons:

	- only scans allocated data

	- btrfs filesystems only

	- CPU and I/O burden

	- error sources are not localized:  scrub errors could be software
	bugs, bad RAM, bad CPU cooling, bad cabling, bad power supply,
	or bad hard drive
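
For reference, a rough sketch of how the suspend/resume and reporting
points above map onto the btrfs command line. It only builds argument
lists for the `btrfs scrub` subcommands (start, status, cancel, resume).
The function names and the mount point are made up for illustration, and
actually running the commands needs root and a mounted btrfs filesystem.

```python
import subprocess

# Subcommands of `btrfs scrub` used in this sketch.
SCRUB_ACTIONS = ("start", "status", "cancel", "resume")


def scrub_command(action, mountpoint):
    """Build the argv for one `btrfs scrub` action (hypothetical helper)."""
    if action not in SCRUB_ACTIONS:
        raise ValueError(f"unknown scrub action: {action}")
    argv = ["btrfs", "scrub", action]
    if action in ("start", "resume"):
        # -B: stay in the foreground until finished
        # -d: print statistics separately for each device
        argv += ["-B", "-d"]
    argv.append(mountpoint)
    return argv


def run_scrub(action, mountpoint):
    """Run the command and return its exit status (needs root)."""
    return subprocess.run(scrub_command(action, mountpoint)).returncode
```

Something like run_scrub("cancel", "/mnt/data") followed later by
run_scrub("resume", "/mnt/data") would be the "suspended and resumed at
will" case, with /mnt/data standing in for a real mount point.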

smart pros:

	+ runs in the background

	+ no CPU or I/O required, just read results from previous run
	and launch new test daily

	+ access to electrical and mechanical data from the drive
	that are otherwise unavailable to the host

	+ 100% surface scan (including bad sector count)

	+ logs host I/O errors that OS might miss
	(e.g. because they occur during BIOS booting)

	+ works with any filesystems, partitions, swap, etc.

	+ error sources are localized to the drive under test

smart cons:

	- buggy firmware does not detect or report error events when
	significant failures occur

	- buggy firmware does detect and report error events when
	significant failures do not occur

	- buggy firmware will make host accesses painfully slow during
	scan (WD Green is very bad for this)

	- firmware does not implement useful subset of SMART command set

	- SMART command set can be inaccessible through some SATA bridge
	chips (especially USB)

	- cannot fix anything, only report quantities of data already lost

	- cannot reliably detect RAM or CPU failure (on host or drive)

	- requires the drive to spin for 1-2 continuous hours during test

	- interpreting the raw data is a black art
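
The "read results from previous run and launch new test daily" routine
could look roughly like the sketch below, assuming smartmontools is
installed. It only builds smartctl argument lists. The helper names and
the device path are placeholders, and the optional "-d sat" is the usual
thing to try behind a USB/SATA bridge, which may or may not work with a
given chip.

```python
import subprocess

# Self-test types accepted by this sketch (a subset of what smartctl -t takes).
TEST_KINDS = ("short", "long", "conveyance")


def _base(device_type):
    """Common prefix; device_type (e.g. "sat") is passed through as -d."""
    argv = ["smartctl"]
    if device_type is not None:
        argv += ["-d", device_type]
    return argv


def selftest_command(device, kind="long", device_type=None):
    """argv that starts a drive self-test (hypothetical helper)."""
    if kind not in TEST_KINDS:
        raise ValueError(f"unsupported self-test kind: {kind}")
    return _base(device_type) + ["-t", kind, device]


def selftest_log_command(device, device_type=None):
    """argv that prints the log of previous self-test results."""
    return _base(device_type) + ["-l", "selftest", device]


def attributes_command(device, device_type=None):
    """argv that prints the vendor attribute table (the "black art" part)."""
    return _base(device_type) + ["-A", device]


def run(argv):
    """Run one of the commands above and return its output (needs root)."""
    return subprocess.run(argv, capture_output=True, text=True).stdout
```

A daily job would call run(selftest_log_command(dev)) to collect the
previous result, then run(selftest_command(dev)) to start the next test.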


