Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Mon, Nov 28, 2016 at 07:32:38PM +0100, Goffredo Baroncelli wrote:
> On 2016-11-28 04:37, Christoph Anton Mitterer wrote:
> > I think for safety it's best to repair as early as possible (and thus
> > on read when a damage is detected), as further  blocks/devices may fail
> > till eventually a scrub(with repair) would be run manually.
> > 
> > However, there may some workloads under which such auto-repair is
> > undesirable as it may cost performance and safety may be less important
> > than that.
> 
> I am assuming that a corruption is a quite rare event. So occasionally
> it could happens that a page is corrupted and the system corrects
> it. This shouldn't  have an impact on the workloads.

Depends heavily on the specifics of the failure case.  If a drive's
embedded controller RAM fails, you get corruption on the majority of
reads from a single disk, and most writes will be corrupted (even if they
were not before).  If there's a transient failure due to environmental
issues (e.g. short-term high-amplitude vibration or overheating) then
writes may pause for mechanical retry loops.  If there is bitrot in SSDs
(particularly in the address translation tables) it looks like a wall
of random noise that only ends when the disk goes offline.  You can get
combinations of these (e.g. RAM failures caused by transient overheating)
where the drive's behavior changes over time.

When in doubt, don't write.

> BR
> G.Baroncelli
> -- 
> gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
> Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5
> 

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux