Re: [PATCH] btrfs: raid56: Use correct stolen pages to calculate P/Q

On Fri, Nov 25, 2016 at 03:40:36PM +1100, Gareth Pye wrote:
> On Fri, Nov 25, 2016 at 3:31 PM, Zygo Blaxell
> <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > This risk mitigation measure does rely on admins taking a machine in this
> > state down immediately, and also somehow knowing not to start a scrub
> > while their RAM is failing...which is kind of an annoying requirement
> > for the admin.
> 
> Attempting to detect if RAM is bad when scrub starts is both time
> consuming and not very reliable right.

RAM, like all hardware, could fail at any time, and a scrub could already
be running when it happens.  This is annoying but also a fact of life that
admins have to deal with.

Testing RAM before scrub starts is no more beneficial than testing RAM
at random intervals--but if you are testing RAM at random intervals,
why not do it at the same intervals as scrub?

If I see corruption errors showing up in stats, I will do a basic sanity
test to make sure they're coming from the storage layer and not somewhere
closer to the CPU.  If all errors come from one device and there are clear
log messages showing SCSI device errors and the SMART log matches the
other data, RAM is probably not the root cause of failures, so scrub away.
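
As a rough sketch of the first part of that check (are the errors confined
to one device?), something like the following could be run over saved
counter output.  It assumes the text is in the "[device].counter value"
form that "btrfs device stats" prints; the function names and the sample
data are made up for illustration, and it does not look at kernel logs or
SMART data, which still have to be compared by hand.

```python
import re
from collections import defaultdict

# Matches lines such as "[/dev/sdb].corruption_errs   14".
# Assumption: input uses the "[device].counter value" layout.
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def errors_by_device(stats_text):
    """Sum every error counter per device (hypothetical helper)."""
    totals = defaultdict(int)
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            totals[match.group("dev")] += int(match.group("value"))
    return dict(totals)


def errors_confined_to_one_device(stats_text):
    """True only if exactly one device has a non-zero error total."""
    totals = errors_by_device(stats_text)
    failing = [dev for dev, count in totals.items() if count > 0]
    return len(failing) == 1


# Invented sample data, not taken from a real machine.
SAMPLE = """\
[/dev/sda].write_io_errs    0
[/dev/sda].read_io_errs     0
[/dev/sda].flush_io_errs    0
[/dev/sda].corruption_errs  0
[/dev/sda].generation_errs  0
[/dev/sdb].write_io_errs    0
[/dev/sdb].read_io_errs     2
[/dev/sdb].flush_io_errs    0
[/dev/sdb].corruption_errs  14
[/dev/sdb].generation_errs  0
"""

if __name__ == "__main__":
    print(errors_by_device(SAMPLE))
    print(errors_confined_to_one_device(SAMPLE))
```

A True result only says the counters are consistent with a single bad
device.  Errors spread across every device point away from the disks, and
that is the case where I would look at RAM and the rest of the machine
before scrubbing.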

If normally reliable programs like /bin/sh start randomly segfaulting,
there's smoke pouring out of the back of the machine, all the disks are
full of csum failures, and the BIOS welcome message has spelling errors
that weren't there before, I would *not* start a scrub.  More like
turn the machine off, take it apart, test all the pieces separately,
and only do a scrub after everything above the storage layer had been
replaced or recertified.  I certainly wouldn't want the filesystem to
try to fix the csum failures it finds in such situations.
