Re: Is it logical to use a disk that scrub fails but smartctl succeeds?

On Mon, Dec 16, 2019 at 3:36 AM Cerem Cem ASLAN <ceremcem@xxxxxxxxxxxx> wrote:
>
> > smartctl -l scterc /dev/
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

For daily production use I recommend changing both to 7 seconds
(smartctl takes the value in tenths of a second, so 70). It's possible
to set up a udev rule for this so it's always in place for specific
drives, matched by whatever stable identifier you want: wwn, serial,
or label. /dev/sda and /dev/sdb, by contrast, are not reliably
assigned to the same drives at every startup.
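
As a rough sketch of the unit conversion involved: smartctl's scterc
values are in tenths of a second, so 7 seconds becomes 70. The helper
name and the device path below are made up for illustration; substitute
whatever stable name your drive actually has. The resulting command is
what a boot-time hook such as a udev rule would need to run.

```python
import shlex


def scterc_argv(device, read_seconds=7.0, write_seconds=7.0):
    """Build a smartctl argv that sets the SCT ERC read/write timeouts."""
    # smartctl expresses SCT ERC timeouts in units of 100 ms.
    read_ds = round(read_seconds * 10)
    write_ds = round(write_seconds * 10)
    if read_ds <= 0 or write_ds <= 0:
        raise ValueError("timeouts must be positive")
    return ["smartctl", "-l", f"scterc,{read_ds},{write_ds}", device]


# Hypothetical stable device path, not taken from this thread.
DEVICE = "/dev/disk/by-id/wwn-0x0000000000000000"

if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(scterc_argv(DEVICE)))
```

Printing rather than executing is deliberate here: the command needs
root and changes drive behavior, so it's worth eyeballing first.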

The logic is that it's better to have quick failures. These produce
discrete errors, with the LBA of the affected sector, and Btrfs can
act on this with self-healing, whether it's an ordinary read or a
scrub. Self-healing does require redundancy. But even with single copy
data, you'll get the path of the affected file. It's often easier to
just delete that file and copy it from backup.

Whereas with ERC disabled, it's uncertain what the error timeout is.
With consumer drives, so-called "deep recovery" is possible, which can
take an extraordinary amount of time and manifests as storage stack
slowdown. Meanwhile the kernel's SCSI block layer has a command timer
of its own, 30 seconds by default. If a command hasn't completed
in 30 seconds, the kernel will try to reset the device.
Upon reset, the entire command queue is lost on SATA drives; on SAS
drives just the delayed command is excised. In either case, it's
never discovered which sector is causing the delay. Essentially the
real problem gets masked by the reset.

The end result is that it's possible for bad sectors to just get worse
and worse (slower and slower recovery) until the data on them is lost
for good, and in the meantime the storage stack gets hung up on these
slow reads as the drive firmware keeps retrying the marginal sectors.
There might be a reasonable use case for long recoveries, e.g. a boot
drive with single copy data and metadata, where it's better to have
slowdowns than to have EIO blow things up in a non-obvious way. I
personally would still favor a short recovery, below 30 seconds. That
way I'll see a discrete drive read error along with the blowup, and
make the connection. Slowdowns, on the other hand, leave no log
entries until there's a link reset by the kernel.

Also, 7 seconds comes from what I typically see from NAS and
enterprise drives. So it's not a random pick, but other values are
sane as well, as long as SCT ERC is less than the SCSI command timer
value. That timer is per block device, but it is a kernel setting
found in /sys, not a setting in the drive itself.
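
To make the "SCT ERC less than the command timer" rule concrete, here
is a minimal sketch. It assumes smartctl prints either "Disabled" or a
number in tenths of a second after "Read:" and "Write:", and that the
kernel timer for a disk lives at /sys/block/<dev>/device/timeout in
seconds. The function names are my own, not part of any tool.

```python
import re
from pathlib import Path


def parse_scterc(text):
    """Parse `smartctl -l scterc` output; values in 100 ms units, None if disabled."""
    result = {}
    for m in re.finditer(r"^\s*(Read|Write):\s+(Disabled|\d+)", text, re.MULTILINE):
        kind, value = m.group(1).lower(), m.group(2)
        result[kind] = None if value == "Disabled" else int(value)
    return result


def read_command_timer(dev):
    """Read the kernel's SCSI command timer (seconds) for e.g. dev='sda'."""
    return int(Path(f"/sys/block/{dev}/device/timeout").read_text().strip())


def timeout_mismatch(erc_deciseconds, command_timer_seconds):
    """True if the drive may still be retrying when the kernel timer fires."""
    if erc_deciseconds is None:
        # ERC disabled: recovery time is unknown, so treat it as a mismatch.
        return True
    return erc_deciseconds / 10 >= command_timer_seconds


# Output as reported earlier in this thread.
SAMPLE = """SCT Error Recovery Control:
           Read: Disabled
          Write: Disabled
"""

if __name__ == "__main__":
    erc = parse_scterc(SAMPLE)
    # 30 is the default timer value mentioned above.
    print(erc, timeout_mismatch(erc["read"], 30))
```

With ERC disabled the check reports a mismatch, and so would a long
window like 1200 against the default 30 second timer. In that case the
timer would have to be raised above the ERC value instead.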

A somewhat older reference, but still valid across Linux distros:
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/task_controlling-scsi-command-timer-onlining-devices


>
> It seems like the drive has STC ERC support but disabled. However some
> weird error is thrown with your correct syntax:
>
> =======> INVALID ARGUMENT TO -l: scterc,1800,70
>
> It's an interesting approach to setup long read time windows. I'll
> keep this in mind even though this time I'm determined to make the
> correct setup that will make such a data scraping job unnecessary.

It could be a firmware bug *shrug*. Try something else, like:

-l scterc,1200,1200

Maybe it wants them to be identical.


> First problem was that I "hoped" the machine would just crash with
> "DRDY ERR"s when the disk has *any* problems.

Right. So instead, look through the logs for messages suggesting there
have been link resets (typically from libata, though exactly what the
message looks like depends on what drives you have). Link resets
prevent the drive-specific error from ever being reported. Hence you
want the drive's internal firmware to give up on error recovery before
the kernel gives up on the delayed command.
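
A small sketch of that log search follows. It assumes the common
libata wording "hard resetting link" / "soft resetting link"; your
kernel or driver may word it differently, so treat the pattern as a
starting point. The sample lines are made up for illustration and are
not from a real log.

```python
import re

# Assumed pattern: libata messages such as "ata3: hard resetting link".
RESET_RE = re.compile(r"\bata\d+(?:\.\d+)?: (?:hard|soft) resetting link")


def find_link_resets(lines):
    """Return the log lines that look like libata link resets."""
    return [line.rstrip("\n") for line in lines if RESET_RE.search(line)]


# Made-up sample input, only to exercise the function.
SAMPLE_LOG = [
    "ata3: hard resetting link",
    "ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)",
    "sd 2:0:0:0: [sdc] some unrelated message",
]

if __name__ == "__main__":
    for hit in find_link_resets(SAMPLE_LOG):
        print(hit)
```

In practice the lines would come from the kernel log (for example the
output of dmesg saved to a file) rather than a hard-coded list.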

More here, which itself has a pile of links on this same issue
affecting md arrays:

https://raid.wiki.kernel.org/index.php/Timeout_Mismatch



-- 
Chris Murphy


