On Mon, Dec 16, 2019 at 3:36 AM Cerem Cem ASLAN <ceremcem@xxxxxxxxxxxx> wrote:
>
> > smartctl -l scterc /dev/
>
> SCT Error Recovery Control:
>            Read: Disabled
>           Write: Disabled

For daily production use I recommend changing both to 7 seconds
(smartctl takes these values in tenths of a second, so 70). It's
possible to set up a udev rule for this so it's always in place for
specific drives, matched by whatever /dev/disk/by-* name you want: WWN,
serial, or label. /dev/sda and /dev/sdb, by contrast, are not always
reliably assigned during startup.

The logic is that it's better to have quick failures. These produce
discrete errors that include the LBA of the affected sector, and Btrfs
can act on them with self-healing, whether it's an ordinary read or a
scrub. Self-healing does require redundancy. But even with single-copy
data, you'll get a path-to-file reference for the affected file. It's
often easier to just delete that file and copy it from backup.

With ERC disabled, on the other hand, it's uncertain what the error
timeout is. With consumer drives, so-called "deep recovery" is possible,
which can take an extraordinary amount of time and manifests as a
storage stack slowdown. But the kernel's SCSI block layer has a command
timer of its own, 30 seconds by default. If a command hasn't completed
in 30 seconds, the kernel will try to reset the device. Upon reset, the
entire command queue is lost on SATA drives; on SAS drives just the
delayed command is excised. In either case, it's never discovered which
sector is causing the delay. Essentially the real problem gets masked by
the reset. The end result is that it's possible for bad sectors to just
get worse and worse (slower and slower recovery) until the data on them
is lost for good, and in the meantime the storage stack gets hung up on
these slow reads as the drive firmware keeps retrying marginal sectors.

There might be a reasonable use case for long recoveries, e.g. a boot
drive with single-copy data and metadata, where it's better to have
slowdowns than to have EIO blow things up in a non-obvious way. I
personally would still favor a short recovery time, below 30 seconds;
that way I'll see a discrete drive read error along with the blow-up,
and make the connection. Slowdowns, by contrast, leave no log entries
until there's a link reset by the kernel.

Also, 7 seconds comes from what I typically see on NAS and enterprise
drives. So it's not a random pick, but other values are sane as well, as
long as SCT ERC is less than the SCSI command timer value (which is per
block device; it is not a setting in the device, it is a kernel setting,
found in /sys). A bit older reference, but still valid across Linux
distros:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/5/html/online_storage_reconfiguration_guide/task_controlling-scsi-command-timer-onlining-devices

>
> It seems like the drive has STC ERC support but disabled. However some
> weird error is thrown with your correct syntax:
>
> =======> INVALID ARGUMENT TO -l: scterc,1800,70
>
> It's an interesting approach to setup long read time windows. I'll
> keep this in mind even though this time I'm determined to make the
> correct setup that will make such a data scraping job unnecessary.

It could be a firmware bug *shrug* try something else like:

-l scterc,1200,1200

Maybe it wants them to be identical.

> First problem was that I "hoped" the machine would just crash with
> "DRDY ERR"s when the disk has *any* problems.

Right. So instead look through the logs for signs that there have been
link resets (typically from libata, but exactly what the error looks
like depends on what drives you have). Link resets prevent the
drive-specific error from happening. Hence you want the drive's internal
firmware to give up on error recovery before the kernel gives up on
command delays. More here, on a page that itself has a pile of links to
this same issue affecting md arrays.
https://raid.wiki.kernel.org/index.php/Timeout_Mismatch

--
Chris Murphy
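
P.S. In case it helps, here is a rough sketch of the comparison described
above: the drive's SCT ERC limit should be shorter than the kernel's
command timer. It reads each device's timer from
/sys/block/<dev>/device/timeout and compares it against an ERC value.
The function names and the hard-coded 70 (i.e. scterc,70,70) are
assumptions for illustration only; the script does not query the drive,
so substitute whatever smartctl -l scterc actually reports.

```python
#!/usr/bin/env python3
"""Flag devices whose kernel command timer does not outlast the SCT ERC limit."""
from pathlib import Path


def erc_seconds(deciseconds: int) -> float:
    """smartctl's scterc values are in tenths of a second (70 -> 7.0 s)."""
    return deciseconds / 10


def timer_outlasts_erc(erc_deciseconds: int, kernel_timeout_s: int) -> bool:
    """True when the drive gives up on recovery before the kernel timer fires.

    An ERC value of 0 means ERC is disabled, so recovery time is unbounded
    and no kernel timeout can be assumed to outlast it.
    """
    if erc_deciseconds <= 0:
        return False
    return erc_seconds(erc_deciseconds) < kernel_timeout_s


def kernel_timeouts(sys_block: Path = Path("/sys/block")) -> dict[str, int]:
    """Map device name -> SCSI command timer in seconds, read from sysfs."""
    timeouts = {}
    for path in sorted(sys_block.glob("*/device/timeout")):
        try:
            # .../<dev>/device/timeout -> the third component from the end is <dev>
            timeouts[path.parts[-3]] = int(path.read_text().strip())
        except (OSError, ValueError):
            continue  # unreadable or non-numeric entry: skip it
    return timeouts


if __name__ == "__main__":
    ASSUMED_ERC = 70  # assumption: drives were set with scterc,70,70
    for dev, timeout in kernel_timeouts().items():
        verdict = "ok" if timer_outlasts_erc(ASSUMED_ERC, timeout) else "MISMATCH"
        print(f"{dev}: command timer {timeout}s, ERC {erc_seconds(ASSUMED_ERC)}s: {verdict}")
```

With ERC disabled (value 0) every device is reported as a mismatch,
which matches the point above: without a bounded recovery time, the
kernel reset can always win the race.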
