On Fri, Jun 19, 2020 at 02:31:48PM +0500, Roman Mamedov wrote:
> On Fri, 19 Jun 2020 10:08:43 +0200
> Daniel Smedegaard Buus <danielbuus@xxxxxxxxx> wrote:
>
> > Well, that's why I wrote having the *data* go bad, not the drive
>
> But data going bad wouldn't pass unnoticed like that (with reads resulting in
> bad data), since drives have end-to-end CRC checking, including on-disk and
> through the SATA interface.

Some bespoke SAN drives have proprietary firmware and wire protocols that
pass the CRC data in-band from the platter to the host for verification
(the READ and WRITE commands carry extra bytes for the CRC, so a disk
sector is 520 or 4104 bytes long). This is a true end-to-end CRC check,
but this is not a complete data integrity solution because it only
contributes protection against data corruption while the data is inside
the disk controller. It has no impact on any of the other silent data
corruption failure modes.

If your drive just passes 512 or 4096-byte sectors to the host, then there
is no end-to-end checking. It is only piecemeal, partial coverage of
individual segments in the data path, with no way to detect corruption at
points in between.

> If data on-disk is somehow corrupted, that will be
> a CRC failure on read, and still an I/O error for the host.

In my data set, about 1 in 20 failing disks silently corrupt some data
without indicating the data is bad. No disk is immune to this kind of
failure, from cheap consumer SSDs to enterprise HDDs with bespoke firmware
for proprietary SAN boxes. Failing drives do not respect the boundaries
of expected non-failing drive behavior.

About a third of silent data corruptions in spinning disks were DRAM
failures. SSDs and HDDs use DRAM in their embedded controller boards,
and that DRAM fails at the same rate as any other commercially available
DRAM. ECC RAM in disk controllers is the most expensive and least
effective way to improve data integrity in the storage stack, so no
rational vendor offers it.

Another third of the data errors are failures related to write caching.
In these failures the contents of the write cache will be discarded after
the data was reported flushed, and later reads to discarded sectors will
return old data. This event can be triggered by several different causes,
depending on what faults the firmware can detect and recover from and what
bugs are present in the firmware. These failures share a defining
characteristic: they can be prevented by disabling write cache.

The remaining third are assorted bugs (botched UNC sector remappings,
write to wrong track, "magic" LBA bugs, firmware recalls, bad SSDs, bad
cables, bad power, misconfigured bus timeout/SCTERC settings, and
mishandled bus resets) or some uncategorizable mix of multiple
simultaneous failure modes. Some of these are coincident with other
indicators of failure (e.g. unexpected SATA bus timeouts or resets), but
not IO errors during read or write operations to the specific sectors
that are corrupted. Some of these are not drive failures per se, but
failures in adjacent parts of the system that cause the drive to operate
improperly, corrupting data and suppressing error reports.

The other 19 out of 20 failing drives report IO errors as expected, or
fail to spin up at all. Those failure cases are trivial. Even mdadm
handles them easily.

> I only heard of some bad SSDs (SiliconMotion-based) returning corrupted data
> as if nothing happened, and only when their flash lifespan is close to
> depletion.

Kingston and Sandisk SSDs silently corrupt data starting as early as 20%
of rated TBW. After some experimenting with them, I don't believe their
firmware is capable of detecting data integrity errors at any point in
their lifespan. You can put a btrfs on one of these SSDs with DUP data
and DUP metadata, and watch it play whack-a-mole as it self-repairs the
csum errors that pop up all over the filesystem, until eventually the SSD
dies.

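For anyone who wants to see the whack-a-mole mechanism in miniature, here is
a toy Python model of checksummed blocks stored as two copies. It is only a
sketch under loud assumptions: the DupStore class, BLOCK_SIZE, and the use of
zlib.crc32 are all made up for illustration (zlib's CRC-32 is a stand-in, not
the checksum btrfs uses), and checking both copies on every read behaves more
like a scrub than like a normal read. The point is just that a checksum kept
apart from the data lets the host notice a copy the drive returned without
any IO error, and rewrite it from the copy that still verifies.

```python
import zlib

BLOCK_SIZE = 4096  # hypothetical block size for this sketch


class DupStore:
    """Toy model: every block is stored twice, with a separate checksum."""

    def __init__(self):
        self.copies = [{}, {}]  # two copies of each block, keyed by block number
        self.csums = {}         # checksums stored apart from the data
        self.repairs = 0        # how many bad copies have been rewritten

    def write(self, blockno, data):
        self.csums[blockno] = zlib.crc32(data)
        for copy in self.copies:
            copy[blockno] = bytes(data)

    def read(self, blockno):
        want = self.csums[blockno]
        for i, copy in enumerate(self.copies):
            data = copy[blockno]
            if zlib.crc32(data) != want:
                continue  # this copy was silently corrupted, try the other
            # Found a copy that verifies: rewrite any copy that does not.
            for j, other in enumerate(self.copies):
                if j != i and zlib.crc32(other[blockno]) != want:
                    other[blockno] = data
                    self.repairs += 1
            return data
        # No copy verifies: the host gets an error instead of bad data.
        raise IOError(f"block {blockno}: all copies fail checksum")


store = DupStore()
good = b"A" * BLOCK_SIZE
store.write(7, good)

# Simulate the drive returning wrong bytes for one copy, with no IO error.
store.copies[0][7] = b"B" + good[1:]

assert store.read(7) == good      # served from the intact copy
assert store.copies[0][7] == good  # the bad copy was rewritten
```

Once both copies of a block go bad there is nothing left to repair from, and
the read fails, which is roughly where "eventually the SSD dies" comes in.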
> > even though either scenario should still effectively end up yielding the
> > same behavior from btrfs
>
> I believe that's also an assumption you'd want to test, if you want to be
> thorough in verifying its behavior on failures or corruptions. And anyways it's
> better to set up a scenario which is as close as possible to ones you'd get in
> real-life.
>
> > But check out my retraction reply from earlier - it was just me being stupid
> > and forgetting to use conv=notrunc on my dd command used to damage the
> > loopback file :)
>
> Sure, I only commented on the part where it still made sense. :)
>
> --
> With respect,
> Roman
