On Mon, May 04, 2020 at 08:22:24PM -0600, Chris Murphy wrote:
> On Mon, May 4, 2020 at 8:00 PM Zygo Blaxell
> <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > On Mon, May 04, 2020 at 05:24:11PM -0600, Chris Murphy wrote:
> > > On Mon, May 4, 2020 at 5:09 PM Zygo Blaxell
> > > <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
> > >
> > > > Some kinds of RAID rebuild don't provide sufficient idle time to complete
> > > > the CMR-to-SMR writeback, so the host gets throttled. If the drive slows
> > > > down too much, the kernel times out on IO, and reports that the drive
> > > > has failed. The RAID system running on top thinks the drive is faulty
> > > > (a false positive failure) and the fun begins (hope you don't have two
> > > > of these drives in the same array!).
> > >
> > > This came up on linux-raid@ list today also, and someone posted this
> > > smartmontools bug.
> > > https://www.smartmontools.org/ticket/1313
> > >
> > > It notes in part this error, which is not a time out.
> >
> > Uhhh...wow. If that's not an individual broken disk, but the programmed
> > behavior of the firmware, that would mean the drive model is not usable
> > at all.
>
> I haven't gone looking for a spec, but "sector ID not found" makes me
> think of a trim/remap related failure, which, yeah it's gotta be a
> firmware bug. This can't be "works as designed".

Usually IDNF means "I was looking for a sector, but I couldn't figure out
where on the disk it was," i.e. a head positioning error or damage to the
metadata in a cylinder or sector header. There may also be some drives
that return IDNF instead of ABRT when they get a request for a sector
outside of the drive's legal LBA range.

The "didn't find a sector" variant usually indicates non-trivial damage
(impact on the platter rather than bit fade), but it could also be caused
by too much vibration combined with a short read error timeout. Also, a
small fraction of bit errors will land on sector headers and produce IDNF
without any other damage.

>
> --
> Chris Murphy
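For anyone triaging these from logs: libata decodes the ATA Error register into flag names in the kernel log (lines like "error: { IDNF }"). Below is a minimal sketch of decoding that byte and applying the rough reasoning above, assuming you already have the raw Error register value, the failing LBA, and the drive's capacity in sectors. The bit values are from the ATA command set and match libata's ATA_ABORTED, ATA_IDNF, ATA_UNC and ATA_ICRC constants; decode_error() and classify() are hypothetical helpers, and the result is a first guess, not a diagnosis.

```python
# ATA Error register bits (same values as libata's ATA_ABORTED,
# ATA_IDNF, ATA_UNC and ATA_ICRC in include/linux/ata.h).
ABRT = 0x04  # command aborted
IDNF = 0x10  # ID (sector) not found
UNC = 0x40   # uncorrectable data error
ICRC = 0x80  # interface CRC error

# Ordered from most to least specific for reporting.
BIT_NAMES = {ICRC: "ICRC", UNC: "UNC", IDNF: "IDNF", ABRT: "ABRT"}


def decode_error(err: int) -> list:
    """Return the names of the known error bits set in an Error register byte."""
    return [name for bit, name in BIT_NAMES.items() if err & bit]


def classify(err: int, lba: int, capacity_sectors: int) -> str:
    """Hypothetical first-guess triage; valid LBAs are 0..capacity_sectors-1."""
    if err & IDNF:
        if lba >= capacity_sectors:
            # Some firmware may report IDNF rather than ABRT for an illegal LBA.
            return "out-of-range request"
        # In range: positioning trouble, header damage, or vibration/timeout.
        return "sector not found: head positioning or sector header damage"
    if err & ICRC:
        # ICRC is usually reported together with ABRT, so check it first.
        return "link/cable CRC error"
    if err & UNC:
        return "uncorrectable media error"
    if err & ABRT:
        return "command aborted"
    return "unknown"
```

For example, decode_error(0x84) gives ['ICRC', 'ABRT'], and classify(0x10, lba, capacity) only calls an IDNF "out of range" when the LBA is past the end of the drive; everything else is treated as the in-range "couldn't find the sector" case.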
