On Mon, May 4, 2020 at 8:00 PM Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Mon, May 04, 2020 at 05:24:11PM -0600, Chris Murphy wrote:
> > On Mon, May 4, 2020 at 5:09 PM Zygo Blaxell
> > <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > > Some kinds of RAID rebuild don't provide sufficient idle time to complete
> > > the CMR-to-SMR writeback, so the host gets throttled. If the drive slows
> > > down too much, the kernel times out on IO, and reports that the drive
> > > has failed. The RAID system running on top thinks the drive is faulty
> > > (a false positive failure) and the fun begins (hope you don't have two
> > > of these drives in the same array!).
> >
> > This came up on linux-raid@ list today also, and someone posted this
> > smartmontools bug.
> > https://www.smartmontools.org/ticket/1313
> >
> > It notes in part this error, which is not a time out.
>
> Uhhh...wow. If that's not an individual broken disk, but the programmed
> behavior of the firmware, that would mean the drive model is not usable
> at all.

I haven't gone looking for a spec, but "sector ID not found" makes me
think of a trim/remap related failure, which, yeah it's gotta be a
firmware bug. This can't be "works as designed".

-- 
Chris Murphy
