On Mon, Jun 20, 2016 at 1:11 PM, Zygo Blaxell
<ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
> On Mon, Jun 20, 2016 at 11:13:51PM +0500, Roman Mamedov wrote:
>> On Sun, 19 Jun 2016 23:44:27 -0400
>> Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>
>> From a practical standpoint, [aside from not using Btrfs RAID5], you'd
>> be better off shutting down the system, booting a rescue OS, copying
>> the content of the failing disk to the replacement one using
>> 'ddrescue', then removing the bad disk. After boot-up your main system
>> wouldn't notice anything had ever happened, aside from a few
>> recoverable CRC errors in the "holes", the areas which ddrescue failed
>> to copy.
>
> I'm aware of ddrescue and myrescue, but in this case the disk has
> failed, past tense. At this point the remaining choices are to make
> btrfs native raid5 recovery work, or to restore from backups.

That seems difficult at best, due to this:

>> The normal 'device delete' operation got about 25% of the way in, then
>> got stuck on some corrupted sectors, aborting with EIO.

In effect it's like a 2-disk failure for a raid5 (or it's
intermittently a 2-disk failure but always at least a 1-disk failure).
That's not something md raid recovers from. Even manual recovery in
such a case is far from certain.

Perhaps Roman's advice is also a question about the cause of this
corruption? I'm wondering about that myself; it's the real problem here
as I see it. Losing a drive is ordinary. Additional corruption
appearing afterward is not. Are those corrupt sectors hardware
corruption, Btrfs corruption at the time the data was written to disk,
or Btrfs getting confused as it reads the data back from disk?

For me the critical question is: what does "some corrupted sectors"
mean?

--
Chris Murphy
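
For anyone who does take the ddrescue route Roman describes, a minimal
sketch of the standard two-pass recipe from the GNU ddrescue manual.
The device names /dev/sdX (failing disk), /dev/sdY (replacement) and
the /mnt mount point are placeholders, not anything from this thread;
double-check them, since ddrescue will overwrite the target without
asking.

    # Pass 1: copy everything easily readable, skip the slow scraping
    # phase (-n), keep state in rescue.map so the run can be resumed
    ddrescue -f -n /dev/sdX /dev/sdY rescue.map

    # Pass 2: revisit only the bad areas recorded in rescue.map, using
    # direct disc access (-d) and up to 3 retry passes (-r3)
    ddrescue -d -f -r3 /dev/sdX /dev/sdY rescue.map

Afterward, scrubbing the mounted filesystem should let btrfs repair the
unrecovered "holes" from checksums and parity on the other devices,
assuming those devices are healthy:

    btrfs scrub start /mnt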
