On Tue, Jan 19, 2016 at 6:28 AM, Rian Hunter <rian@xxxxxxxxx> wrote:
> In my raid6 setup, a disk was soft-failing on me. I pulled the disk,
> inserted a new one, mounted degraded, then did btrfs-replace while running
> some RW jobs on the FS.
>
> My jobs were taking too long. It seems like raid6 btrfs-replace without the
> source disk is not very fast. So I unmounted the FS, inserted the
> soft-failing disk again, remounted normally (non-degraded) and restarted the
> (now much faster) btrfs-replace.
>
> I checked on the status sometime later and there were hundreds if not
> thousands of "transid verify failure" messages in my dmesg. Additionally the
> btrfs-replace operation had outright failed.

I think the bottom line here is the same thing that everybody runs into
when they have LVM snapshots of btrfs devices. Btrfs doesn't record any
kind of timestamp or generation number or anything like that when it
touches data on a drive; or if it does, it isn't granular enough, or it
isn't being checked when mounting devices to ensure consistency. So btrfs
sees a bunch of partitions that at one time were consistent and assumes
that they must still be consistent. If they aren't, almost anything can go
wrong.

The practical upshot is that anytime you mount btrfs in anything other
than read-only mode, any partitions that were ever associated with that
filesystem but aren't currently mounted should be treated like the plague.
Don't let btrfs ever mount them again, because it will be all too willing
to try, and you'll likely lose data if it does.

There was a long thread on this not long ago, and I don't know what the
final outcome was. It would be ideal if btrfs could detect that a
partition is out of date and refuse to mount it, except perhaps read-only
in some kind of recovery mode.

--
Rich
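
As a rough illustration of the kind of staleness check being described, here
is a minimal sketch in Python that compares the superblock "generation" field
across the member devices of a filesystem before you try to mount them
together. It assumes btrfs-progs provides `btrfs inspect-internal dump-super`
(older releases shipped equivalent output as `btrfs-show-super`), the device
paths are placeholders, and comparing top-level generations is only a crude
proxy for the mount-time checks being asked for here, not how btrfs itself
decides anything.

#!/usr/bin/env python3
# Hypothetical sketch: compare the primary superblock "generation" of every
# device that claims membership in the same btrfs filesystem, and warn if
# they disagree (a lagging generation suggests the device missed writes
# made while it was absent).
#
# Assumes btrfs-progs provides `btrfs inspect-internal dump-super`
# (older releases ship equivalent output as `btrfs-show-super`).
import re
import subprocess
import sys

def superblock_generation(device):
    """Parse the top-level 'generation' field out of dump-super output."""
    out = subprocess.run(
        ["btrfs", "inspect-internal", "dump-super", device],
        check=True, capture_output=True, text=True,
    ).stdout
    match = re.search(r"^generation\s+(\d+)", out, re.MULTILINE)
    if match is None:
        raise RuntimeError("no generation field found for %s" % device)
    return int(match.group(1))

def main(devices):
    generations = {dev: superblock_generation(dev) for dev in devices}
    for dev, gen in sorted(generations.items()):
        print("%s: generation %d" % (dev, gen))
    if len(set(generations.values())) > 1:
        print("WARNING: generations differ; at least one device is stale.",
              file=sys.stderr)
        sys.exit(1)

if __name__ == "__main__":
    # Device paths are placeholders, e.g.:
    #   ./check_generation.py /dev/sdb1 /dev/sdc1 /dev/sdd1
    main(sys.argv[1:])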
