On 2017年11月12日 04:12, Hans van Kranenburg wrote: > Hi, > > On 11/11/2017 04:48 AM, Qu Wenruo wrote: >> >> On 2017年11月11日 11:13, Hans van Kranenburg wrote: >>> On 11/11/2017 03:30 AM, Qu Wenruo wrote: >>>> >>> >>>> One more chance to recover is never a bad idea. >>> >>> It is a bad idea. The *only* case you can recover from is when you >>> freeze the filesystem *directly* after writing the superblock. Only in >>> that case you have both a consistent last committed and previous >>> transaction on disk. >> >> You're talking about the ideal case. >> >> The truth is, we're living in a real world where every software has >> bugs. And that's why sometimes we get transid error. >> >> So keeps the backup root still makes sense. >> >> And further more, different trees have different update frequency. >> For root and extent tree, they get updated every transaction, while for >> chunk tree it's seldom updated. >> >> And backup roots are updated per transaction, which means we may have a >> high chance to recover at least chunk root and to know the chunk map and >> possible to grab some data. > > That's entirely right yes. But "possible to grab some data" is a > whole different thing than "getting the filesystem back into a fully > functional consistent state..." > > So it's about expectation management for end users. If the user > thinks "Ha! A backup! That's nice of btrfs, it keeps them so I can go > back!.", then the user will get disappointed when the backups are unusable. Without discard, user should be able to rollback to previous transaction (backup_root[0]) The last transaction committed with commit_root and root->node switched, and as I stated in previous mail, until this swtich, commit_root must be fully available. And after the last transaction there is no modification (since the last trans is for unmount), so backuproot[0] should be fully accessible. Discard can break it unless we have method to trace tree block space usage for at least 2 transactions. > > The design of btrfs is that all metadata tree blocks and data extent > space that is not used by the last completed transaction are freed to be > reused, as soon as possible. For cow-only roots (e.g. root tree, extent > tree) this is already done immediately in the transaction code after > writing the super block (btrfs_finish_extent_commit, discard is also > immediately triggered), and for reference counted roots (subvolume > roots) the cleaner will asap do it. > > So, the design gives zero guarantee that following a backup root will > work. But, it's better than nothing when trying to scrape some data off > of the borken filesystem. Again, only for discard. > > Maybe it's enough to change man 5 btrfs with the mount options with > a warning for the usebackuproot option to let the user know that doing > this might result in a mountable filesystem, but that even in case it > does, the result should only be used to get as much data as possible off > of it before doing mkfs again. Or, if it succeeds, and if also umounting > again and running a full btrfsck and scrub to check all metadata and > data succeeds, the user might be pretty confident that nothing > referenced by the previous backuproot was already overwritten with new > data, in which case the filesystem can be continued to be used. > > But it puts usebackuproot very much in the same department where tools > like btrfs-restore live. Isn't it the original design? No one sane would use it for daily usage and it's original called "recovery", I don't see any problem here. Thanks, Qu > >>> If you do new writes and then again are able to mount with -o >>> usebackuproot and if any of the >>> transaction-before-the-last-committed-transaction blocks are overwritten >>> you're in a field of land mines and time bombs. Being able to mount >>> gives a false sense of recovery to the user in that case, because either >>> you're gonna crash into transid problems for metadata, or there are >>> files in the filesystem in which different data shows up than should, >>> potentially allowing users to see data from other users etc... It's just >>> dangerous. >>> >>>> As you can see, if metadata CoW is completely implemented as designed, >>>> there will be no report of transid mismatch at all. >>>> And btrfs should be bullet proof from the very beginning, but none of >>>> these is true. >>> >>> It is, it's not a bug. This is about the backup roots thingie, not about >>> the data from the last transaction. >> >> Check the original post. >> It only gives the magic number, it's not saying if it's from backup root. >> >> If it's dumped from running fs (it's completely possible) then it's the >> problem I described. >> >> Anyway, no matter what you think if it's a bug or not, I'll enhance tree >> allocator to do extra check if the result overwrites the commit root. >> >> And I strongly suspect transid related problems reported from mail list >> has something to do with it. >
Attachment:
signature.asc
Description: OpenPGP digital signature
