One item I did forget to mention here is that the underlying device was
expanded online using "btrfs fi resize max /mount/path" at most a month
before the failure -- I don't have the exact timestamps available, so
there remains a possibility that the latest files on the currently
mounted filesystem correspond to the filesystem as it was immediately
prior to the resize operation.

Again, any suggestions are welcome. It took us a bit of time (and several
large file restores) to realize the filesystem had rolled back rather
than just corrupting a few files, so while a raw copy of the filesystem
does exist, it is tainted by having been mounted and written to before
the copy was taken.

----- Original Message -----
> From: "Timothy Pearson" <tpearson@xxxxxxxxxxxxxxxxxxxxx>
> To: "linux-btrfs" <linux-btrfs@xxxxxxxxxxxxxxx>
> Sent: Saturday, November 9, 2019 4:33:29 PM
> Subject: Unusual crash -- data rolled back ~2 weeks?

> We just experienced a very unusual crash on a Linux 5.3 file server using NFS to
> serve a BTRFS filesystem. NFS went into deadlock (D wait) with no apparent
> underlying disk subsystem problems, and when the server was hard rebooted to
> clear the D wait the BTRFS filesystem remounted itself in the state that it was
> in approximately two weeks earlier (!). There was also significant corruption
> of certain files (e.g. LDAP MDB and MySQL InnoDB) noted -- we restored from
> backup for those files, but are concerned about the status of the entire
> filesystem at this point.
>
> We do not use subvolumes, snapshots, or any of the advanced features of BTRFS
> beyond the data checksumming. I am at a loss as to how BTRFS could suddenly
> just "forget" about the past two weeks of written data and (mostly) cleanly
> roll back on the next mount without even throwing any warnings in dmesg.
>
> Any thoughts on how this is possible, and if there is any chance of getting the
> lost couple weeks of data back, would be appreciated.
>
> Thank you!
