Re: Btrfs transid corruption

Hey Josef, et al.


First, many thanks for the quick help before. :-)


On Wed, 2020-04-01 at 16:40 -0400, Josef Bacik wrote:
> btrfs rescue zero-log /dev/whatever

This worked nicely, and at first glance (though I haven't diffed any
of the data against backups nor run a scrub yet) it seems to be mostly
all there.


I have a number of questions though...


1) Could this be a bug?
Yes, I know I had a freeze, but here's what happened:
- a few days ago I upgraded from 5.2 respectively 5.4 to 5.5.13;
  the system had already run for a day without issues before it
  suddenly froze, Magic SysRq wasn't working and I had to power off
- I then booted from a rescue USB stick with some kernel 5.4 and btrfs
  tools 5.4.1

- did a --mode=normal fsck of the fs, no errors !! (invocations of
  this and the cache clearing sketched after this list)

- then I did a --clear-space-cache v1
  Every now and then I see some free space warnings in the kernel log,
  and so I do clear the cache from time to time when I have the
  filesystem offline

- I didn't do another fsck directly afterwards, unfortunately...
  if I had (and had already seen errors by then), we'd now know for
  sure that there must be some bug

- then I rebooted into the normal system and there it failed to mount
  (i.e. the root fs)
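
For clarity, the fsck and cache-clear steps above were roughly the
following invocations (reproduced from memory; /dev/whatever is just a
placeholder for the actual device):

  btrfs check --mode=normal /dev/whatever
  btrfs check --clear-space-cache v1 /dev/whatever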


So I could understand that something got damaged right at the freeze
or power-off, but the fsck directly afterwards seemed fine...
Any ideas?




2) What's the tree log doing? Is it something like a journal? And is
basically everything that was in it and not yet fully committed to the
fs now lost?


3) Based on the generation (I assume 1453260 and 1452480 are generation
numbers?), can one tell how much data is lost, like in the sense of the
time span?
  parent transid verify failed on 425230336 wanted 1453260 found 1452480

And can one tell what is pointed to by 425230336?
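
(If it helps with answering, I guess one could dump that block with
something like the following? /dev/whatever again being a placeholder;
I haven't tried it yet and am not sure I'd interpret the output
correctly.)

  btrfs inspect-internal dump-tree -b 425230336 /dev/whatever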


4) The open_ctree failed error one saw on the screenshot... was this
then just a follow-up error from the failure to replay the log?


5) Was some backup superblock used now, and was thus some further
data/metadata lost?
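
(I guess I could compare the superblock copies with something like the
following, if that even tells one which copy was used... just going by
the manpage here, so no idea whether that's the right tool.)

  btrfs inspect-internal dump-super -fa /dev/whatever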


And most importantly:


6) Can one now expect that everything which is still there/visible is
valid? Or could there be any file-internal corruption (and is that
likely or not)?

I mean this is what I'd more or less expect from a CoW fs... if it
crashes, some data might be gone, but what's still there is 100% valid?


7) Am I advised to re-create the filesystem? Like, could there still
be hidden errors that fsck doesn't see and that sooner or later build
up and make it explode again?
Or is the whole thing just a minor issue with a well-known/understood
clean-up procedure after a freeze?

Setting it up again (with a restore) would just be work (now that I
can access the data again)... so if it's advisable, I'd rather go for
that.
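
(That would be roughly the following, I guess... mount read-only, copy
everything off, re-create and copy back; paths are just placeholders
and I'd probably use send/receive or my usual backup tooling instead
of plain rsync:)

  mount -o ro /dev/whatever /mnt
  rsync -aHAX /mnt/ /backup/fs-copy/
  umount /mnt
  mkfs.btrfs -f /dev/whatever
  mount /dev/whatever /mnt
  rsync -aHAX /backup/fs-copy/ /mnt/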


8) Any other checks I could/should make, like scrub?
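
(For the scrub, I guess that would simply be something like the
following once the fs is mounted again? /mnt being a placeholder for
the actual mountpoint:)

  btrfs scrub start /mnt
  btrfs scrub status /mnt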



Thanks a lot,
Chris.



