On 16.01.19 at 08:11, Nikolay Borisov wrote:
>
>
> On 16.01.19 at 0:24, Oliver Freyermuth wrote:
>> On 14.01.19 at 01:48, Oliver Freyermuth wrote:
>>> On 13.01.19 at 22:51, Oliver Freyermuth wrote:
>>>> I just upgraded to 4.20.1 from 4.19 (not sure if related), and my btrfs backup volume entered read-only mode when running btrfs-cleaner, i.e. when purging old subvolumes.
>>>>
>>>> I have attached the kernel log from when this happens.
>>>>
>>>> What is the best way to proceed from here? Running "btrfs check --repair" on the device?
>>>> Worst case, it's not a huge issue to lose the data stored there; it's my backup volume after all.
>>>> But it would be good to understand the cause and to know if there is a better fix than starting from scratch.
>>>
>>> Attached is the output of "btrfs check -p /dev/sdc2".
>>> I can't guarantee the volume has always been cleanly unmounted.
>>>
>>> I found several past occurrences of this here:
>>> https://www.spinics.net/lists/linux-btrfs/msg69040.html
>>> and here:
>>> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing
>>> but without conclusive result.
>>>
>>> Please let me know what's the best way to proceed. From these links, it seems
>>> btrfs check --repair
>>> _should_ help, but I would prefer to get some advice first on whether this is really the best approach.
>>
>> Dear BTRFS experts,
>>
>> I have now salvaged all my backup subvolumes with btrfs send (using btrbk archive) to a new btrfs partition.
>> Interestingly, the old partition was mounted r/w initially and was remounted r/o after the described issue was triggered by btrfs-cleaner:
>>
>> [34758.491644] BTRFS: error (device sdc2) in __btrfs_free_extent:6828: errno=-2 No such entry
>> [34758.491647] BTRFS info (device sdc2): forced readonly
>> [34758.491652] BTRFS: error (device sdc2) in btrfs_run_delayed_refs:2978: errno=-2 No such entry
>>
>
> You are likely hitting a known issue; you need to apply:
>
> btrfs: run delayed items before dropping the snapshot
>
> Currently this patch is part of 5.0, but it has also been marked for stable, so it should land in some of the stable kernels. So you have 2 options:
>
> 1. Backport the patch to the kernel you desire.
> 2. Wait until the patch lands in a stable release.

Thanks a lot for the pointer! Sadly, it seems that was already in 4.20.1, which I am using:
https://lkml.org/lkml/2019/1/9/792

>
>> btrfs send appeared to fail on some subvolumes with:
>>
>> [41822.676040] BTRFS error (device sdc2): parent transid verify failed on 52633681920 wanted 88063 found 87999
>> [41822.676260] BTRFS error (device sdc2): parent transid verify failed on 52633681920 wanted 88063 found 87999
>> [41822.676266] BTRFS info (device sdc2): no csum found for inode 22175978 start 0
>> [41822.683112] BTRFS warning (device sdc2): csum failed root 25758 ino 22175978 off 4427459514368 csum 0x5d3b8d26 expected csum 0x00000000 mirror 1
>>
>> After unmounting the broken file system and remounting it r/o, all visible subvolumes could be transferred without that issue.
>> I presume there's also a bug in the automatic remount to r/o, since csum 0x00000000 does not look correct.
>>
>> Since there's now nothing to lose and I have received no other advice up to now, I'm running "btrfs check --repair" just for the sake of learning whether this appears to fix it. I'll report back shortly when that's done.
>
> --repair won't fix the problem; it's also possible it *could* make things worse.
Since repair did already run (and did not really help, but segfaulted after trying some things), I guess the volume is hosed now anyway.

It's still sad that there is no clear explanation for the corruption. I still believe it *might* have been unmounted hard while btrfs-cleaner was running, but I would hope that cannot lead to a non-recoverable state (especially if "only" deleted / to-be-deleted subvolumes are affected). I doubt it's memory corruption, since the source is fine and the errors appeared only for those deleted subvolumes immediately after rebooting from 4.19 to 4.20 (I don't think the kernel version change was the reason, but rather the reboot during deletion, which should have done a graceful unmount but might not have).

I'll keep the volume around for a few more days in case somebody is interested in hunting down the cause; just let me know what is needed.

Cheers,
Oliver

>
>>
>> If anybody can suggest a better solution in case this happens again (the issue appears to be widespread), I would be happy to learn.
>>
>> Cheers,
>> Oliver
>>
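P.S. For the record, the read-only salvage procedure mentioned above boils down to roughly the following sketch. The device is /dev/sdc2 as in the logs; the mount points and snapshot paths are made-up examples, and in my case the per-subvolume send/receive was driven by "btrbk archive" rather than typed by hand:

```shell
# Remount the broken filesystem strictly read-only, so btrfs-cleaner
# and delayed refs can no longer touch it:
umount /mnt/backup
mount -o ro /dev/sdc2 /mnt/backup

# List the subvolumes that are still visible:
btrfs subvolume list -o /mnt/backup

# Copy each read-only snapshot to a freshly created btrfs filesystem
# (mounted at /mnt/new here, as an example):
btrfs send /mnt/backup/snapshots/home.20190110 | \
    btrfs receive /mnt/new/snapshots
```

Note that btrfs send only works on read-only subvolumes, which is why the backup snapshots could be transferred even though the filesystem itself was damaged.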
