Re: btrfs goes read-only when btrfs-cleaner runs

On 16.01.19 at 08:11, Nikolay Borisov wrote:
> 
> 
> On 16.01.19 at 0:24, Oliver Freyermuth wrote:
>> On 14.01.19 at 01:48, Oliver Freyermuth wrote:
>>> On 13.01.19 at 22:51, Oliver Freyermuth wrote:
>>>> I just upgraded to 4.20.1 from 4.19 (not sure if related) and my btrfs backup volume entered read-only mode when running btrfs-cleaner,
>>>> i.e. when purging old subvolumes. 
>>>>
>>>> I have attached the kernel log from when this happens. 
>>>>
>>>> What is the best way to proceed from here? Running "btrfs check --repair" on the device?
>>>> In the worst case, losing the data stored there is not a huge issue; it is my backup volume after all.
>>>> But it would be good to understand the cause and to know whether there is a better fix than starting from scratch.
>>> Attached is the output of "btrfs check -p /dev/sdc2".
>>> I can't guarantee the volume has always been cleanly unmounted.
>>>
>>> I found several past reports of this here:
>>> https://www.spinics.net/lists/linux-btrfs/msg69040.html
>>> and here:
>>> https://unix.stackexchange.com/questions/369133/dealing-with-btrfs-ref-backpointer-mismatches-backref-missing
>>> but without a conclusive result.
>>>
>>> Please let me know what's the best way to proceed. From these links, it seems
>>> btrfs check --repair
>>> _should_ help, but I would prefer to get some advice first whether this is really the best approach. 
>>>
>>
>> Dear BTRFS experts,
>>
>> I have now salvaged all my backup subvolumes with btrfs send (using btrbk archive) to a new btrfs partition. 
>> Interestingly, when the old partition was mounted r/w initially and remounted r/o after the described issue was triggered by btrfs-cleaner:
>>
>> [34758.491644] BTRFS: error (device sdc2) in __btrfs_free_extent:6828: errno=-2 No such entry
>> [34758.491647] BTRFS info (device sdc2): forced readonly
>> [34758.491652] BTRFS: error (device sdc2) in btrfs_run_delayed_refs:2978: errno=-2 No such entry
>>
> 
> You are likely hitting a known issue. You need to apply the patch
>
> "btrfs: run delayed items before dropping the snapshot". It is currently
> part of 5.0, but it has also been marked for stable, so it should
> land in some of the stable kernels. So you have 2 options:
> 
> 1. Backport the patch to the kernel you desire
> 2. Wait until the patch lands in a stable release.

Thanks a lot for the pointer!
Sadly, it seems that patch is already in 4.20.1, which I am using:
https://lkml.org/lkml/2019/1/9/792

> 
>> btrfs send appeared to fail on some subvolumes with:
>>
>> [41822.676040] BTRFS error (device sdc2): parent transid verify failed on 52633681920 wanted 88063 found 87999
>> [41822.676260] BTRFS error (device sdc2): parent transid verify failed on 52633681920 wanted 88063 found 87999
>> [41822.676266] BTRFS info (device sdc2): no csum found for inode 22175978 start 0
>> [41822.683112] BTRFS warning (device sdc2): csum failed root 25758 ino 22175978 off 4427459514368 csum 0x5d3b8d26 expected csum 0x00000000 mirror 1
>>
>> Unmounting and remounting the broken file system r/o, all visible subvolumes could be transferred without that issue. 
>> I presume that there's also a bug when the automatic remount as r/o happens since csum 0x00000000 does not look correct. 
>>
>> Since there's now nothing to lose and I received no other advice up to now, I'm running "btrfs check --repair" now just for the sake of learning
>> whether this appears to fix it. I'll shortly report back when that's done. 
> 
> --repair won't fix the problem, and it's possible it *could* make
> things worse.

Since repair has already run (it did not really help, and it segfaulted after trying some things), I guess the volume is hosed now anyway.
It's still sad that there is no clear explanation for the corruption. I still believe the volume *might* have been unmounted hard while btrfs-cleaner was running,
but I would hope that cannot lead to a non-recoverable state (especially if "only" deleted / to-be-deleted subvolumes are affected).

I doubt it's memory corruption, since the source is fine and it only happened for those deleted subvolumes immediately after rebooting from 4.19 to 4.20
(I don't think the kernel version change was the reason, though; more likely it was the reboot during deletion, which should have done a graceful unmount but might not have).

I'll keep the volume around for a few more days in case somebody is interested in hunting down the cause; just let me know what is needed.
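
For anyone triaging similar logs, here is a minimal Python sketch that pulls the "parent transid verify failed" lines out of a kernel log and reports how far the found generation lags behind the wanted one. It is not part of btrfs-progs: the function name summarize_transid_errors and the regular expression are my own assumptions, modelled only on the two log lines quoted in this mail, so other kernel versions may format the message differently.

```python
import re

# Assumed message layout, taken only from the log lines quoted in this thread.
TRANSID_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): parent transid verify failed "
    r"on (?P<bytenr>\d+) wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def summarize_transid_errors(log_text):
    """Group transid mismatches by (device, bytenr, wanted, found) and count repeats."""
    seen = {}
    for line in log_text.splitlines():
        match = TRANSID_RE.search(line)
        if match is None:
            continue  # not a transid mismatch line
        key = (
            match["dev"],
            int(match["bytenr"]),
            int(match["wanted"]),
            int(match["found"]),
        )
        seen[key] = seen.get(key, 0) + 1
    return [
        {
            "device": dev,
            "bytenr": bytenr,
            "wanted": wanted,
            "found": found,
            "lag": wanted - found,  # generations the block on disk is behind
            "count": count,
        }
        for (dev, bytenr, wanted, found), count in seen.items()
    ]


if __name__ == "__main__":
    sample = (
        "[41822.676040] BTRFS error (device sdc2): parent transid verify failed"
        " on 52633681920 wanted 88063 found 87999\n"
        "[41822.676260] BTRFS error (device sdc2): parent transid verify failed"
        " on 52633681920 wanted 88063 found 87999\n"
    )
    for entry in summarize_transid_errors(sample):
        print(entry)
```

Fed the two lines from my log, this groups them into a single entry for block 52633681920 with a gap of 64 generations (88063 wanted, 87999 found). In practice the input would be the saved output of dmesg or the journal.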

Cheers,
	Oliver

> 
>>
>> If anybody can suggest a better solution in case this happens again (the issue appears to be widespread), I would be happy to learn.
>>
>> Cheers,
>> 	Oliver
>>


