Re: Massive filesystem corruption since kernel 5.2 (ARCH)

On 29.08.19 at 15:11, Qu Wenruo wrote:
> 
> 
> On 2019/8/29 8:46 PM, Oliver Freyermuth wrote:
>> On 27.08.19 at 14:40, Hans van Kranenburg wrote:
>>> On 8/27/19 11:14 AM, Swâmi Petaramesh wrote:
>>>> On 8/27/19 8:52 AM, Qu Wenruo wrote:
>>>>>> or to use the V2 space
>>>>>> cache generally speaking, on any machine that I use (I had understood it
>>>>>> was useful only on multi-TB filesystems...)
>>>>> 10GiB is enough to create large enough block groups to utilize free
>>>>> space cache.
>>>>> So you can't really escape from free space cache.
>>>>
>>>> I meant that I had understood that the V2 space cache was preferable to
>>>> V1 only for multi-TB filesystems.
>>>>
>>>> So would you advise to use V2 space cache also for filesystems < 1 TB ?
>>>
>>> Yes.
>>>
>>
>> This makes me wonder if it should be the default?
> 
> It will be.
> 
> Just a spoiler: I believe features like no-holes and v2 space cache will
> become the default in the not-so-distant future.
> 
>>
>> This thread made me check on my various BTRFS volumes and for almost all of them (in different machines), I find cases of
>>  failed to load free space cache for block group XXXX, rebuilding it now
>> at several points during the last months in my syslogs - and that's for machines without broken memory, for disks for which FUA should be working fine,
>> without any unsafe shutdowns over their lifetime, and with histories as short as only having seen 5.x kernels.
> 
> That's interesting. In theory that shouldn't happen, especially without
> unsafe shutdown.

I also forgot to add that on these machines there is no mdraid / dm / LUKS in between (i.e. purely btrfs on the drives).
The messages _seem_ to be more frequent on spinning disks, but my sample is only 5 devices in total.
So it really "feels" like there is a bug lurking somewhere. However, the machines do not seem to have seen any actual corruption as a consequence.
I'm running "btrfs check --readonly" now to see whether everything is really still fine, but I'm already running kernel 5.2 with the new checks without issues.
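
For anyone who wants to check their own logs the same way, here is a minimal Python sketch that counts the "failed to load free space cache" messages per device. The message text is the one quoted above; the "(device ...)" prefix in front of it and the function name count_cache_rebuilds are my assumptions for illustration, so adjust the pattern to whatever your syslog actually contains.

```python
import re
from collections import Counter

# Message text as quoted in this thread. The "(device ...)" prefix is an
# assumption about how the kernel tags the line, so it is treated as optional.
PATTERN = re.compile(
    r"(?:\(device (?P<dev>[^)]+)\)[:\s]*)?"
    r"failed to load free space cache for block group (?P<bg>\d+)"
)


def count_cache_rebuilds(lines):
    """Count matching log lines per device name ('unknown' if no device tag)."""
    counts = Counter()
    for line in lines:
        match = PATTERN.search(line)
        if match:
            counts[match.group("dev") or "unknown"] += 1
    return counts
```

It accepts any iterable of log lines, for example an open file object for a saved kernel log, and returns a Counter keyed by device name.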

> But please also be aware that there is no concrete proof that corrupted
> v1 space cache is causing all the problems.
> What I said is just that corrupted v1 space cache may cause problems; I need
> to at least craft an image to prove my assumption.

I see - that might be useful in any case to hopefully track down the issue. 

> 
>>
>> So if this may cause harmful side effects, happens without clear origin, and v2 is safer due to being CoW,
>> I guess I should switch all my nodes to v2 (or this should become the default in a future kernel?).
> 
> At least, your experience would definitely help the btrfs community.

Ok, then I will slowly switch the nodes one by one. If I do not come back and cry on the list, this means all is well (but I'm only a small datapoint with 5 disks in three machines) ;-).

Cheers,
	Oliver

> 
> Thanks,
> Qu
> 
>>
>> Cheers,
>> 	Oliver
>>


