Re: interest in post-mortem examination of a BTRFS system and improving the btrfs-code?

On 2019/4/3 5:22 AM, Nik. wrote:
> 
> 2019-04-02 16:12, Qu Wenruo:
>>
>>
>> On 2019/4/2 9:59 PM, Nik. wrote:
>>>
>>>
>>> 2019-04-02 15:24, Qu Wenruo:
>>>>
>>>>
>>>> On 2019/4/2 9:06 PM, Nik. wrote:
>>>>>
>>>>> 2019-04-02 02:24, Qu Wenruo:
>>>>>>
>>>>>> On 2019/4/1 2:44 AM, btrfs@xxxxxxxxxxxxx wrote:
>>>>>>> Dear all,
>>>>>>>
>>>>>>>
>>>>>>> I am a big fan of btrfs and have been using it since 2013 - in the
>>>>>>> meantime on at least four different computers. During this time, I
>>>>>>> suffered at least four bad btrfs failures leading to unmountable,
>>>>>>> unreadable and unrecoverable file systems. Since in three of the
>>>>>>> cases I did not manage to recover even a single file, I am
>>>>>>> beginning to lose my confidence in btrfs: in 35 years of working
>>>>>>> with different computers, no other file system was so bad at
>>>>>>> recovering files!
>>>>>>>
>>>>>>> Considering the importance of btrfs, and keeping in mind the
>>>>>>> number of similar failures described in countless forums on the
>>>>>>> net, I had an idea: to donate my last two damaged filesystems for
>>>>>>> investigation purposes and thus hopefully contribute to the
>>>>>>> improvement of btrfs. One condition: any recovered personal data
>>>>>>> (mostly pictures and audio files) should remain undisclosed and be
>>>>>>> deleted.
>>>>>>>
>>>>>>> Should anybody be interested in this, feel free to contact me
>>>>>>> personally (I do not read the list regularly!); otherwise I am
>>>>>>> going to reformat and reuse both systems two weeks from today.
>>>>>>>
>>>>>>> Some more info:
>>>>>>>
>>>>>>>      - The smaller system is 83.6GB, I could either send you an
>>>>>>> image of
>>>>>>> this system on an unneeded hard drive or put it into a dedicated
>>>>>>> computer and give you root rights and ssh-access to it (the network
>>>>>>> link
>>>>>>> is 100Mb down, 50Mb up, so it should be acceptable).
>>>>>>
>>>>>> I'm a little more interested in this case, as it's easier to debug.
>>>>>>
>>>>>> However, there is one requirement before debugging:
>>>>>>
>>>>>> *NO* btrfs check --repair/--init-* runs at all.
>>>>>> btrfs check --repair is known to cause transid errors.
>>>>>
>>>>> Unfortunately, this file system was used as a testbed, and even
>>>>> "btrfs check --repair --check-data-csum --init-csum-tree
>>>>> --init-extent-tree ..." was attempted on it.
>>>>> So I assume you are not interested.
>>>>
>>>> Then the fs may have been further corrupted, so I'm not interested.
>>>>
>>>>>
>>>>> On the larger file system, only "btrfs check --repair --readonly
>>>>> ..." was attempted (without success; most command executions were
>>>>> documented, so the results can be made available); no writing
>>>>> commands were issued.
>>>>
>>>> --repair will cause writes, unless it failed even to open the
>>>> filesystem.
>>>>
>>>> If that's the case, it would be pretty interesting for me to poke
>>>> around the fs - obviously, all read-only.
>>>>
>>>>>
>>>>>> And, I'm afraid even with some debugging, the result would be pretty
>>>>>> predictable.
>>>>>
>>>>> I do not need anything from the smaller file system and have
>>>>> (hopefully fresh enough) backups from the bigger one.
>>>>> It would be good enough if it helps to find any bugs which are
>>>>> still in the code.
>>>>>
>>>>>> It will be a transid error 90% of the time.
>>>>>> If it's a tree block from the future, then it's something barrier
>>>>>> related.
>>>>>> If it's a tree block from the past, then some tree block didn't
>>>>>> reach disk.
>>>>>>
>>>>>> We have been chasing this spectre for a long time, and have had
>>>>>> several assumptions, but never pinned it down.
>>>>>
>>>>> IMHO the spectre would lead to much bigger losses - at least in my
>>>>> case it could have happened all four times, but it did not.
>>>>>
>>>>>> But anyway, more info is always better.
>>>>>>
>>>>>> I'd like to get the ssh access for this smaller image.
>>>>>
>>>>> If you are still interested, please advise how to create the image of
>>>>> the file system.
>>>>
>>>> If the larger fs really didn't get any writes (btrfs check --repair
>>>> failed to open the fs, thus giving the output "cannot open file
>>>> system"), I'm interested in that one.
>>>
>>> This is excerpt from the terminal log:
>>> "# btrfs check --readonly /dev/md0
>>> incorrect offsets 15003 146075
>>> ERROR: cannot open file system
>>> #"
>>
>> That's great.
>>
>> And to my surprise, this is a completely different problem.
>>
>> And I believe it will be detected by the latest write-time tree-checker
>> patches in the next kernel release.
> 
> Is the next release going to come out in April?

The next release is v5.1, which doesn't contain all my recent
tree-checker enhancements.

So I'm afraid you'll need to wait until June.

> 
>> This problem is normally caused by a memory bit flip.
> 
> Well, this system has suffered many power outages (at least 6 since
> 2013), and after each outage I had to run scrub AND nevertheless
> discovered the loss of a couple of files. I can imagine that the power
> supply or the motherboard of this machine is not (well) designed for
> reliability, but:

Unless the PSU is so unreliable that the VRM for the memory or memory
controller doesn't get the needed voltage, a power outage is not related
to this case.

>   1) shouldn't the file system be immune to this?

If memory is corrupted, nothing can help, unless you have ECC memory.

>   2) Isn't it too stupid to lose terabytes of information due to a
> flipped bit?

It depends on where the bit flip is.
If the bit flip happens in a super-vital tree block, like the chunk tree
or the root tree, then the whole fs can't be mounted.

However, the enhanced tree-checker will be able to detect such problems
and abort the write before corrupted data reaches disk.
So at least with those enhancements, it should not cause such problems
at all.
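To illustrate why checksums alone don't save you here: btrfs checksums
every block, but the checksum is computed over whatever is in RAM at
commit time, so a flip that happens before checksumming is sealed in as
"valid". A tiny sketch, using plain CRC32 from Python's zlib as a
stand-in for btrfs's crc32c (not btrfs code):

```python
import zlib

# A stand-in for a metadata block held in RAM.
block = bytearray(b"btrfs tree block payload" * 128)

good_csum = zlib.crc32(block)

# Flip a single bit, as a faulty DIMM might.
block[100] ^= 0x01
bad_csum = zlib.crc32(block)

# A flip *after* the checksum was computed is caught on the next verify...
assert good_csum != bad_csum

# ...but if the flip happens *before* the checksum is computed, the
# corrupted block checksums as "good" and is happily written to disk.
csum_of_corrupted = zlib.crc32(block)
assert csum_of_corrupted == bad_csum
```

This is exactly why ECC memory is the only real protection: the checksum
can only vouch for the buffer it was handed.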

> The same machine has ext4 and FAT file systems, and they have never had
> a problem, or recovered automatically by means of fsck during the next
> reboot!

Then we should enhance btrfs-progs to detect bit flips.
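For what it's worth, btrfs typically keeps two copies of metadata (the
DUP profile on a single device, RAID1 across devices), so such a detector
could compare the mirrored copies and pinpoint any disagreement down to
the bit. A hypothetical sketch of that comparison (made-up function, not
actual btrfs-progs code):

```python
def find_flipped_bits(copy_a: bytes, copy_b: bytes):
    """Return (byte_offset, bit_mask) for every bit where two
    mirrored copies of a block disagree."""
    diffs = []
    for off, (a, b) in enumerate(zip(copy_a, copy_b)):
        x = a ^ b
        while x:
            low = x & -x          # isolate the lowest set bit
            diffs.append((off, low))
            x ^= low
    return diffs

# Two copies of the same 4K block, one with a single flipped bit.
a = bytearray(b"\x00" * 4096)
b = bytearray(a)
b[1234] ^= 0x40
print(find_flipped_bits(a, b))    # -> [(1234, 64)]
```

A single-bit difference strongly suggests a flip rather than a torn
write, so a repair tool could prefer the copy whose checksum verifies.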

Thanks,
Qu

> 
>> This should ring a little alarm bell about the problem.
>>
>> Anyway, a v5.2 or v5.3 kernel would be much better at catching such
>> problems.
> 
> This kernel isn't even scheduled, is it? Well, I am not really in a
> hurry...
> 
>>>
>>> Btw., since the list allows _plain_text_only_, I wonder how you
>>> quote?
>>>
>>>> If not, then no.
>>>>
>>>>> I can imagine that it is preferable to use the
>>>>> original, but in my case it is a (not mounted) partition of a bigger
>>>>> hard drive, and the other partitions are in use. "btrfs-image" seems
>>>>> inappropriate to me, and "dd" will probably screw things up?
>>>>
>>>> Since the fs is too large, I don't think either way is good enough.
>>>>
>>>> So in this case, the best way for me to poke around would be a caged
>>>> container with only read access to the larger fs.
>>>
>>> I am afraid that this machine is too weak for running containers
>>> (QNAP SS839Pro NAS, Intel Atom, 2GB RAM), and right now I do not have
>>> another machine which could accommodate five hard drives. Let me
>>> consider how to organize this, or come up with another idea. One way
>>> could be "async ssh" - a private ssl-chat on one of my servers, so
>>> that you can write your commands there, I execute them on the machine
>>> as soon as I can, and put the output back into the chat window. Sounds
>>> silly, but it could start immediately, and I have no better idea right
>>> now, sorry!
>>
>> Your btrfs check output is already good enough to locate the problem.
>>
>> The next thing would just be to help you recover that image, if that's
>> what you need.
> 
> Well, let me say it again: 1) I have a backup, but one is never sure
> which newest files are not in it. 2) It is much more important to be
> sure that the btrfs code is flawless and no other btrfs file system is
> in danger! I can live with some losses, but the inability to recover
> even a single file is not acceptable!
> 
>> The proposed idea is not that uncommon. In fact, it's just another way
>> of running the "show commands, user executes and reports, developer
>> checks the output" loop.
>>
>> In your case, you just need the latest btrfs-progs, and to re-run
>> "btrfs check --readonly" on it.
> 
> Will try this, but have no time before tomorrow evening.
> 
> 
>> If it just shows the same result, meaning I can't get the info about
>> which tree block is corrupted, then you could try to mount it with
>> -o ro using the *LATEST* kernel.
> 
> I tried this before with the 4.15.0-46 kernel, it was impossible. Will
> try again with a newer one as soon as possible (in the best case
> tomorrow evening); I will post the results.
> 
>> The latest kernel will report anything wrong pretty vocally; in that
>> case, dmesg will include the bytenr of the corrupted tree block.
>>
>> Then I could craft the needed commands to further debug the fs.
> 
> Ok, I will try to post more info tomorrow about this time.
> 
> Nik.
> -- 
> 
>> Thanks,
>> Qu
>>
>>>
>>> Thank you for trying to improve btrfs!
>>>
>>> Nik.
>>>>
>>>> Thanks,
>>>> Qu
>>>
>>> You are not from the 007 lab, are you? ;-)
>>>
>>>>>
>>>>> Kind regards,
>>>>>
>>>>> Nik.
>>>>
>>


