Re: interest in post-mortem examination of a BTRFS system and improving the btrfs-code?


 





2019-04-02 15:24, Qu Wenruo:


On 2019/4/2 9:06 PM, Nik. wrote:

2019-04-02 02:24, Qu Wenruo:

On 2019/4/1 2:44 AM, btrfs@xxxxxxxxxxxxx wrote:
Dear all,


I am a big fan of btrfs, and I have been using it since 2013 - in the
meantime on at least four different computers. During this time, I
suffered at least four bad btrfs failures leading to an unmountable,
unreadable and unrecoverable file system. Since in three of the cases I
did not manage to recover even a single file, I am beginning to lose my
confidence in btrfs: in 35 years of working with different computers, no
other file system was so bad at recovering files!

Considering the importance of btrfs and keeping in mind the number of
similar failures, described in countless forums on the net, I have got
an idea: to donate my last two damaged filesystems for investigation
purposes and thus hopefully contribute to the improvement of btrfs. One
condition: any recovered personal data (mostly pictures and audio files)
should remain undisclosed and be deleted.

Should anybody be interested in this - feel free to contact me
personally (I am not reading the list regularly!), otherwise I am going
to reformat and reuse both systems in two weeks from today.

Some more info:

    - The smaller system is 83.6GB, I could either send you an image of
this system on an unneeded hard drive or put it into a dedicated
computer and give you root rights and ssh-access to it (the network link
is 100Mb down, 50Mb up, so it should be acceptable).

I'm a little more interested in this case, as it's easier to debug.

However there is one requirement before debugging.

*NO* btrfs check --repair/--init-* run at all.
btrfs check --repair is known to cause transid error.

Unfortunately, this file system was used as a testbed and even
"btrfs check --repair --check-data-csum --init-csum-tree
--init-extent-tree ..." was attempted on it.
So I assume you are not interested.

Then the fs can be further corrupted, so I'm not interested.


On the larger file system only "btrfs check --repair --readonly ..." was
attempted (without success; most command executions were documented, so
the results can be made available), no writing commands were issued.

--repair will cause writes, unless it failed even to open the filesystem.

If that's the case, it would be pretty interesting for me to poke
around the fs, and obviously, all read-only.


And, I'm afraid even with some debugging, the result would be pretty
predictable.

I do not need anything from the smaller file system and have (hopefully
fresh enough) backups from the bigger one.
It would be good enough if it helps to find any bugs that are still in
the code.

There is a 90% chance it will be a transid error.
And if it's a tree block from the future, then it's something barrier
related.
If it's a tree block from the past, then some tree block didn't
reach disk.
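
The two cases above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: the function name and
the "wanted"/"found" parameters are hypothetical, chosen to mirror the
generation a parent pointer expects versus the generation stored in the
tree block that was actually read. It is not btrfs code.

```python
def classify_transid_mismatch(wanted: int, found: int) -> str:
    """Classify a transid (generation) mismatch on a tree block.

    wanted: generation the parent pointer expects (hypothetical input)
    found:  generation recorded in the block read from disk
    """
    if found == wanted:
        return "ok"
    if found > wanted:
        # The block is newer than its parent expects.
        return "future: something barrier related"
    # The block is older than its parent expects.
    return "past: a tree block did not reach disk"


print(classify_transid_mismatch(wanted=100, found=105))
print(classify_transid_mismatch(wanted=100, found=97))
```

The mapping from each case to its likely cause is taken directly from
the explanation above; the numbers are made up.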

We have been chasing this spectre for a long time and had several
assumptions, but never pinned it down.

IMHO spectre would lead to much bigger losses - at least in my case it
could have happened all four times, but it did not.

But anyway, more info is always better.

I'd like to get the ssh access for this smaller image.

If you are still interested, please advise how to create the image of
the file system.

If the larger fs really didn't get any writes (btrfs check --repair
failed to open the fs, and thus printed "cannot open file system"),
I'm interested in that one.

This is an excerpt from the terminal log:
"# btrfs check --readonly /dev/md0
incorrect offsets 15003 146075
ERROR: cannot open file system
#"
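
The criterion discussed in this thread (the check aborted before it
could open the file system, so nothing was written) can be tested
against a saved terminal log. A minimal sketch, assuming the log is
available as a plain string; the function name is hypothetical, and the
sample text is the excerpt quoted above:

```python
def check_failed_before_open(output: str) -> bool:
    """Return True if the saved output shows that btrfs check
    could not open the file system at all."""
    return "cannot open file system" in output


sample_log = (
    "incorrect offsets 15003 146075\n"
    "ERROR: cannot open file system\n"
)
print(check_failed_before_open(sample_log))
```

This only inspects the text of a log; it cannot prove that no other
command wrote to the device.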

Btw., since the list allows _plain_text_only_, I wonder how you quote?

If not, then no.

I can imagine that it is preferable to use the
original, but in my case it is a (not mounted) partition of a bigger
hard drive, and the other partitions are in use. The "btrfs-image" seems
inappropriate to me, "dd" will probably screw things up?

Since the fs is too large, I don't think either way is good enough.

So in this case, the best way for me to poke around is to give me a
caged container with only read access to the larger fs.

I am afraid that this machine is too weak for using containers on it
(QNAP SS839Pro NAS, Intel Atom, 2GB RAM), and right now I do not have
another machine that could accommodate five hard drives. Let me consider
how to organize this or come up with another idea. One way could be
"async ssh" - a private ssl-chat on one of my servers, so that you can
write your commands there, I execute them on the machine as soon as I
can and put the output back into the chat window? Sounds silly, but it
could start immediately, and I have no better idea right now, sorry!

Thank you for trying to improve btrfs!

Nik.

Thanks,
Qu

You are not from the 007 - lab, are you? ;-)


Kind regards,

Nik.



