Re: btrfs: obtain block checksums from user space

On 2015-09-24 14:48, Matwey V. Kornilov wrote:
2015-09-24 21:35 GMT+03:00 Austin S Hemmelgarn <ahferroin7@xxxxxxxxx>:
On 2015-09-24 14:06, Matwey V. Kornilov wrote:


Hello,

I would like to read the list of checksums for a specific file
stored on a btrfs filesystem. I think I could use the checksums the
way rsync does, but save both CPU (because the csums are already
calculated for the file) and I/O (because I don't need to reread the
whole file from the hard drive).

As of right now, there is no way to do this from userspace without just
directly parsing the on-disk format (which isn't safe or reliable if the
filesystem is mounted). It has been discussed before, but the discussions
haven't really gotten anywhere.

It's worth noting that the way btrfs does checksums isn't per-file, it's
per-block. This means that:
a. I think (I'm not 100% certain about this) that the checksum in btrfs
includes the padding up to the end of the block for blocks that aren't full.
b. Files that get stored in-line in their metadata block won't have a
checksum just for the file data (because the checksum will cover the whole
metadata block).
c. While it is possible with some checksum algorithms (if I remember right,
CRC32c is one such algorithm, and that is what btrfs uses for its
checksums) to combine the checksums from a group of data blocks to get the
checksum for the data as a whole, this in and of itself takes a significant
amount of CPU time for large amounts of data.
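
To make the per-block point concrete, here is a minimal userspace sketch of
per-block CRC32C. It is not btrfs code: the 4096-byte block size, the zero
padding of the last block (point a. above, which is itself uncertain), and
the helper names crc32c and block_checksums are all assumptions made for
illustration, and the values it produces are not guaranteed to match what
btrfs stores on disk.

```python
# Sketch only: per-block CRC32C computed in userspace.
# Assumptions (not confirmed by btrfs): 4096-byte blocks, short final
# block zero-padded to a full block before checksumming.

def _make_crc32c_table():
    """Build the lookup table for the reflected Castagnoli polynomial."""
    poly = 0x82F63B78
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ poly if crc & 1 else crc >> 1
        table.append(crc)
    return table


_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    """Standard CRC32C: initial value 0xFFFFFFFF, final XOR 0xFFFFFFFF."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def block_checksums(data: bytes, block_size: int = 4096) -> list:
    """Return one CRC32C per block, zero-padding a short final block."""
    sums = []
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size].ljust(block_size, b"\0")
        sums.append(crc32c(block))
    return sums
```

Because each checksum covers a padded block rather than the file's bytes,
the list describes blocks, not the file as a whole, which is why it cannot
be used directly as a whole-file checksum.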

All in all, this means that if you just want a checksum of the contents of
the file, it's almost certainly better to just do it in userspace.
If you're trying to figure out what changed, using send/receive and
snapshots is more efficient (usually).

I want the checksums of every block of the file to see which parts
have changed.
I cannot use send/receive because my other file replica is on a
remote host, not on the same filesystem. Compare with how rsync
works. It calculates checksums of the chunks of both versions of the
file and then syncs the differing chunks over the network. I just want to
utilize the fact that btrfs already has the data I need to calculate.
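
The comparison described here can be sketched roughly as follows, assuming
each side can produce a list of per-block checksums by some means. The
helper names block_checksums and changed_blocks are hypothetical, and
zlib.crc32 (plain CRC-32, not CRC32C) is only a stand-in for whatever
checksum is actually available. Note that this compares blocks at fixed
offsets; it is simpler than rsync's rolling-checksum algorithm and will
report every block after an insertion as changed.

```python
import zlib


def block_checksums(data: bytes, block_size: int = 4096) -> list:
    """Stand-in checksum list: one CRC-32 per fixed-size block."""
    return [
        zlib.crc32(data[offset:offset + block_size])
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(local_sums: list, remote_sums: list) -> list:
    """Return indices of blocks whose checksums differ.

    Blocks present on only one side (files of different length)
    are reported as changed as well.
    """
    changed = [
        index
        for index, (local, remote) in enumerate(zip(local_sums, remote_sums))
        if local != remote
    ]
    shorter, longer = sorted((len(local_sums), len(remote_sums)))
    changed.extend(range(shorter, longer))
    return changed
```

Only the blocks at the returned indices would then need to be sent over
the network; the rest are assumed identical, subject to checksum
collisions.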

On current versions of btrfs-progs, btrfs send has a mode (--no-data) that
will just spit out the metadata, which can then be parsed to figure out
what has changed. The parsing is of course non-trivial, but should still
be faster than checksumming everything, and I'm relatively sure (although
I may be wrong) that the send stream format is well documented.

