Re: Exploring referenced extents

On 2020-05-10 02:20, Qu Wenruo wrote:
On 2020/5/9 7:11 PM, Steven Davies wrote:
Out of curiosity, I'm trying to write a tool which will show me, for
each file in a snapshot, the size of the data extents that are exclusive
to that snapshot, as a way to show how much space would be freed if the
snapshot were to be deleted,

Isn't that what btrfs qgroup is doing?

and which files in the snapshot are taking
up the most space.

That would be interesting as qgroup only works at subvolume level.


I'm working with Hans van Kranenburg's python-btrfs library, but my
knowledge of the filesystem structures isn't good enough for me to
figure out which bits of data I need to achieve this. I'd be grateful
if anyone could help me along with this.

You may want to look into the on-disk format first.

But spoiler alert: qgroup has a performance impact (although hugely
reduced in recent releases), and that impact is unavoidable.

The same would apply to any similar method.
In fact, your particular case needs more work than qgroup does, so it
would be slower than qgroup.
Considering how many extra ioctls and context switches are needed, I
won't be surprised if it's way slower than qgroup.


So far my idea is:

for each OS file in a subvolume:

This can be done with ftw(); just don't cross subvolume boundaries.
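
A Python equivalent of the ftw() walk could look like the sketch below. It assumes that each btrfs subvolume reports its own st_dev, so a directory whose st_dev differs from the starting directory is treated as a boundary (this also stops at ordinary mount points). The function name files_in_subvolume is made up for illustration.

```python
import os
import stat


def files_in_subvolume(top):
    """Yield regular files under `top` without leaving its subvolume.

    Assumption: a nested subvolume (or any other mount) shows up with a
    different st_dev than `top`, so such directories are pruned.
    """
    top_dev = os.lstat(top).st_dev
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place so os.walk never descends across the boundary.
        dirnames[:] = [
            d for d in dirnames
            if os.lstat(os.path.join(dirpath, d)).st_dev == top_dev
        ]
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks, sockets, device nodes and so on.
            if stat.S_ISREG(os.lstat(path).st_mode):
                yield path
```

Calling list(files_in_subvolume("/mnt/snapshots/home.1")) would then return the candidate files; that path is only an example.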

  find its data extents

Fiemap.
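
A minimal sketch of calling FIEMAP from Python follows. The ioctl number, flag values and struct layouts are taken from linux/fiemap.h and linux/fs.h as I remember them, so check them against your headers. The helper names parse_fiemap and file_extents are made up here. On btrfs the "physical" field is, as far as I know, an address in the btrfs logical address space, which is what the extent tree is keyed on.

```python
import fcntl
import struct

FS_IOC_FIEMAP = 0xC020660B        # _IOWR('f', 11, struct fiemap)
FIEMAP_FLAG_SYNC = 0x1            # flush dirty data before mapping
FIEMAP_EXTENT_LAST = 0x1          # last extent of the file
FIEMAP_EXTENT_SHARED = 0x2000     # extent is referenced more than once
U64_MAX = (1 << 64) - 1

FIEMAP_HDR = struct.Struct("=QQIIII")     # struct fiemap, 32 bytes
FIEMAP_EXT = struct.Struct("=QQQQQIIII")  # struct fiemap_extent, 56 bytes


def parse_fiemap(buf):
    """Return [(logical, physical, length, flags), ...] from a filled buffer."""
    mapped = FIEMAP_HDR.unpack_from(buf, 0)[3]  # fm_mapped_extents
    extents = []
    for i in range(mapped):
        fields = FIEMAP_EXT.unpack_from(buf, FIEMAP_HDR.size + i * FIEMAP_EXT.size)
        extents.append((fields[0], fields[1], fields[2], fields[5]))
    return extents


def file_extents(path, batch=128):
    """Collect all extents of `path`, asking for `batch` extents per call."""
    results = []
    start = 0
    with open(path, "rb") as f:
        while True:
            buf = bytearray(FIEMAP_HDR.size + batch * FIEMAP_EXT.size)
            FIEMAP_HDR.pack_into(buf, 0, start, U64_MAX - start,
                                 FIEMAP_FLAG_SYNC, 0, batch, 0)
            fcntl.ioctl(f.fileno(), FS_IOC_FIEMAP, buf)
            extents = parse_fiemap(buf)
            if not extents:
                break
            results.extend(extents)
            logical, _, length, flags = extents[-1]
            if flags & FIEMAP_EXTENT_LAST:
                break
            start = logical + length  # continue after the last extent seen
    return results
```

An extent whose flags do not include FIEMAP_EXTENT_SHARED has a single reference, so the backref walk can be skipped for it. Inline and compressed extents need extra care that this sketch does not attempt.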

  for each extent:
    find what files reference it #1

Use the btrfs tree search ioctl to search the extent tree, and do a
backref walk just like we do in the qgroup code.
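
For reference, the raw tree search ioctl can be driven from Python roughly as sketched below (python-btrfs wraps the same ioctl). The ioctl number, key types and struct layouts follow linux/btrfs.h and linux/btrfs_tree.h as I remember them and should be verified. The function names are made up for illustration, and the call needs CAP_SYS_ADMIN.

```python
import fcntl
import struct

BTRFS_IOC_TREE_SEARCH = 0xD0009411   # _IOWR(0x94, 17, 4096-byte args)
BTRFS_EXTENT_TREE_OBJECTID = 2
BTRFS_EXTENT_ITEM_KEY = 168
BTRFS_SHARED_DATA_REF_KEY = 184
U64_MAX = (1 << 64) - 1
ARGS_SIZE = 4096

SEARCH_KEY = struct.Struct("=7Q4I4Q")   # struct btrfs_ioctl_search_key, 104 bytes
SEARCH_HEADER = struct.Struct("=3Q2I")  # struct btrfs_ioctl_search_header, 32 bytes


def pack_extent_search(bytenr, nr_items=4096):
    """Build search args for every extent tree item keyed on `bytenr`."""
    key = SEARCH_KEY.pack(
        BTRFS_EXTENT_TREE_OBJECTID,
        bytenr, bytenr,                                    # min/max objectid
        0, U64_MAX,                                        # min/max offset
        0, U64_MAX,                                        # min/max transid
        BTRFS_EXTENT_ITEM_KEY, BTRFS_SHARED_DATA_REF_KEY,  # min/max type
        nr_items, 0,                                       # nr_items, unused
        0, 0, 0, 0,                                        # unused1..unused4
    )
    return bytearray(key.ljust(ARGS_SIZE, b"\0"))


def parse_search_results(buf):
    """Yield (objectid, type, offset, data) for each item the kernel returned."""
    nr_items = SEARCH_KEY.unpack_from(buf, 0)[9]  # kernel stores the item count here
    pos = SEARCH_KEY.size
    for _ in range(nr_items):
        _transid, objectid, offset, item_type, length = SEARCH_HEADER.unpack_from(buf, pos)
        pos += SEARCH_HEADER.size
        yield objectid, item_type, offset, bytes(buf[pos:pos + length])
        pos += length


def search_extent_items(fd, bytenr):
    """Run the search on an open fd anywhere inside the filesystem."""
    buf = pack_extent_search(bytenr)
    fcntl.ioctl(fd, BTRFS_IOC_TREE_SEARCH, buf)  # the kernel fills buf in place
    return list(parse_search_results(buf))
```

A single call returns at most what fits in the 4096-byte buffer, so an extent with many keyed backrefs would need repeated calls that start from the last key seen. That loop is left out here.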

    for each referencing file:
      determine which subvolumes it lives in #2

Unlike the kernel, you also need to do this using the btrfs tree search
ioctl.

    if all references are within this subvolume:
      record the OS file path and extents it references

for each recorded file path
  find its data extents
  output its path and the total number of bytes in all recorded extents
(those which are not shared)
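
The bookkeeping in the outline above could be sketched as follows, assuming steps #1 and #2 have already been resolved into a set of (subvolume id, inode) owners per extent. The function name and both input dictionaries are hypothetical shapes chosen for this example, not anything python-btrfs provides.

```python
from collections import defaultdict


def exclusive_bytes_per_file(file_extents, extent_refs, subvol_id):
    """Sum, per path, the bytes of extents referenced only from `subvol_id`.

    file_extents: {path: [(extent_bytenr, extent_length), ...]}
    extent_refs:  {extent_bytenr: {(root_id, inode), ...}}
    """
    totals = defaultdict(int)
    for path, extents in file_extents.items():
        seen = set()
        for bytenr, length in extents:
            if bytenr in seen:
                continue  # same extent referenced twice by one file
            seen.add(bytenr)
            owners = extent_refs.get(bytenr, set())
            # Count the extent only if every reference is in this subvolume.
            if owners and all(root == subvol_id for root, _ino in owners):
                totals[path] += length
    return dict(totals)
```

As in the outline, an extent shared between two files of the same subvolume is attributed to both of them, so the per-file figures can add up to more than the space freed by deleting the snapshot.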

#1 and #2 are where my understanding breaks down. How do I find which
files reference an extent and which subvolume those files are in?

In short, you need the following skills (which would make you a btrfs
developer already):
- Basic btrfs tree search
  Things like how the btrfs B-trees work, and how to iterate over them.

- Basic user space file system interface understanding
  Know tools like fiemap().

- Btrfs extent tree understanding
  Know how to interpret inline/keyed, data/metadata, and indirect/direct
  backref items.
  This is the key and the most complex thing.
  IIRC I have added some comments about this in recent backref.c code.

Yes, I'm now stuck with a btrfs_extent_inline_ref of type
BTRFS_SHARED_DATA_REF_KEY, which I understand is a direct backref to a
metadata block [1], but I don't understand how to search for that block
itself. I got lucky with the rest of the code and have found all
EXTENT_ITEM_KEYs for a file. The python library makes looking through
the EXTENT_DATA_REF_KEYs easy, but not the shared data refs.

- Btrfs subvolume tree understanding
  Know how btrfs organizes files/dirs in its subvolume trees.
  This is the key to locating which (subvolume, ino) owns a file extent.
  There are some pitfalls, like the backref item to file extent mapping.
  But it should be easier than the extent tree.
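
To make the inline backref item from the list above more concrete, here is a sketch that decodes the inline refs of a data EXTENT_ITEM from raw item bytes. The layouts and key type values follow linux/btrfs_tree.h as I understand them and should be checked there. The function name is made up, and keyed (non-inline) backref items are not handled.

```python
import struct

BTRFS_EXTENT_DATA_REF_KEY = 178   # indirect ref: (root, inode, offset)
BTRFS_SHARED_DATA_REF_KEY = 184   # direct ref: bytenr of the referencing leaf
BTRFS_EXTENT_FLAG_DATA = 1

EXTENT_ITEM = struct.Struct("<QQQ")       # refs, generation, flags
EXTENT_DATA_REF = struct.Struct("<QQQI")  # root, objectid, offset, count
SHARED_DATA_REF = struct.Struct("<QI")    # parent bytenr, count


def decode_data_extent_item(data):
    """Return (total_refs, [inline backrefs]) for a data extent item."""
    refs, _generation, flags = EXTENT_ITEM.unpack_from(data, 0)
    if not flags & BTRFS_EXTENT_FLAG_DATA:
        raise ValueError("not a data extent item")
    pos = EXTENT_ITEM.size
    backrefs = []
    while pos < len(data):
        ref_type = data[pos]  # one type byte precedes each inline ref
        pos += 1
        if ref_type == BTRFS_EXTENT_DATA_REF_KEY:
            root, ino, offset, count = EXTENT_DATA_REF.unpack_from(data, pos)
            backrefs.append(("data_ref", root, ino, offset, count))
            pos += EXTENT_DATA_REF.size
        elif ref_type == BTRFS_SHARED_DATA_REF_KEY:
            parent, count = SHARED_DATA_REF.unpack_from(data, pos)
            backrefs.append(("shared_data_ref", parent, count))
            pos += SHARED_DATA_REF.size
        else:
            raise ValueError("unexpected inline ref type %d" % ref_type)
    return refs, backrefs
```

For a "data_ref" the owner (root, inode) is stated directly. For a "shared_data_ref" the parent value is, to my understanding, the bytenr of the leaf that holds the file extent item, so finding the owning subvolume means looking up that tree block's own item in the extent tree and following its tree block backrefs in turn.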

[1] https://btrfs.wiki.kernel.org/index.php/Data_Structures#btrfs_extent_inline_ref

Thanks,
--
Steven Davies


