On 2020/5/9 7:11 PM, Steven Davies wrote:
> For curiosity I'm trying to write a tool which will show me the size of
> data extents belonging to which files in a snapshot are exclusive to
> that snapshot as a way to show how much space would be freed if the
> snapshot were to be deleted,

Isn't that what btrfs qgroup is doing?

> and which files in the snapshot are taking
> up the most space.

That would be interesting, as qgroup only works at the subvolume level.

>
> I'm working with Hans van Kranenburg's python-btrfs python library but
> my knowledge of the filesystem structures isn't good enough to allow me
> to figure out which bits of data I need to be able to achieve this. I'd
> be grateful if anyone could help me along with this.

You may want to look into the on-disk format first.

But spoiler alert: qgroup has a performance impact (although hugely
reduced in recent releases), and that impact is unavoidable.
The same goes for any similar method.

In fact, in your particular case, you need more work than qgroup, thus
it would be slower than qgroup.
Considering how many extra ioctls and context switches are needed, I
won't be surprised if it's way slower than qgroup.

>
> So far my idea is:
>
> for each OS file in a subvolume:

This can be done with ftw(); just don't cross the subvolume boundary.

>   find its data extents

Fiemap.

>   for each extent:
>     find what files reference it #1

Use the btrfs tree search ioctl to search the extent tree, and do a
backref walk just like what we do in the qgroup code.

>     for each referencing file:
>       determine which subvolumes it lives in #2

Unlike the kernel, you also need to do this using the btrfs tree search
ioctl.

>     if all references are within this subvolume:
>       record the OS file path and extents it references
>
> for each recorded file path
>   find its data extents
>   output its path and the total number of bytes in all recorded extents
> (those which are not shared)
>
> #1 and #2 are where my understanding breaks down. How do I find which
> files reference an extent and which subvolume those files are in?

In short, you need the following skills (which would make you a btrfs
developer already):

- Basic btrfs tree search
  Things like how the btrfs btree works, and how to iterate it.

- Basic user space file system interface understanding
  Know tools like fiemap().

- Btrfs extent tree understanding
  Know how to interpret inline/keyed data/metadata indirect/direct
  backref items.
  This is the key and the most complex thing.
  IIRC I have added some comments about this in recent backref.c code.

- Btrfs subvolume tree understanding
  Know how btrfs organizes files/dirs in its subvolume trees.
  This is the key to locating which (subvolume, ino) owns a file extent.

There are some pitfalls, like the backref item to file extent mapping.
But it should be easier than the extent tree.

Thanks,
Qu

>
> Alternatively, if such a script already exists I would be happy to use it.
>
> Thanks for any pointers.
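
For illustration only, here is a minimal Python sketch of the last step of the plan quoted above: deciding which extents are exclusive to a snapshot once the backrefs are known. It assumes steps #1 and #2 have already produced a mapping from each data extent to the set of (subvolume, inode) pairs that reference it. It does not show how to obtain that mapping from the extent tree, and it does not use python-btrfs. The names `Extent`, `Ref`, `exclusive_usage`, and all ids and sizes in the sample data are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Extent(NamedTuple):
    bytenr: int  # logical start address of the data extent
    length: int  # size of the whole extent in bytes


class Ref(NamedTuple):
    subvol: int  # id of the subvolume tree holding the reference
    inode: int   # inode number inside that subvolume


def exclusive_usage(backrefs, snapshot_id):
    """Return (bytes per inode, total bytes) for extents whose references
    all live in `snapshot_id`.

    `backrefs` maps Extent -> set of Ref. Building it (steps #1 and #2)
    is assumed to have happened elsewhere.
    """
    per_inode = defaultdict(int)
    total = 0
    for extent, refs in backrefs.items():
        if not refs:
            continue
        # One reference from another subvolume makes the extent shared.
        if any(ref.subvol != snapshot_id for ref in refs):
            continue
        # Count the extent once towards the snapshot total ...
        total += extent.length
        # ... but attribute it to every inode in the snapshot that uses it.
        for inode in {ref.inode for ref in refs}:
            per_inode[inode] += extent.length
    return dict(per_inode), total


# Made-up sample data: snapshot 260, top-level subvolume 5.
backrefs = {
    Extent(1 << 20, 4096): {Ref(260, 257)},                  # exclusive
    Extent(2 << 20, 8192): {Ref(260, 257), Ref(5, 300)},     # shared with 5
    Extent(3 << 20, 16384): {Ref(260, 258), Ref(260, 259)},  # reflink in 260
}

per_inode, total = exclusive_usage(backrefs, snapshot_id=260)
for inode, size in sorted(per_inode.items(), key=lambda kv: (-kv[1], kv[0])):
    print(f"inode {inode}: {size} bytes exclusive")
print(f"snapshot {snapshot_id if False else 260}: {total} bytes exclusive")
```

The sketch counts an extent towards the snapshot total only once, even when several inodes inside the same snapshot reference it, so the per-inode figures can add up to more than the total. Mapping inode numbers back to paths is left out.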
