Re: Exploring referenced extents

On 2020/5/10 6:55 PM, Steven Davies wrote:
> On 2020-05-10 02:20, Qu Wenruo wrote:
>> On 2020/5/9 7:11 PM, Steven Davies wrote:
>>> For curiosity I'm trying to write a tool which will show me the size of
>>> data extents belonging to which files in a snapshot are exclusive to
>>> that snapshot as a way to show how much space would be freed if the
>>> snapshot were to be deleted,
>>
>> Isn't that what btrfs qgroup doing?
>>
>>> and which files in the snapshot are taking
>>> up the most space.
>>
>> That would be interesting as qgroup only works at subvolume level.
>>
>>>
>>> I'm working with Hans van Kranenburg's python-btrfs python library but
>>> my knowledge of the filesystem structures isn't good enough to allow me
>>> to figure out which bits of data I need to be able to achieve this. I'd
>>> be grateful if anyone could help me along with this.
>>
>> You may want to look into the on-disk format first.
>>
>> But spoiler alert, since qgroup has its performance impact (although
>> hugely reduced in recent releases), it's unavoidable.
>>
>> So would be any similar methods.
>> In fact, in your particular case, you need more work than qgroup, thus
>> it would be slower than qgroup.
>> Considering how many extra ioctl and context switches needed, I won't be
>> surprised if it's way slower than qgroup.
>>
>>>
>>> So far my idea is:
>>>
>>> for each OS file in a subvolume:
>>
>> This can be done by ftw(), and don't cross subvolume boundary.
>>
>>>   find its data extents
>>
>> Fiemap.
>>
>>>   for each extent:
>>>     find what files reference it #1
>>
>> Btrfs tree search ioctl, to search extent tree, and do backref walk just
>> like what we did in qgroup code.
>>
>>>     for each referencing file:
>>>       determine which subvolumes it lives in #2
>>
>> Unlike kernel, you also need to do this using btrfs tree search ioctl.
>>
>>>     if all references are within this subvolume:
>>>       record the OS file path and extents it references
>>>
>>> for each recorded file path
>>>   find its data extents
>>>   output its path and the total number of bytes in all recorded extents
>>> (those which are not shared)
>>>
>>> #1 and #2 are where my understanding breaks down. How do I find which
>>> files reference an extent and which subvolume those files are in?
>>
>> In short, you need the following skills (which would make you a btrfs
>> developer already):
>> - Basic btrfs tree search
>>   Things like how btrfs btree works, and how to iterate them.
>>
>> - Basic user space file system interface understanding
>>   Know tools like fiemap().
>>
>> - Btrfs extent tree understanding
>>   Know how to interpret inline/keyed data/metadata indirect/direct
>>   backref item.
>>   This is the key and the most complex thing.
>>   IIRC I have added some comments about this in recent backref.c code.
> 
> Yes, I'm now stuck with a btrfs_extent_inline_ref of type
> BTRFS_SHARED_DATA_REF_KEY which I understand is a direct backref to a
> metadata block[1],

Yep, SHARED_DATA_REF is the direct backref type for data: it points
straight at the parent metadata block.
But there is also an indirect type, EXTENT_DATA_REF, which only tells you
how to search for the referencing item, and in most cases EXTENT_DATA_REF
is the more common one.

> but I don't understand how to search for that block
> itself. I got lucky with the rest of the code and have found all
> EXTENT_ITEM_KEYs for a file. The python library makes looking through
> the EXTENT_DATA_REF_KEYs easy but not the shared data refs.

An EXTENT_DATA_REF contains the rootid, objectid (inode number), offset
(not the file offset, but a calculated one), and count.
That one is pretty simple to resolve, since it contains the rootid and
inode number.
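
To make the two data backref layouts concrete, here is a minimal sketch
in plain Python that decodes one inline backref from raw item bytes. The
type values and field layouts follow my reading of the on-disk structs
(btrfs_extent_inline_ref, btrfs_extent_data_ref, btrfs_shared_data_ref);
the function name decode_inline_ref and the dict keys are made up for
illustration and are not part of python-btrfs or any other library.

```python
import struct

# Backref key types as defined in the btrfs on-disk format.
TREE_BLOCK_REF_KEY = 176    # indirect metadata ref, offset = root id
EXTENT_DATA_REF_KEY = 178   # indirect data ref, carries root/inode/offset/count
SHARED_BLOCK_REF_KEY = 182  # direct metadata ref, offset = parent bytenr
SHARED_DATA_REF_KEY = 184   # direct data ref, offset = parent bytenr, then count


def decode_inline_ref(buf, pos=0):
    """Decode one inline backref at buf[pos:].

    Returns (fields, next_pos). Hypothetical helper, for illustration only.
    """
    ref_type = buf[pos]
    if ref_type == EXTENT_DATA_REF_KEY:
        # The 28-byte btrfs_extent_data_ref starts right after the type byte.
        root, objectid, offset, count = struct.unpack_from("<QQQI", buf, pos + 1)
        fields = {"type": "EXTENT_DATA_REF", "root": root,
                  "objectid": objectid, "offset": offset, "count": count}
        return fields, pos + 29
    if ref_type == SHARED_DATA_REF_KEY:
        # 8-byte parent bytenr followed by a 4-byte reference count.
        parent, count = struct.unpack_from("<QI", buf, pos + 1)
        return {"type": "SHARED_DATA_REF", "parent": parent, "count": count}, pos + 13
    if ref_type == SHARED_BLOCK_REF_KEY:
        (parent,) = struct.unpack_from("<Q", buf, pos + 1)
        return {"type": "SHARED_BLOCK_REF", "parent": parent}, pos + 9
    if ref_type == TREE_BLOCK_REF_KEY:
        (root,) = struct.unpack_from("<Q", buf, pos + 1)
        return {"type": "TREE_BLOCK_REF", "root": root}, pos + 9
    raise ValueError("unknown inline ref type %d" % ref_type)
```

Calling it in a loop, feeding next_pos back in as pos, would walk all the
inline refs that follow the extent item header. Where that header ends
depends on the extent flags, which this sketch does not handle.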

For SHARED_DATA_REF, you need to search for the parent bytenr in the
extent tree.
The parent's own backref can be SHARED_BLOCK_REF (direct metadata ref) or
TREE_BLOCK_REF (indirect metadata ref).

For TREE_BLOCK_REF, although it contains the owner, you can't stop there;
you still need to do a search to build the full path up to that root node,
then check each node on the path to see whether it is also shared by
other trees.

For SHARED_BLOCK_REF, you need to go to its parent again, until you
build the full path to the root node.
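
The upward walk described above can be sketched over a small in-memory
model. This is only an illustration under loud assumptions: backrefs and
parent_in_root are hypothetical dicts that a real tool would have to fill
in via the tree search ioctl (the first from the extent tree, the second
by searching the owning root for the block), and roots_reaching_block is
a made-up name, not the kernel's or python-btrfs's code.

```python
from collections import namedtuple

TREE_BLOCK_REF = 176    # indirect: value is the owning root id
SHARED_BLOCK_REF = 182  # direct: value is the parent block's bytenr

Ref = namedtuple("Ref", ["type", "value"])


def roots_reaching_block(bytenr, backrefs, parent_in_root):
    """Return the set of root ids whose trees can reach tree block bytenr.

    backrefs:       {block bytenr: [Ref, ...]} as read from the extent tree
    parent_in_root: {(root id, block bytenr): parent bytenr}; a missing
                    entry means the block is that root's root node
    Both mappings are hypothetical stand-ins for real tree searches.
    """
    roots = set()
    seen = set()
    todo = [bytenr]
    while todo:
        block = todo.pop()
        if block in seen:
            continue
        seen.add(block)
        for ref in backrefs[block]:
            if ref.type == SHARED_BLOCK_REF:
                # Direct ref: go straight to the parent block.
                todo.append(ref.value)
            elif ref.type == TREE_BLOCK_REF:
                # Indirect ref: the owner alone is not the answer, so
                # keep climbing until the root node is reached.
                parent = parent_in_root.get((ref.value, block))
                if parent is None:
                    roots.add(ref.value)
                else:
                    todo.append(parent)
    return roots


# Toy layout: leaf 1100 is owned by root 5 only, but its parent node 2000
# is referenced by both root 5 and snapshot root 256.
backrefs = {
    1100: [Ref(TREE_BLOCK_REF, 5)],
    2000: [Ref(TREE_BLOCK_REF, 5), Ref(TREE_BLOCK_REF, 256)],
    3000: [Ref(TREE_BLOCK_REF, 5)],     # root node of tree 5
    4000: [Ref(TREE_BLOCK_REF, 256)],   # root node of tree 256
}
parent_in_root = {(5, 1100): 2000, (5, 2000): 3000, (256, 2000): 4000}

print(sorted(roots_reaching_block(1100, backrefs, parent_in_root)))  # [5, 256]
```

In the toy layout the leaf's only backref names root 5, yet the walk also
finds snapshot root 256 through the shared parent node, which is the
reason for not stopping at the owner. A data extent would count as
exclusive to one subvolume only if every referencing leaf resolves to
that single root.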

Now you can see why the backref code used in balance and qgroup is complex.

Thanks,
Qu

> 
>> - Btrfs subvolume tree understanding
>>   Know how btrfs organize files/dirs in its subvolume trees.
>>   This is the key to locate which (subvolume, ino) owns a file extent.
>>   There are some pitfalls, like the backref item to file extent mapping.
>>   But should be easier than extent tree.
> 
> [1]
> https://btrfs.wiki.kernel.org/index.php/Data_Structures#btrfs_extent_inline_ref
> 
> 
> Thanks,


