On 2020/5/10 8:51 PM, Steven Davies wrote:
> On 2020-05-10 12:55, Qu Wenruo wrote:
>> On 2020/5/10 6:55 PM, Steven Davies wrote:
>>> On 2020-05-10 02:20, Qu Wenruo wrote:
>
>>> Yes, I'm now stuck with a btrfs_extent_inline_ref of type
>>> BTRFS_SHARED_DATA_REF_KEY which I understand is a direct backref to a
>>> metadata block[1],
>>
>> Yep, SHARED_DATA_REF is the type for direct (shows the direct parent)
>> for data.
>> But there is also an indirect (just tell you how to search) one,
>> EXTENT_DATA_REF, and under most case, EXTENT_DATA_REF is more common.
>>
>>> but I don't understand how to search for that block
>>> itself. I got lucky with the rest of the code and have found all
>>> EXTENT_ITEM_KEYs for a file. The python library makes looking through
>>> the EXTENT_DATA_REF_KEYs easy but not the shared data refs.
>>
>> For EXTENT_DATA_REF, it contains rootid, objectid (inode number), offset
>> (not file offset, but a calculated one), and count.
>> That's pretty simple, since it contains the rootid and inode number.
>>
>> For SHARED_DATA_REF, you need to search the parent bytenr in extent tree.
>> It can be SHARED_BLOCK_REF (direct meta ref) or TREE_BLOCK_REF (indirect
>> meta ref).
>>
>> For TREE_BLOCK_REF, although it contains the owner, you can't stop here,
>> but still do a search to build a full path towards that root node.
>> Then check each node to make sure if the node is also shared by other
>> trees.
>>
>> For SHARED_BLOCK_REF, you need to go to its parent again, until you
>> build the full path to the root node.
>>
>> Now you can see why the backref code used in balance and qgroup is
>> complex.
>
> I can, I get the feeling that this is now way beyond my abilities and I
> can see why it will be very slow to run in practice - especially through
> the Python abstraction. Perhaps if knorrie adds backref walking helpers
> to python-btrfs it might become more feasible.
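
To make the walk described in the quoted text concrete, here is a minimal
Python sketch over a toy in-memory model. Everything in it is an assumption
made for illustration: the `Backref` class, the `resolve_roots` function, the
two dictionaries and all bytenr and root id values are hypothetical, and it
does not use python-btrfs or read a real filesystem. The `parent_in_root`
dictionary stands in for the tree search that a real walker would have to do
to find a block's parent inside a given root.

```python
from dataclasses import dataclass

# Backref kinds, named after the btrfs key types discussed in this thread.
EXTENT_DATA_REF = "EXTENT_DATA_REF"    # indirect data ref, carries a root id
SHARED_DATA_REF = "SHARED_DATA_REF"    # direct data ref, carries parent bytenr
TREE_BLOCK_REF = "TREE_BLOCK_REF"      # indirect meta ref, carries owner root
SHARED_BLOCK_REF = "SHARED_BLOCK_REF"  # direct meta ref, carries parent bytenr


@dataclass(frozen=True)
class Backref:
    kind: str
    value: int  # root id for the indirect kinds, parent bytenr for SHARED_*


def resolve_roots(extent_tree, parent_in_root, bytenr, seen=None):
    """Return every root id that can reach the extent at `bytenr` (toy model)."""
    seen = set() if seen is None else seen
    if bytenr in seen:          # each block only needs to be visited once
        return set()
    seen.add(bytenr)
    roots = set()
    for ref in extent_tree.get(bytenr, []):
        if ref.kind == EXTENT_DATA_REF:
            roots.add(ref.value)
        elif ref.kind in (SHARED_DATA_REF, SHARED_BLOCK_REF):
            # Direct ref: look the parent block up and keep walking upwards.
            roots |= resolve_roots(extent_tree, parent_in_root, ref.value, seen)
        elif ref.kind == TREE_BLOCK_REF:
            # The owner is known, but the nodes above may be shared by other
            # trees, so keep following the path towards the root node.
            roots.add(ref.value)
            parent = parent_in_root.get((ref.value, bytenr))
            if parent is not None:
                roots |= resolve_roots(extent_tree, parent_in_root, parent, seen)
    return roots


# Toy extent tree: bytenr -> backrefs (all values are made up).
EXTENT_TREE = {
    1000: [Backref(SHARED_DATA_REF, 2000)],    # data extent
    2000: [Backref(SHARED_BLOCK_REF, 3000)],   # leaf
    3000: [Backref(TREE_BLOCK_REF, 256)],      # node owned by root 256
    4000: [Backref(TREE_BLOCK_REF, 256),       # node also shared with a
           Backref(SHARED_BLOCK_REF, 5100)],   # second tree via 5100
    5000: [Backref(TREE_BLOCK_REF, 256)],      # root node of tree 256
    5100: [Backref(TREE_BLOCK_REF, 257)],      # root node of tree 257
}
# Stand-in for "search root R to find the parent of this block".
PARENT_IN_ROOT = {(256, 3000): 4000, (256, 4000): 5000}

print(sorted(resolve_roots(EXTENT_TREE, PARENT_IN_ROOT, 1000)))
```

In this toy data, stopping at block 3000 would report only root 256. The
second root only shows up because the walk continues to block 4000, which is
the reason a TREE_BLOCK_REF cannot be treated as the end of the search.
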

Another problem here is that the btrfs tree search operation is always done
on the commit root, which is the on-disk data (not the running transaction).

Furthermore, we need to search the extent tree several times. In kernel
space, we can use a transaction handle to block the current transaction from
being committed (a commit switches the commit root with the current root).

In user space, we have no such ability to control transaction commits, which
means a transaction can easily be committed during our long search, giving
bad results.

It's already hard to do in kernel space, and it won't be any simpler in user
space.

Thanks,
Qu
