On Mon, Jan 05, 2015 at 06:15:12PM +0100, Lennart Poettering wrote:
> Heya,
>
> I recently added some btrfs magic to systemd's machinectl/nspawn
> tool. More specifically it can now show the disk usage of a container
> that is stored in a btrfs subvolume. For that I made use of the btrfs
> quota logic. To read the current disk usage of a subvolume I took
> inspiration from btrfs-progs, most specifically the
> BTRFS_IOC_TREE_SEARCH ioctl(). Unfortunately, documentation for the
> ioctl seems to be lacking, but there are some things about it I
> fail to grok:
>
> What precisely are the semantics of the ioctl, regarding the search
> key min/max values (the fields of "struct btrfs_ioctl_search_key")? I
> kinda assumed that setting them would result in only objects to be
> returned that are within the min/max ranges. However, that appears not
> to be the case. At least the min_offset/max_offset setting appears to
> be ignored?

   This is an old argument. :)

   Keys have three parts, so it's plausible (but, in this case, wrong)
to consider the space you're searching to be a 3-dimensional space of
(object, type, offset), which seems to be what you're expecting. A
(min, max) pair would then define an oblong subset of the keyspace
from which to retrieve keys.

   However, that's not actually what's happening. Keys are indexed
within their tree(s) by a concatenation of the items in the key. A
key, therefore, should be thought of as a single 136-bit integer
(64 bits of object, 8 of type, 64 of offset), and the keys are
lexically ordered, (object||type||offset), where "||" is the
concatenation operator. You get every key _lexically ordered_ between
the min and max values. This is a superset of the 3-dimensional
results above.

   About 3-4 years ago, we see-sawed through several messy patches in
userspace (and at least one in the kernel) before this distinction and
difference in semantics was understood.
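   To make the difference concrete, here is a minimal sketch in C of
the two interpretations. Nothing in it is taken from the kernel or
btrfs-progs: "struct key", key_cmp(), in_linear_range() and in_box()
are hypothetical names made up for illustration, and the type values
used in the note below are arbitrary placeholders, not real btrfs key
types.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative stand-in for a btrfs key: (objectid, type, offset). */
struct key {
    uint64_t objectid;
    uint8_t  type;
    uint64_t offset;
};

/* Compare two keys as if each were the single integer
 * objectid||type||offset: the most significant field decides first. */
static int key_cmp(const struct key *a, const struct key *b)
{
    if (a->objectid != b->objectid)
        return a->objectid < b->objectid ? -1 : 1;
    if (a->type != b->type)
        return a->type < b->type ? -1 : 1;
    if (a->offset != b->offset)
        return a->offset < b->offset ? -1 : 1;
    return 0;
}

/* Concatenated semantics: min <= k <= max in the ordering above. */
static bool in_linear_range(const struct key *k,
                            const struct key *min, const struct key *max)
{
    return key_cmp(min, k) <= 0 && key_cmp(k, max) <= 0;
}

/* "3D box" semantics: every field is range-checked independently. */
static bool in_box(const struct key *k,
                   const struct key *min, const struct key *max)
{
    return k->objectid >= min->objectid && k->objectid <= max->objectid &&
           k->type     >= min->type     && k->type     <= max->type &&
           k->offset   >= min->offset   && k->offset   <= max->offset;
}
```

   With min = (0, 10, 5) and max = (0, 20, 5), the key (0, 15, 99) is
inside the linear range, because its type already places it strictly
between the two bounds and the offset is never consulted. It is
outside the box, because 99 is not in [5, 5]. Every key inside the box
is also inside the linear range, which is why the linear result is a
superset.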
> The code I hacked up is this one:
>
> http://cgit.freedesktop.org/systemd/systemd/tree/src/shared/btrfs-util.c#n427
>
> I try to read the BTRFS_QGROUP_STATUS_KEY and BTRFS_QGROUP_LIMIT_KEY
> objects for the subvolume I care about. Hence I initialize .min_type
> and .max_type to the two types (in the right order), and then
> .min_offset and .max_offset to subvolume id. However, the search ioctl
> will still give me entries back with offsets != the subvolume id...
>
> Is this intended behaviour of the search ioctl? If so, what's the
> rationale?

   Yes, it is. The rationale is that it's simply walking through the
key values in the tree linearly until the max value is found.

> My code currently invokes the search ioctl in a loop to work around
> the fact that .min_offset/.max_offset don't work as I wish they
> did... I wish I could get rid of this loop and filtering out of the
> entries I get back that aren't in the range I specified...

   You'd have to do this in kernel space if you wanted the 3D
semantics instead of the concatenated semantics. There's no free lunch
here. It might be a good idea for "libbtrfs" (such as it is) to
implement this, as it's a (moderately rare) repeat request.

   Hugo.

-- 
Hugo Mills             | Klytus, I'm bored. What plaything can you offer me
hugo@... carfax.org.uk | today?
http://carfax.org.uk/  |
PGP: 65E74AC0          |                      Ming the Merciless, Flash Gordon
