On 2020-05-11 02:21, Hans van Kranenburg wrote:
Hi!
Thanks for your insights!
On 5/9/20 1:11 PM, Steven Davies wrote:
Out of curiosity I'm trying to write a tool which will show me which
files in a snapshot have data extents that are exclusive to that
snapshot, and how large those extents are, as a way to show how much
space would be freed if the snapshot were to be deleted, and which
files in the snapshot are taking up the most space.
<snip lots of useful information>
This is what I was missing when I read the documentation:
find what files reference it #1
for each referencing file:
    determine which subvolumes it lives in #2
For this, we delegate the work to the running linux kernel code, to ask
it who's using the extent at this disk_bytenr.
https://python-btrfs.readthedocs.io/en/stable/btrfs.html#btrfs.ioctl.logical_to_ino_v2
The main thing you're looking for is the ignore_offset option, which
will give you a list of *any* user of *any* data in that extent, instead
of only the first 4096 bytes in it which disk_bytenr itself is part of.
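
A minimal sketch of that lookup, assuming python-btrfs is installed and
the script runs as root against a mounted btrfs filesystem. The return
shape used here (a list of results carrying inum, offset and root, plus
a bytes_missing count) follows my reading of the linked documentation
and should be checked against it. The names ExtentUser, extent_users and
is_exclusive are hypothetical helpers, not part of the library.

```python
from collections import namedtuple

# Hypothetical container with the fields we care about per extent user.
ExtentUser = namedtuple("ExtentUser", ["inum", "offset", "root"])


def extent_users(fd, disk_bytenr):
    """Ask the kernel who references any part of the extent at disk_bytenr.

    fd is a file descriptor opened on the mounted filesystem,
    e.g. os.open("/", os.O_RDONLY).
    """
    import btrfs  # python-btrfs, imported lazily so the helper below works without it

    bufsize = 4096
    inodes, bytes_missing = btrfs.ioctl.logical_to_ino_v2(
        fd, disk_bytenr, bufsize=bufsize, ignore_offset=True
    )
    if bytes_missing > 0:
        # Assumption: a non-zero bytes_missing means the result buffer was
        # too small, so retry once with a buffer that is large enough.
        inodes, _ = btrfs.ioctl.logical_to_ino_v2(
            fd, disk_bytenr, bufsize=bufsize + bytes_missing, ignore_offset=True
        )
    return [ExtentUser(i.inum, i.offset, i.root) for i in inodes]


def is_exclusive(users, subvol_id):
    """True if every user of the extent lives in the given subvolume."""
    return bool(users) and all(u.root == subvol_id for u in users)
```

With this, an extent counts towards "space freed by deleting the
snapshot" only when is_exclusive(extent_users(fd, disk_bytenr), snap_id)
holds, where snap_id is the subvolume id of the snapshot.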
I did rework the script - albeit not the way you suggested (I still walk
the file tree and look up the extents) because my subvolumes are small
and stored on relatively fast SSDs, and this way allows me to narrow the
search to a single directory - but it seems to work now. It isn't pretty
yet either! It has succeeded in telling me that the oldest snapshot of
my / subvolume is huge because it contains a dump of linux-firmware
that's not shared by anything.
Next job - make it into a tree-like utility.
https://github.com/daviessm/btrfs-snapshots-diff/blob/4003a3fdec70c2a0de348e75a6576f9342754f54/btrfs-subvol-size.py
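
For the tree-like view, one possible approach is to roll the per-file
exclusive sizes up into every parent directory. This is only a sketch
and is not taken from the script linked above; the function name rollup
and the input format (a dict of absolute path to exclusive bytes) are
assumptions made for illustration.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def rollup(file_sizes):
    """Sum per-file exclusive bytes into each ancestor directory.

    file_sizes maps an absolute file path to its exclusive size in bytes.
    Returns a dict mapping each directory path to the total below it.
    """
    totals = defaultdict(int)
    for path, size in file_sizes.items():
        # .parents yields every ancestor directory, up to and including "/".
        for parent in PurePosixPath(path).parents:
            totals[str(parent)] += size
    return dict(totals)
```

Sorting the resulting directories by total would then show at a glance
which part of the snapshot holds the most exclusive data.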
--
Steven Davies