On 09/05/2020 12:11, Steven Davies wrote:
> For curiosity I'm trying to write a tool which will show me the size of
> data extents belonging to which files in a snapshot are exclusive to
> that snapshot as a way to show how much space would be freed if the
> snapshot were to be deleted, and which files in the snapshot are taking
> up the most space.

I have some scripts to do that. They are slow but seem to pretty much
work. See https://github.com/GrahamCobb/extents-lists

> I'm working with Hans van Kranenburg's python-btrfs python library but
> my knowledge of the filesystem structures isn't good enough to allow me
> to figure out which bits of data I need to be able to achieve this. I'd
> be grateful if anyone could help me along with this.

Rewriting them to use Hans' library is one of the things I plan to do
one day!

> So far my idea is:
>
> for each OS file in a subvolume:
>   find its data extents
>   for each extent:
>     find what files reference it #1
>     for each referencing file:
>       determine which subvolumes it lives in #2
>     if all references are within this subvolume:
>       record the OS file path and extents it references
>
> for each recorded file path
>   find its data extents
>   output its path and the total number of bytes in all recorded extents
>   (those which are not shared)

My approach is different. I don't attempt to understand which files
share extents, or which subvolumes they are in. Instead, I just try to
analyse which extents are in use by a subvolume (or, actually, any set
of files you specify). This is easy (but slow) to do. And makes
answering some questions easy.

However, it makes answering questions like "how many extents would
really be freed if I deleted this subvolume" hard (the scripts end up
working out the complete list of extents in use on the filesystem, and,
separately, the list of which would be in use if the subvolume was
removed - the difference is the space freed up by deleting the
subvolume). This often takes a day or two.

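
To make the "difference of two extent lists" idea concrete, here is a
minimal sketch in Python. It is not taken from the scripts: the function
name, the in-memory dict and the (physical_start, length) tuples are all
made up for illustration, and it assumes the per-file extent lists have
already been gathered by some other means.

```python
# Hypothetical sketch: extents are (physical_start, length) tuples and
# extents_by_file maps a file path to the set of extents it references.
# Gathering that mapping from a real filesystem is outside this sketch.

def freed_bytes(extents_by_file, files_to_remove):
    """Bytes that would be freed if files_to_remove were deleted."""
    in_use_now = set()
    in_use_after = set()
    for path, extents in extents_by_file.items():
        in_use_now |= extents
        if path not in files_to_remove:
            in_use_after |= extents
    # Extents referenced now but by nothing that remains afterwards.
    freed = in_use_now - in_use_after
    return sum(length for _start, length in freed)


# Made-up example: snap1/a shares one extent with snap2/a.
example = {
    "snap1/a": {(0, 4096), (8192, 4096)},
    "snap2/a": {(0, 4096)},
    "snap1/b": {(16384, 8192)},
}
print(freed_bytes(example, {"snap1/a", "snap1/b"}))
```

The shared extent at offset 0 is still referenced by snap2/a, so only
the other two extents count as freed. The expensive part in practice is
building the mapping for every file on the filesystem, not the set
arithmetic.
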
I would be interested if you find a more efficient approach to working
out how much space will be freed up if a set of files (such as
particular subvolumes) are removed, allowing for snapshots, reflink
copies and dedup.

Graham
