On Thu, 7 Nov 2019, Remi Gauvin wrote:
On 2019-11-07 9:03 a.m., Nate Eldredge wrote:
1. What causes this? I saw some references to "unused extents" but it
wasn't clear how that happens, or why they wouldn't be freed through
normal operation. Are there certain usage patterns that exacerbate it?
VirtualBox image files are subject to many, many small writes (just
booting Windows, for example, can create well over 5000 file fragments).
When the image file is new, the extents will be very large. In Btrfs,
extents are immutable. When a small write creates a new 4K COW
extent, the old 4K of data remains on disk as part of the old extent. This
situation persists until all the data in the old extent has been
rewritten. Once none of that data is referenced anymore, the extent
is freed.
Thanks, Remi. This is very helpful in understanding what is going on. In
particular, I didn't realize that extents are immutable even when there is
only one reference to them (I have no snapshots or reflinks to these
files).
I guess this also means that in the worst case, if I want to overwrite the
entire file "in place" in a random order, I actually need additional free
space equal to the file's size, until I get around to defragging. That's
rather counterintuitive for somebody used to traditional filesystems.
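
To make the arithmetic concrete, here is a minimal sketch of that
accounting in Python. It is a toy model, not Btrfs code: it assumes one
original extent, 4 KiB blocks, and that each overwritten block becomes
its own one-block COW extent while the old extent stays allocated until
every block in it has been replaced. The function name blocks_allocated
and the 128 MiB extent size are made up for illustration.

```python
BLOCK = 4096  # assumed block size in bytes (4 KiB)


def blocks_allocated(extent_blocks, overwritten):
    """Blocks on disk for one original extent after `overwritten` distinct
    blocks have each been rewritten once as a separate one-block COW extent.

    Toy model: the old extent is immutable, so it is either fully
    allocated or (once no block in it is referenced) fully freed.
    """
    if not 0 <= overwritten <= extent_blocks:
        raise ValueError("overwritten must be between 0 and extent_blocks")
    if overwritten == extent_blocks:
        # Nothing references the old extent any more, so it is freed and
        # only the new one-block extents remain.
        return extent_blocks
    # The old extent is still pinned by at least one live block.
    return extent_blocks + overwritten


# Hypothetical 128 MiB extent, overwritten one block at a time.
n = (128 << 20) // BLOCK
for k in (0, 1, n // 2, n - 1, n):
    mib = blocks_allocated(n, k) * BLOCK / (1 << 20)
    print(f"{k:6d} blocks overwritten -> {mib:8.2f} MiB allocated")
```

In this model the allocation climbs to just under twice the extent size
right before the last block is rewritten, then drops back to the
original size.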
5. Is there a better way to detect this kind of wastage, to distinguish
it from more mundane causes (deleted files still open, etc) and see how
much space could be recovered? In particular, is there a way to tell
which files are most affected, so that I can just defragment those?
Generally speaking, files that are subject to many random writes are
few, and you should be well aware of the larger ones where this might be
an issue (virtual image files, large databases, etc.). These files
should be defragmented frequently. I don't see any reason not to run
defrag over the whole subvolume, but if you want to search for files
with absurd fragments, you can always use the find command to search for
files, run the filefrag command on them, then use whatever tools you
like to search the output for files with thousands of fragments.
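
The find-plus-filefrag search described above could be sketched in
Python roughly as follows. This is only an illustration: it assumes the
filefrag tool from e2fsprogs is installed and that its summary line has
the form "PATH: N extents found". The function names and the default
thresholds (1000 extents, 64 MiB) are arbitrary choices made here, not
recommendations from this thread.

```python
import os
import re
import subprocess

# Summary line printed by filefrag, e.g. "/vm/win.vdi: 5231 extents found"
_SUMMARY = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")


def parse_filefrag(output):
    """Return a list of (path, extent_count) from filefrag's text output."""
    results = []
    for line in output.splitlines():
        match = _SUMMARY.match(line.strip())
        if match:
            results.append((match.group("path"), int(match.group("count"))))
    return results


def fragmented_files(root, min_extents=1000, min_size=64 * 1024 * 1024):
    """Walk `root` and return (extent_count, path) pairs, worst first.

    Only regular files of at least `min_size` bytes are checked, since
    small files cannot pin much space.
    """
    hits = []
    for dirpath, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(dirpath, name)
            try:
                if os.path.islink(path) or not os.path.isfile(path):
                    continue
                if os.path.getsize(path) < min_size:
                    continue
                proc = subprocess.run(
                    ["filefrag", path],
                    capture_output=True, text=True, check=False,
                )
            except OSError:
                continue  # unreadable file, or filefrag not available
            for reported_path, count in parse_filefrag(proc.stdout):
                if count >= min_extents:
                    hits.append((count, reported_path))
    return sorted(hits, reverse=True)
```

Calling fragmented_files on the top directory of a subvolume returns
the most fragmented large files first, which gives a short list of
candidates to defragment individually.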
Okay. Defragmenting is kind of inconvenient, though, and I suppose it
involves some extra wear on the SSD since data is really being moved.
There's also the issue, as I understand it, that defragmenting will break
up existing reflinks, which in some other situations I may really want to
keep.
In fact, it seems that somehow what I really want is for the file to be
*completely* fragmented, so that every write replaces an extent and frees
the old one. On an SSD I don't really care if the data blocks are
actually contiguous. It seems perverse, but even if there is more
overhead, it might be worth it when I don't have a lot of free space to
spare. I don't suppose there is any way to arrange that?
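
The intuition can at least be checked with a toy simulation. The sketch
below is not how Btrfs actually allocates space. It assumes a file laid
out in equal-sized original extents, one-block COW writes, and an old
extent that is freed only when every block in it has been rewritten. It
also ignores the brief moment where a new block and the block it
replaces coexist. The name simulate and all sizes are invented for the
example.

```python
import random


def simulate(file_blocks, extent_blocks, write_order):
    """Return (peak, final) allocated blocks for a toy COW file.

    The file starts as original extents of `extent_blocks` blocks each.
    Every block number in `write_order` is overwritten in turn.
    """
    n_extents = -(-file_blocks // extent_blocks)  # ceiling division
    # size[i]: on-disk size of original extent i
    # live[i]: blocks of extent i that the file still references
    size = [min(extent_blocks, file_blocks - i * extent_blocks)
            for i in range(n_extents)]
    live = list(size)
    rewritten = set()
    allocated = file_blocks
    peak = allocated
    for block in write_order:
        if block not in rewritten:
            rewritten.add(block)
            allocated += 1              # new one-block COW extent
            ext = block // extent_blocks
            live[ext] -= 1
            if live[ext] == 0:
                allocated -= size[ext]  # old extent fully unreferenced
        # Rewriting an already-rewritten block swaps one one-block
        # extent for another, so the total does not change.
        peak = max(peak, allocated)
    return peak, allocated


FILE_BLOCKS = 4096  # a 16 MiB file in 4 KiB blocks
order = list(range(FILE_BLOCKS))
random.Random(0).shuffle(order)  # overwrite every block, random order
for extent_blocks in (1, 16, 256, 4096):
    peak, final = simulate(FILE_BLOCKS, extent_blocks, order)
    print(f"extent={extent_blocks:5d} blocks  peak={peak}  final={final}")
```

In this model, one-block extents never need more than the file's own
size, while a single file-sized extent peaks at just under double.
Intermediate sizes fall in between, depending on the write order.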
Thanks again!
--
Nate Eldredge
nate@xxxxxxxxxxxxxxxxxxxx