Re: Defragmenting to recover wasted space

On Thu, 7 Nov 2019, Remi Gauvin wrote:

On 2019-11-07 9:03 a.m., Nate Eldredge wrote:

1. What causes this?  I saw some references to "unused extents" but it
wasn't clear how that happens, or why they wouldn't be freed through
normal operation.  Are there certain usage patterns that exacerbate it?

VirtualBox image files are subject to many, many small writes (just
booting Windows, for example, can create well over 5000 file fragments).
When the image file is new, the extents will be very large.  In Btrfs,
extents are immutable.  When a small write creates a new 4K COW
extent, the old 4K block remains allocated as part of the old extent.
This situation persists until all the data in the old extent has been
rewritten; once none of that data is referenced anymore, the extent
is freed.

Thanks, Remi. This is very helpful in understanding what is going on. In particular, I didn't realize that extents are immutable even when there is only one reference to them (I have no snapshots or reflinks to these files).

I guess this also means that in the worst case, if I want to overwrite the entire file "in place" in a random order, I actually need additional free space equal to the file's size, until I get around to defragging. That's rather counterintuitive for somebody used to traditional filesystems.
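
To make the worst case above concrete, here is a toy model in Python. It is not Btrfs code, and `peak_allocation` and its parameters are names invented for this illustration. It assumes each 4K block is overwritten exactly once, each overwrite lands in its own new one-block extent, and an old extent is released only when its last block has been overwritten. Metadata and any merging of adjacent writes are ignored.

```python
def peak_allocation(num_extents, blocks_per_extent, order):
    """Return the peak number of allocated blocks while a file is rewritten.

    The file starts as `num_extents` immutable extents, each holding
    `blocks_per_extent` blocks. `order` lists the block indices in the
    order they are overwritten (each index exactly once).
    """
    # Blocks of each original extent that the file still references.
    remaining = [blocks_per_extent] * num_extents
    allocated = num_extents * blocks_per_extent
    peak = allocated
    for block in order:
        # Copy-on-write: the new data is allocated before the old is released.
        allocated += 1
        peak = max(peak, allocated)
        ext = block // blocks_per_extent
        remaining[ext] -= 1
        if remaining[ext] == 0:
            # The last reference is gone, so the whole old extent is freed.
            allocated -= blocks_per_extent
    return peak
```

In this model, a file stored as one 32-block extent peaks at 64 blocks (twice its size) no matter what order the blocks are rewritten in, while the same file stored as four 8-block extents and rewritten sequentially peaks at only 40, because each old extent is released as soon as the writes move past it.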

5. Is there a better way to detect this kind of wastage, to distinguish
it from more mundane causes (deleted files still open, etc) and see how
much space could be recovered? In particular, is there a way to tell
which files are most affected, so that I can just defragment those?

Generally speaking, files that are subject to many random writes are
few, and you should be well aware of the larger ones where this might be
an issue (virtual machine images, large databases, etc.).  These files
should be defragmented frequently.  I don't see any reason not to run
defrag over the whole subvolume, but if you want to search for files
with an absurd number of fragments, you can always use the find command
to locate files, run the filefrag command on them, then use whatever
tools you like to search the output for files with thousands of
fragments.
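
The find-plus-filefrag search described above could be sketched in Python roughly as follows. The function names and the default threshold of 1000 are made up for illustration, and the parser assumes filefrag's usual summary line of the form "PATH: N extents found". It needs Python 3.7 or later and a `filefrag` binary on the PATH.

```python
import os
import re
import subprocess

# Matches filefrag's summary line, e.g. "/vm/disk.vdi: 5234 extents found".
_EXTENTS_RE = re.compile(r"^(?P<path>.*): (?P<count>\d+) extents? found$")


def parse_filefrag_line(line):
    """Return (path, extent_count) for a summary line, else None."""
    match = _EXTENTS_RE.match(line.strip())
    if match is None:
        return None
    return match.group("path"), int(match.group("count"))


def fragmented_files(root, threshold=1000):
    """Walk `root` and return (extent_count, path) pairs, worst first."""
    results = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            proc = subprocess.run(
                ["filefrag", path], capture_output=True, text=True
            )
            if proc.returncode != 0:
                continue  # unreadable file or unsupported filesystem
            for line in proc.stdout.splitlines():
                parsed = parse_filefrag_line(line)
                if parsed is not None and parsed[1] >= threshold:
                    results.append((parsed[1], path))
    results.sort(reverse=True)
    return results
```

Calling `fragmented_files("/path/to/subvolume")` would then give a list of candidates to defragment individually. Note that `os.walk` does not stop at mount points, so point it at the subvolume of interest rather than at `/`.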

Okay. Defragmenting is kind of inconvenient, though, and I suppose it involves some extra wear on the SSD since data is really being moved. There's also the issue, as I understand it, that defragmenting will break up existing reflinks, which in some other situations I may really want to keep.

In fact, it seems that somehow what I really want is for the file to be *completely* fragmented, so that every write replaces an extent and frees the old one. On an SSD I don't really care if the data blocks are actually contiguous. It seems perverse, but even if there is more overhead, it might be worth it when I don't have a lot of free space to spare. I don't suppose there is any way to arrange that?

Thanks again!

--
Nate Eldredge
nate@xxxxxxxxxxxxxxxxxxxx
