Re: [PATCH] recursive defrag cleanup

On Tue, Jan 3, 2017 at 5:01 PM, Austin S. Hemmelgarn
<ahferroin7@xxxxxxxxx> wrote:
> I agree on this point.  I actually hadn't known that it didn't recurse into
> sub-volumes, and that's a pretty significant caveat that should be
> documented (and ideally fixed, defrag doesn't need to worry about
> cross-subvolume stuff because it breaks reflinks anyway).

I think it does descend into subvolumes to pick up the files (data)
inside them. What I was referring to is picking up the "child"
subvolumes (trees) themselves and defragmenting those as well, as if
you fed each subvolume to a non-recursive defrag one by one with the
current implementation (if I understand the current implementation
correctly*).
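
Roughly something like this is what I have in mind (just a sketch;
the pool path is made up, and I am assuming the filesystem's top
level is mounted at /mnt/pool so the paths printed by "btrfs
subvolume list" can simply be appended to it):

  btrfs subvolume list /mnt/pool | awk '{ print $NF }' | while read sub; do
      btrfs filesystem defragment "/mnt/pool/$sub"
  done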

To keep it simple: in my opinion, the recursive mode should not
ignore any entity which the defrag tool is able to meaningfully
operate on, no matter whether that is file data, directory metadata,
subvolume tree metadata, etc. If it can be fragmented and this tool
can defragment it, it should be handled during a recursive run with
no exceptions (unless you can and do set explicit exceptions). I
think only the subvolume and/or the directory (*) metadata is
currently ignored by the recursive mode (if anything).

* But you got me a little confused again. After reading the first
few emails in this thread I thought only files (data) and subvolumes
(tree metadata) could be defragmented by this tool and that it was a
no-op for regular directories. Yet you seem to imply it is possible
to defragment regular directories (the directory metadata), meaning
defrag can operate on three types of entities in total: file data,
subvolume tree metadata and regular directory metadata.
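
For reference, the three invocations I have in mind look something
like this (paths are hypothetical, all run without -r; the question
marks reflect my uncertainty about what exactly gets defragmented in
each case):

  btrfs filesystem defragment /mnt/pool/videos/recording.mkv   # file data
  btrfs filesystem defragment /mnt/pool/some-subvolume          # subvolume tree metadata(?)
  btrfs filesystem defragment /mnt/pool/videos                  # regular directory metadata(?)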

> For single directories, -t almost certainly has near zero effect since
> directories are entirely in metadata.  For single files, it should only have
> an effect if it's smaller than the size of the file (it probably is for your
> usage if you've got hour long video files).  As far as the behavior above
> 128MB, stuff like that is expected to a certain extent when you have highly
> fragmented free space (the FS has to hunt harder to find a large enough free
> area to place the extent).
>
> FWIW, unless you have insanely slow storage, 32MB is a reasonable target
> fragment size.  Fragmentation is mostly an issue with sequential reads, and
> usually by the time you're through processing that 32MB of data, your
> storage device will have the next 32MB ready.  The optimal value of course
> depends on many things, but 32-64MB is reasonable for most users who aren't
> streaming multiple files simultaneously off of a slow hard drive.

Yes, I know, and it's not a problem to use <=32MB. I just wondered
why >=128MB seems to be so incredibly slow for me.

Actually, I also wondered whether the defrag tool can "create" big
enough contiguous free space chunks by relocating other (probably
smallish) files, including non-fragmented ones, in order to make room
for huge fragmented files to be reassembled there as contiguous
files. I just didn't make the connection between these two questions.
I mean, defrag will obviously fail with huge target extent sizes and
huge fragmented files if the free space is fragmented, and why
wouldn't it be somewhat fragmented? Deleting fragmented files results
in fragmented free space, and new files become fragmented if the free
space is fragmented, so you end up deleting fragmented files once
again, and it goes on forever. That was my nightmare with ZFS: it
feels like the FS can only become more and more fragmented over time
unless you keep a lot of free space (say >50%) at all times, and even
then it remains somewhat random.
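
For completeness, these are the kinds of invocations I was comparing
(the path is hypothetical; -t is the target extent size discussed
above):

  btrfs filesystem defragment -t 32M /srv/video/recording.mkv    # no problem here
  btrfs filesystem defragment -t 128M /srv/video/recording.mkv   # incredibly slow for me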

Although this brings up complications: a really extensive defrag
would require some sort of smart planning, i.e. building a map of
objects (including free space and contiguous files), divining the
best possible target layout and trying to achieve that shape by heavy
reorganization of data and metadata.

> Really use case specific question, but have you tried putting each set of
> files (one for each stream) in its own sub-volume?  Your metadata
> performance is probably degrading from the sheer number of extents involved
> (assuming H.264 encoding and full HD video with DVD quality audio, you're
> probably looking at at least 1000 extents for each file, probably more), and
> splitting into a subvolume per-stream should segregate the metadata for each
> set of files, which should in turn help avoid stuff like lock contention
> (and may actually make both balance and defrag run faster).

Before I had a dedicated disk and filesystem for these files, I did
think about creating a subvolume for all these video recordings
rather than keeping them in a simple directory under a big multi-disk
filesystem's root/default subvolume, but I didn't really see the
point. (The decision to separate these files was forced by an
external scalability problem, namely the limited number of
connectors/slots for disks and the limited "working" RAID options in
Btrfs, rather than by an explicit desire for segregation, although in
light of these issues it might have come about on its own at some
point by now.) On the contrary, I would think segregation by
subvolumes could only complicate things further. If it does anything,
it only increases the total complexity: the total amount of metadata
will be roughly the same or more, but not less. You just add more
complexity to the basket (making it bigger in some sense) by
introducing subvolumes.

But if it could "serve the common good", I could certainly try it as
a test case.
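
If I do try it, the test case would look roughly like this
(hypothetical paths; one subvolume per recording stream, as you
suggested):

  btrfs subvolume create /srv/video/stream-1
  btrfs subvolume create /srv/video/stream-2
  # ...then point each recorder at its own subvolume and compare
  # fragmentation and defrag/balance times against the current
  # single-directory setup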

The file sizes tend to be anywhere between 200 and 2000 MB, and I
have observed some heavy fragmentation, like ~2k extents per ~2GB
file, i.e. roughly 1MB per extent on average. I guess it also depends
on the total write cache load: some database-like loads often result
in write-cache flushing frenzies, but at other times I allow up to
~1GB to be cached in memory before the disk has to write anything, so
the extent size could build up to >32MB, if the allocator is smart
enough and the free space fragments are big enough...
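
Extent counts like the ones above can be checked with filefrag, and
the ~1GB write-cache threshold I mentioned is roughly the standard
dirty-page knob (the path and value below are only illustrative):

  filefrag /srv/video/recording.mkv                          # reports the extent count
  sysctl -w vm.dirty_background_bytes=$((1024*1024*1024))    # start background writeback only past ~1GB dirty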

> You also have to factor
> in that directories tend to have more sticking power than file blocks in the
> VFS cache, since they're (usually) used more frequently, so once you've read
> the directory the first time, it's almost certainly going to be completely
> in cache.

I tried to tune that in the past (to favor metadata even more than
the default behavior) but I ended up with OOMs.
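
(What I mean by "tune that" is something along the lines of lowering
the usual VFS knob so dentry/inode caches are reclaimed less
aggressively, e.g.

  sysctl -w vm.vfs_cache_pressure=50

and going too far in that direction is presumably what led to the
OOMs.)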

> To put it in perspective, a directory with about 20-25 entries and all
> file/directory names less than 15 characters (roughly typical root
> directory, not counting the . and .. pseudo-entries) easily fits entirely in
> one metadata block on BTRFS with a 16k block size (the current default),
> with lots of room to spare.

I use a 4k nodesize. I am not sure why I picked that (probably in
order to try to minimize locking contention, which I thought I might
have a problem with, years ago).
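
For reference, the nodesize is fixed at mkfs time, so going back to
the 16k default would mean recreating the filesystem (the device name
below is hypothetical):

  mkfs.btrfs -n 16384 /dev/sdX
  btrfs inspect-internal dump-super /dev/sdX | grep -i nodesize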

> then you're talking small enough improvements that you won't notice unless
> you're constantly listing the directory and trashing the page cache at the
> same time.

Well, actually, I do. I already filed a request on ffmpeg's bug
tracker to ask for Direct-IO support because video recording with
ffmpeg constantly flushes my page cache (and it's not the only job of
this little home server).
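
I can watch it happen while a recording is running with something as
simple as:

  grep -E '^(Dirty|Writeback|Cached):' /proc/meminfo

(the recorder's write data keeps churning through the page cache and
pushes everything else out).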