On 2017-09-12 12:28, Ulli Horlacher wrote:
> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>> btrfs filesystem defragment -r /directory
>> does it really defragment all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
> No answer so far :-(
I hadn't seen your original mail, otherwise I probably would have
responded. Sorry about that.
As for the original question:
I'm pretty sure that it does recursively operate on nested subvolumes.
The documentation doesn't say otherwise, and not doing so would be
non-intuitive to people who don't know anything about subvolumes.
> But I found another problem in the man-page:
> Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
> with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
> will break up the ref-links of COW data (for example files copied with
> cp --reflink, snapshots or de-duplicated data). This may cause
> considerable increase of space usage depending on the broken up
> ref-links.
> I am running Ubuntu 16.04 with Linux kernel 4.10 and I have several
> snapshots.
> Should I therefore avoid calling "btrfs filesystem defragment -r"?
> What is the defragmenting best practice?
That really depends on what you're doing.
First, you need to understand that defrag won't break _all_ reflinks,
just the particular instances you point it at. So, if you have
subvolume A, and snapshots S1 and S2 of that subvolume A, then running
defrag on _just_ subvolume A will break the reflinks between it and the
snapshots, but S1 and S2 will still share with each other any data they
originally shared. If you then take a third snapshot of A, it will share
data with A, but not with S1 or S2 (because A is no longer sharing data
with S1 or S2).
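If you want to see this for yourself, the sequence above can be sketched
roughly as follows. This is only an illustration: /mnt/pool and the names
A, S1, S2 and S3 are hypothetical, and the run helper is something I made
up so that the commands are printed rather than executed unless you set
DRY_RUN=0.

```shell
#!/bin/sh
# Sketch only: paths and subvolume names are hypothetical.
# With DRY_RUN=1 (the default here) commands are printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

reflink_demo() {
    pool="$1"
    # Two read-only snapshots of A; they share data with A and each other.
    run btrfs subvolume snapshot -r "$pool/A" "$pool/S1"
    run btrfs subvolume snapshot -r "$pool/A" "$pool/S2"
    # Defragment only A: rewritten data in A is no longer shared with S1/S2.
    run btrfs filesystem defragment -r "$pool/A"
    # A new snapshot shares data with the defragmented A, not with S1 or S2.
    run btrfs subvolume snapshot -r "$pool/A" "$pool/S3"
    # Compare exclusive and shared usage of each subvolume.
    run btrfs filesystem du -s "$pool/A" "$pool/S1" "$pool/S2" "$pool/S3"
}
```

Calling "reflink_demo /mnt/pool" prints the five commands in order. On a
real test filesystem, the summary from the last command should show how
much of each subvolume is still shared after the defrag.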
Given this behavior, there are three potential cases when dealing with
persistent snapshots:
1. You care about minimizing space used, but aren't as worried about
performance. In this case, the only option is to not run defrag at all.
2. You care about performance, but not space usage. In this case,
defragment everything.
3. You care about both space usage and performance. In this case, I
would personally suggest defragmenting only the source subvolume (so
only subvolume A in the above explanation), and doing so on a schedule
that coincides with snapshot rotation. The idea is to defrag just
before you take a snapshot, and at a frequency that gives a good balance
between space usage and performance. As a general rule, if you take
this route, start by doing the defrag on either a monthly basis if
you're doing daily or weekly snapshots, or with every fourth snapshot if
not, and then adjust the interval based on how that impacts your space
usage.
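A minimal sketch of the third case, assuming you drive snapshots from a
script that keeps a running snapshot counter. The function names, the
snap-N naming scheme and the default interval of 4 are my own
illustration, not something btrfs provides. As written it only prints the
commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch only: names, paths and the interval are assumptions.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Succeed when snapshot number $1 should be preceded by a defrag,
# i.e. on every $2-th snapshot (default: every fourth).
should_defrag() {
    count="$1"
    interval="${2:-4}"
    [ $((count % interval)) -eq 0 ]
}

# Defragment the source subvolume if due, then take a read-only snapshot.
snapshot_cycle() {
    src="$1"
    dest_dir="$2"
    count="$3"
    interval="${4:-4}"
    if should_defrag "$count" "$interval"; then
        run btrfs filesystem defragment -r "$src"
    fi
    run btrfs subvolume snapshot -r "$src" "$dest_dir/snap-$count"
}
```

For example, "snapshot_cycle /mnt/pool/A /mnt/pool/snaps 4" prints a defrag
followed by a snapshot, while the same call with a count of 5 prints only
the snapshot. Adjust the interval once you see how it affects space usage.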
Additionally, you can compact free space without defragmenting data or
breaking reflinks by running a full balance on the filesystem.
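Roughly, that looks like the following. The mount point is hypothetical
and the run helper is again just my own dry-run wrapper. A balance started
without filters is a full balance. Recent btrfs-progs print a warning and
pause briefly before starting one, and I believe --full-balance skips
that, but check the man page for your version.

```shell
#!/bin/sh
# Sketch only: the mount point passed in is hypothetical.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

compact_free_space() {
    mnt="$1"
    # Show allocation before the balance for later comparison.
    run btrfs filesystem usage "$mnt"
    # No filters given, so every chunk is rewritten (a full balance).
    run btrfs balance start "$mnt"
    # Show allocation again once the balance has finished.
    run btrfs filesystem usage "$mnt"
}
```

A full balance rewrites everything, so expect it to take a while on a
large filesystem.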
The tricky part, though, is that different workloads are affected
differently by fragmentation. Four generic examples:
* Mostly sequential write focused workloads (like security recording
systems) tend to be impacted by free space fragmentation more than data
fragmentation. Balancing filesystems used for such workloads is likely
to give a noticeable improvement, but defragmenting probably won't give
much.
* Mostly sequential read focused workloads (like a streaming media
server) tend to be the most impacted by data fragmentation, but aren't
generally impacted by free space fragmentation. As a result, defrag
will help a lot here, but balance won't help as much.
* Mostly random write focused workloads (like most database systems or
virtual machines) are often impacted by both free space and data
fragmentation, and are a pathological case for CoW filesystems. Balance
and defrag will help here, but they won't help for long.
* Mostly random read focused workloads (like most non-multimedia desktop
usage) are not impacted much by either aspect, but if you're on a
traditional hard drive they can be impacted significantly by how the
data is spread across the disk. Balance can help here, but only because
it improves data locality, not because it compacts free space.
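If it helps, the four examples above boil down to a simple lookup. The
workload keywords here are made up for the sketch, and the answers are
only the rules of thumb from this mail, not anything the tools report.

```shell
#!/bin/sh
# Sketch only: map a (made-up) workload keyword to the maintenance
# suggested for it in the examples above.
suggest_maintenance() {
    case "$1" in
        # Free space fragmentation dominates.
        seq-write)  echo "balance" ;;
        # Data fragmentation dominates.
        seq-read)   echo "defrag" ;;
        # Both help, but not for long.
        rand-write) echo "balance defrag" ;;
        # Mainly useful on traditional hard drives, for data locality.
        rand-read)  echo "balance" ;;
        *)          echo "unknown workload: $1" >&2; return 1 ;;
    esac
}
```

So "suggest_maintenance seq-read" prints "defrag", and an unrecognised
keyword returns a non-zero status.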