Re: defragmenting best practice?

On 2017-09-12 12:28, Ulli Horlacher wrote:
> On Thu 2017-08-31 (09:05), Ulli Horlacher wrote:
>> When I do a
>>   btrfs filesystem defragment -r /directory
>> does it really defragment all files in this directory tree, even if it
>> contains subvolumes?
>> The man page does not mention subvolumes on this topic.
>
> No answer so far :-(
I hadn't seen your original mail; otherwise I probably would have responded. Sorry about that.

Regarding the original question:
I'm pretty sure it does recursively operate on nested subvolumes. The documentation doesn't say otherwise, and not doing so would be unintuitive for people who don't know anything about subvolumes.
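If you want to double-check that on your own system, something along these lines should do it (/mnt here is just a placeholder for a btrfs mount point containing nested subvolumes):

   # List the subvolumes below the mount point to confirm nesting:
   btrfs subvolume list /mnt

   # Recursive defragment from the top; -v prints each file as it is
   # processed, so you can see whether files inside nested subvolumes
   # get touched:
   btrfs filesystem defragment -r -v /mnt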

> But I found another problem in the man page:
>
>    Defragmenting with Linux kernel versions < 3.9 or >= 3.14-rc2 as well as
>    with Linux stable kernel versions >= 3.10.31, >= 3.12.12 or >= 3.13.4
>    will break up the ref-links of COW data (for example files copied with
>    cp --reflink, snapshots or de-duplicated data). This may cause
>    considerable increase of space usage depending on the broken up
>    ref-links.
>
> I am running Ubuntu 16.04 with Linux kernel 4.10, and I have several
> snapshots.
> Should I therefore avoid calling "btrfs filesystem defragment -r"?
>
> What is the defragmenting best practice?
That really depends on what you're doing.

First, you need to understand that defrag won't break _all_ reflinks, just those of the particular files you point it at. So if you have subvolume A, and snapshots S1 and S2 of subvolume A, then running defrag on _just_ subvolume A will break the reflinks between A and the snapshots, but S1 and S2 will still share whatever data they originally shared with each other. If you then take a third snapshot of A, it will share data with A, but not with S1 or S2 (because A no longer shares data with S1 or S2).
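To make that concrete, the sequence looks roughly like this on the command line (the subvolume and snapshot paths are made up for illustration):

   # S1 and S2 share all of their data with A (and with each other):
   btrfs subvolume snapshot /mnt/A /mnt/S1
   btrfs subvolume snapshot /mnt/A /mnt/S2

   # Defragmenting A rewrites its extents, breaking the reflinks
   # between A and S1/S2; S1 and S2 still share with each other:
   btrfs filesystem defragment -r /mnt/A

   # A third snapshot shares data only with the defragmented A:
   btrfs subvolume snapshot /mnt/A /mnt/S3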

Given this behavior, there are three potential cases when it comes to persistent snapshots:

1. You care about minimizing space usage, but aren't as worried about performance. In this case, the only option is to not run defrag at all.

2. You care about performance, but not space usage. In this case, defragment everything.

3. You care about both space usage and performance. In this case, I would personally suggest defragmenting only the source subvolume (so only subvolume A in the above explanation), and doing so on a schedule that coincides with snapshot rotation (see the sketch after this list). The idea is to defragment just before you take a snapshot, at a frequency that gives a good balance between space usage and performance. As a general rule, if you take this route, start by defragmenting monthly if you're doing daily or weekly snapshots, or with every fourth snapshot otherwise, and then adjust the interval based on how it impacts your space usage.
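As a rough sketch of option 3, something like the following could be run from cron (the paths, the counter file, and the defrag-every-fourth-run logic are all hypothetical, just to show the shape of it):

   #!/bin/sh
   # Hypothetical snapshot rotation: defragment the source subvolume
   # on every fourth run, then take a read-only snapshot.
   SRC=/mnt/A
   SNAPDIR=/mnt/snapshots
   COUNTFILE=/var/lib/snap-count

   count=$(cat "$COUNTFILE" 2>/dev/null || echo 0)
   count=$(( (count + 1) % 4 ))
   echo "$count" > "$COUNTFILE"

   # Defragment just before snapshotting, so the new snapshot shares
   # the freshly rewritten extents with the source subvolume:
   [ "$count" -eq 0 ] && btrfs filesystem defragment -r "$SRC"

   btrfs subvolume snapshot -r "$SRC" "$SNAPDIR/A-$(date +%Y%m%d)"

The ordering matters here: defragmenting after the snapshot instead would break the reflinks to it, so you'd pay for two copies and the snapshot would still have the old fragmented layout.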

Additionally, you can compact free space without defragmenting data or breaking reflinks by running a full balance on the filesystem.
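For example (again with a placeholder mount point; be aware that a full balance rewrites every chunk and can take a long time on a large filesystem):

   # Full balance: rewrites all chunks, compacting free space:
   btrfs balance start /mnt

   # Cheaper variant: only rewrite data chunks that are less than
   # half full, which is often enough to compact free space:
   btrfs balance start -dusage=50 /mnt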

The tricky part, though, is that different workloads are impacted differently by fragmentation. To give four generic examples:

* Mostly sequential write focused workloads (like security recording systems) tend to be impacted by free space fragmentation more than data fragmentation. Balancing filesystems used for such workloads is likely to give a noticeable improvement, but defragmenting probably won't give much.

* Mostly sequential read focused workloads (like a streaming media server) tend to be the most impacted by data fragmentation, but aren't generally impacted by free space fragmentation. As a result, defrag will help here a lot, but balance won't as much.

* Mostly random write focused workloads (like most database systems or virtual machines) are often impacted by both free space and data fragmentation, and are a pathological case for CoW filesystems. Balance and defrag will help here, but they won't help for long.

* Mostly random read focused workloads (like most non-multimedia desktop usage) are not impacted much by either aspect, but if you're on a traditional hard drive they can be impacted significantly by how the data is spread across the disk. Balance can help here, but only because it improves data locality, not because it compacts free space.
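If you're not sure which category your workload falls into, measuring actual fragmentation is a reasonable first step. filefrag (from e2fsprogs) reports per-file extent counts; the path below is a placeholder. One caveat: on btrfs, compressed files show up as one extent per 128 KiB, so their counts look inflated:

   # A heavily fragmented file has a high extent count relative to
   # its size:
   filefrag /path/to/file

   # -v lists each extent with its physical position, which gives a
   # rough idea of how scattered the data is on disk:
   filefrag -v /path/to/file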