Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving

On Sun, May 28, 2017 at 6:56 AM, Duncan <1i5t5.duncan@xxxxxxx> wrote:
> [This mail was also posted to gmane.comp.file-systems.btrfs.]
>
> Ivan P posted on Sat, 27 May 2017 22:54:31 +0200 as excerpted:
>
>>>>>> Please add me to CC when replying, as I am not
>>>>>> subscribed to the mailing list.
>
>> Hmm, remounting as you suggested has shut it up immediately - hurray!
>>
>> I don't really have any special write pattern from what I can tell.
>> About the only thing different from all the other btrfs systems I've
>> set up is that the data is also on the same volume as the system.
>> Normal usage, no VMs or heavy file generation. I'm also only taking
>> snapshots of the system and @home, with the latter only containing
>> my .config, .cache and symlinks to some folders in @data.
>
> Systemd?  Journald with journals on btrfs?  Regularly snapshotting that
> subvolume?
>
> If yes to all of the above, that might be the issue.  Normally systemd
> will set the journal directory NOCOW, so the journal files inherit it
> at creation, in order to avoid heavy fragmentation due to the COW-
> unfriendly database-style file-internal-rewrite pattern with the
> journal files.
>
> Great.  Except that snapshotting locks the existing version of the file
> in place with the snapshot, so the next write to any block must be COW
> anyway.  This is sometimes referred to as COW1, since it's a
> single-time COW, and the effect isn't too bad with a one-time
> snapshot.  But if you're regularly snapshotting the journal files, that
> will trigger COW1 on every snapshot, which if you're snapshotting often
> enough can be almost as bad as regular COW in terms of fragmentation.
>
> The fix is to make the journal dir a subvolume instead, thereby
> excluding it from the snapshot taken on the parent subvolume, and just
> don't snapshot the journal subvolume then, so the NOCOW that systemd
> should already set on that subdir and its contents will actually be
> NOCOW, without interference from snapshotting repeatedly forcing COW1.
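A sketch of that fix in shell form. The path is journald's default persistent-journal location; the exact steps depend on your layout, so this is illustrative only, and it prints the commands rather than executing them by default:

```shell
# Convert the journal directory into its own subvolume, so snapshots
# of the parent subvolume no longer pin its extents. Dry run by
# default: set RUN= (empty) to actually execute, as root.
RUN=${RUN:-echo}

$RUN systemctl stop systemd-journald           # stop the writer first
$RUN mv /var/log/journal /var/log/journal.old  # keep old logs aside
$RUN btrfs subvolume create /var/log/journal
$RUN chattr +C /var/log/journal                # new files inherit NOCOW
$RUN cp -a /var/log/journal.old/. /var/log/journal/
$RUN systemctl start systemd-journald
```

Since the copied journal files are created fresh inside the NOCOW directory, they pick up the NOCOW attribute; just don't snapshot this subvolume afterwards.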
>
>
> Of course an alternative fix, the one I use here (and am happy with)
> instead, is to have a normal syslog (I use syslog-ng, but others have
> reported using rsyslog) handling your saved logs in traditional text
> form (most modern syslogs should cooperate with systemd's journald),
> and configure journald to only use tmpfs (see the journald.conf
> manpage). Traditional text logs are append-only and not nearly as bad
> in COW terms.  Meanwhile, journald is still active, just writing to
> tmpfs only, so you get a journal for the current boot session and thus
> can still take advantage of all the usual systemd/journald features
> such as systemctl status spitting out the last 10 log entries for that
> service, etc.  It's just limited to the current boot session, and you
> use the normal text logs for anything older than that.  For me anyway
> that's the best of both worlds, and I don't have to worry about how the
> journal files behave on btrfs at all, because they're not written to
> btrfs at all. =:^)
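For reference, that setup maps onto two settings in journald.conf (see the journald.conf(5) man page); a minimal sketch:

```ini
# /etc/systemd/journald.conf
[Journal]
Storage=volatile     # journal lives in /run (tmpfs) only
ForwardToSyslog=yes  # pass entries on to the classic syslog daemon
```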
>
>
> Meanwhile, since you mentioned snapshots, a word of caution there.  If
> you do have scripted snapshots being taken, be sure you have a script
> thinning down your snapshot history as well.  More than 200-300
> snapshots per subvolume scales very poorly in btrfs maintenance terms
> (and qgroups make the problem far worse, if you have them active at
> all).  If, for instance, you're taking snapshots every hour and you
> need something from one a month old, are you really going to remember
> or care which exact hour it was?  The daily snapshot just before or
> after that hour will almost certainly do, and it will be much easier
> to find if you've thinned to dailies by then, instead of sitting among
> hundreds and hundreds of accumulated hourlies.
>
> So snapshots are great but they don't come without cost, and if you
> keep under 200 and if possible under 100 per subvolume, you'll find
> maintenance such as balance and check (fsck) go much faster than they
> do with even 500, let alone thousands.
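A minimal thinning sketch along those lines. The snapshot directory and the assumption that names sort chronologically (e.g. date-stamped) are mine, not from the post; it only prints the deletions unless you set RUN= (empty):

```shell
#!/bin/sh
# Delete all but the newest $KEEP snapshots under $SNAPDIR.
# Assumes snapshot names sort chronologically.
SNAPDIR=${SNAPDIR:-/.snapshots}
KEEP=${KEEP:-100}
RUN=${RUN:-echo}   # dry run by default

ls -1d "$SNAPDIR"/*/ 2>/dev/null | sort | head -n -"$KEEP" |
while IFS= read -r snap; do
    $RUN btrfs subvolume delete "$snap"
done
```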
>
> --
> Duncan - List replies preferred.   No HTML msgs.
> "Every nonfree program has a lord, a master --
> and if you use the program, he is your master."  Richard Stallman

I haven't had any issues like this before on two other boxes that ran
for years with systemd and journald, so I'm rather surprised this is a
problem.

It does make sense that journald would fragment the disk, but isn't that
what autodefrag is for? The weird thing is that btrfs-cleaner never
seems to finish the work it is doing, which would mean the work piles
up constantly without ever getting done...
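For context, autodefrag is an opt-in mount option, not a default, so it only helps if it is actually enabled. A dry-run sketch of checking and enabling it (the mount point and fstab line are illustrative):

```shell
RUN=${RUN:-echo}   # dry run by default; set RUN= to execute as root

# Is autodefrag already active on any btrfs mount?
grep btrfs /proc/mounts | grep autodefrag || true

# Enable it on the running system...
$RUN mount -o remount,autodefrag /

# ...and persistently via /etc/fstab, e.g.:
#   UUID=<fs-uuid>  /  btrfs  defaults,autodefrag,subvol=@arch_current  0 0
```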

At the moment I am at 9 system snapshots and 5 @home snapshots, which
IMHO btrfs should be able to handle. The other boxes have about the same
number of snapshots and one of them is running 24/7 as a home server.

The snapshots are not automated; I take a snapshot of @arch_current and
@home using a script before updating my system, so the snapshot interval
is very irregular. I also try to clean up old snapshots, keeping only
the three newest system snapshots on disk, though I haven't done that
recently.
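A hypothetical sketch of such a pre-update script. The subvolume names come from the post, but the top-level mount point and snapshot directory are assumptions; it prints the commands unless RUN= is set empty:

```shell
#!/bin/sh
# Take read-only, date-stamped snapshots of @arch_current and @home
# before a system update. Run as root with RUN= (empty) to execute.
RUN=${RUN:-echo}
TOP=${TOP:-/mnt/btrfs-top}     # where the top-level volume is mounted
stamp=$(date +%Y-%m-%d_%H%M)

for sv in @arch_current @home; do
    $RUN btrfs subvolume snapshot -r "$TOP/$sv" \
         "$TOP/snapshots/${sv}_${stamp}"
done
```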

Oh, and I'm not using any qgroups, not that I know of, at least.

Regards,
Ivan.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



