On 2018-01-10 11:30, Tom Worster wrote:
On 9 Jan 2018, at 22:49, Duncan wrote:
AFAIK, such corruption reports re balance aren't really balance, per se,
at all.
Instead, what I've seen in nearly all cases is a number of filesystem
maintenance commands involving heavy I/O colliding, that is, being run at
the same time
I hope there is consensus on this because it might be the key to
resolving the contradictions that appear to me in the following
propositions that all seem plausible/reasonable:
- Depletion of unallocated space (DoUS, apologies for coining the term
if there already is one) is a property of BTRFS even if the volume's
capacity is more than enough for the files on it.
Strictly speaking this particular statement is only true in that there
are still probably bugs in the allocator. The goal is for this to never
be a significant problem as long as you have a reasonable amount of free
space (reasonable being enough for at least a couple of chunks to be
allocated).
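As a rough illustration of the "enough for at least a couple of chunks" rule of thumb, here is a minimal Python sketch that pulls the "Device unallocated" figure out of the text printed by `btrfs filesystem usage -b` and compares it against a threshold. The 1 GiB chunk size, the two-chunk default, and the function names are assumptions made for illustration; real chunk sizes vary with filesystem size and profile, and the exact output layout may differ between btrfs-progs versions.

```python
import re

# Assumed nominal data chunk size (1 GiB); actual chunk sizes vary.
ASSUMED_CHUNK_BYTES = 1024 ** 3


def unallocated_bytes(usage_text):
    """Return the 'Device unallocated' value from raw-byte usage output.

    Expects text in the shape printed by `btrfs filesystem usage -b`,
    where sizes are plain byte counts. Returns None if the line is absent.
    """
    match = re.search(
        r"^\s*Device unallocated:\s+(\d+)\s*$", usage_text, re.MULTILINE
    )
    return int(match.group(1)) if match else None


def has_room_for_chunks(usage_text, chunks=2, chunk_bytes=ASSUMED_CHUNK_BYTES):
    """True if unallocated space covers at least `chunks` assumed-size chunks."""
    free = unallocated_bytes(usage_text)
    if free is None:
        raise ValueError("no 'Device unallocated' line found")
    return free >= chunks * chunk_bytes
```

The text would normally come from capturing the command's output on the mounted filesystem; a check like this could feed the kind of ad hoc monitoring discussed further down.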
Also, for future reference, the term we typically use is ENOSPC, as
that's the symbolic name for the error code you get when this happens
(or when your filesystem is just normally full). I actually kind of
like your name for it too, though: it conveys the exact condition being
discussed in a way that should be a bit easier for non-technical types
to understand.
- To a user that isn't a BTRFS expert, DoUS can be unexpected, its
advance can be surprisingly fast and it can become severe.
Absolutely correct, and actually true even for a number of BTRFS
'experts' (no, seriously, I know of a number of cases where this caught
'experts' (including myself) by surprise simply because they ran into a
corner case they had never dealt with or found a bug in the allocator).
- BTRFS does not recycle allocated but unused space to the unallocated
pool.
Kind of.
The regular BTRFS allocator will (usually) preferentially avoid using
blocks of free space smaller than a given size for new allocations.
Without the 'ssd' mount option set, or when using Linux kernel version
4.14 or newer, the minimum size is 64kB, so it's generally not too bad
unless you regularly are dealing with lots of small files that change
very frequently. With the 'ssd' mount option set on Linux kernels prior
to 4.14, the minimum size is 2MB, which tends to result in really poor
space utilization, though it's still mostly an issue with volumes
holding lots of small files that change frequently or see lots of small
changes to large files.
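To make the two cases above concrete, here is a small Python sketch of the rule exactly as I described it. The function name is made up for this example, and it assumes the kB/MB figures are binary units; it only restates the description above and is not taken from the kernel source.

```python
KIB = 1024
MIB = 1024 * KIB


def min_free_block_bytes(kernel_version, ssd_option):
    """Minimum free-space block size the allocator prefers, per the text above.

    kernel_version is a (major, minor) tuple, e.g. (4, 13).
    Assumes the kB/MB figures are binary units (an assumption of this sketch).
    """
    # Only the 'ssd' option on kernels older than 4.14 uses the 2MB minimum.
    if ssd_option and tuple(kernel_version) < (4, 14):
        return 2 * MIB
    return 64 * KIB
```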
However, this does not mean that that space will always be unused. If
space gets tight, BTRFS will use that previously allocated space to its
fullest, and it will reuse it in other circumstances too.
- Resolving severe DoUS involves either running `btrfs balance` or
recreating the filesystem from, e.g. backups.
In most cases yes, though it is sometimes possible to resolve simply by
dropping snapshots if you have a lot of them and then deleting some files.
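For the balance route, a full balance rewrites every chunk, which is often more than is needed. `btrfs balance start` also accepts usage filters (`-dusage=N` for data, `-musage=N` for metadata) that restrict it to chunks at or below N percent full. The filters are not something discussed above, so treat the following Python sketch as an illustration only: it just builds the argument list, and the function name, the mount point, and the threshold of 50 are hypothetical.

```python
def balance_command(mount_point, data_usage_pct=None, metadata_usage_pct=None):
    """Build an argv list for `btrfs balance start` with optional usage filters."""
    for pct in (data_usage_pct, metadata_usage_pct):
        if pct is not None and not 0 <= pct <= 100:
            raise ValueError("usage percentage must be between 0 and 100")
    cmd = ["btrfs", "balance", "start"]
    if data_usage_pct is not None:
        # Only touch data chunks that are at most this full.
        cmd.append(f"-dusage={data_usage_pct}")
    if metadata_usage_pct is not None:
        # Same filter, applied to metadata chunks.
        cmd.append(f"-musage={metadata_usage_pct}")
    cmd.append(mount_point)
    return cmd


# Hypothetical mount point and threshold, for illustration only.
example = balance_command("/mnt/data", data_usage_pct=50)
```

The resulting list can be passed to `subprocess.run` as root; nothing here runs the command itself.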
- People have reported that `btrfs balance` sometimes causes filesystem
corruption.
As I commented, I've not heard about this specifically, and I'm inclined
to agree with Duncan's assessment that it's probably from people running
multiple low-level maintenance operations concurrently (running two or
more balances at the same time is known to be able to cause this type of
corruption, and as a result there's locking in the kernel to prevent you
from running more than one balance at a time on a filesystem).
- Some experienced users say that, to resolve a problem with DoUS, they
would rather recreate the filesystem than run balance.
This is kind of independent of BTRFS. A lot of seasoned system
administrators are going to be more likely to just rebuild a broken
filesystem from scratch if possible than repair it simply because it's
more reliable and generally guaranteed to fix the issue. It largely
comes down to the mentality of the individual, and how confident they
are that they can fix a problem in a reasonable amount of time without
causing damage elsewhere.
- Some experienced users say you should stop all other use of the
filesystem while running balance.
I've never seen any evidence that this is actually needed, but it does
make the balance operation finish faster. Strictly speaking, it
shouldn't be needed at all (that's part of the point of having CoW
semantics in the filesystem: it makes it easier to handle maintenance
on-line).
- Some experts recommend running balance regularly, even once a day, to
prevent DoUS.
Without some satisfactory way to resolve the contradictions, I'm not
sure how to proceed. For example, I'm not willing to offload the
workload from each filesystem once a day for prophylactic balance. And
I'm not going to let balance run unattended if those more experienced
than me say it's known to corrupt filesystems. The best I can do is
monitor DoUS and respond ad hoc. Or I can use a different fs type.
It may be worth seriously looking at whether you actually _need_ BTRFS
for your use case. In general, unless you need at least one of its
features, and either can't get that feature with ZFS or just want to
avoid using ZFS, you are likely better off for the time being using
another filesystem.
In my case for example, I _really_ want to avoid dealing with ZFS on
Linux because of how it impacts what kernel versions I use and the fact
that I don't trust the proprietary NVIDIA drivers to get along with it,
and I need the checksumming and online transformation features
(reshaping, profile conversion, device replacement, etc.) of BTRFS. If
it weren't for all of that, I would not be using BTRFS at all.
But if Duncan is right (which, for me, is practically the same as
consensus on the proposition) that problems with corruption while
running balance are associated with heavy coincident IO activity, then I
can see a reasonable way forwards. I can even see how general
recommendations for BTRFS maintenance might develop.
As I commented above, I would tend to believe Duncan is right in this
case (both because it makes sense, and because he seems to generally be
right about this type of thing). That said, I really do think that
normal user I/O is probably not the issue, but low-level filesystem
operations are. Even so, there is no reason that BTRFS shouldn't either:
1. Handle this just fine without causing corruption.
or:
2. Extend the mutex used to prevent concurrent balances to cover other
operations that might cause issues (that is, make it so you can't scrub
a filesystem while it's being balanced, or defragment it, or whatever else).
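Until something like option 2 exists in the kernel, the same idea can be approximated in user space by wrapping every maintenance job (balance, scrub, defragment) in one shared advisory lock, so only one of them runs at a time. Below is a minimal Python sketch using `flock`. The function name and lock path are hypothetical, and this only protects against jobs started through the same wrapper; it is not a substitute for a kernel-side mutex.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def maintenance_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of one maintenance job.

    Raises BlockingIOError immediately if another job already holds the lock.
    """
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Non-blocking: fail fast instead of queueing behind another job.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

A cron job would then run its balance or scrub inside `with maintenance_lock(...)` and simply skip the run if the lock is already held.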