On 2018-06-19 12:30, james harvey wrote:
On Tue, Jun 19, 2018 at 11:47 AM, Marc MERLIN <marc@xxxxxxxxxxx> wrote:
On Mon, Jun 18, 2018 at 06:00:55AM -0700, Marc MERLIN wrote:
So, I ran this:
gargamel:/mnt/btrfs_pool2# btrfs balance start -dusage=60 -v . &
[1] 24450
Dumping filters: flags 0x1, state 0x0, force is off
DATA (flags 0x2): balancing, usage=60
gargamel:/mnt/btrfs_pool2# while :; do btrfs balance status .; sleep 60; done
0 out of about 0 chunks balanced (0 considered), -nan% left
This (0/0/0, -nan%) seems alarming. I had this output once when the
system spontaneously rebooted during a balance. I didn't have any bad
effects afterward.
Balance on '.' is running
0 out of about 73 chunks balanced (2 considered), 100% left
Balance on '.' is running
After about 20 minutes, it changed to this:
1 out of about 73 chunks balanced (6724 considered), 99% left
This seems alarming. I wouldn't think # considered should ever exceed
# chunks. It does say "about", so maybe it can exceed it a little,
but I wouldn't expect it to exceed it by this much.
Actually, output like this is not unusual. In the above line, the 1 is
how many chunks have been actually processed, the 73 is how many the
command expects to process (that is, the count of chunks that fit the
filtering requirements, in this case, ones which are 60% or less full),
and the 6724 is how many it has checked against the filtering
requirements. So, if you've got a very large number of chunks, and are
selecting a small number with filters, then the considered value is
likely to be significantly higher than the first two.
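To make the three counters concrete, here is a rough Python sketch that
pulls them out of a status line like the one quoted above. The function
name and the regular expression are my own and are not part of
btrfs-progs; the line format is assumed purely from the output pasted in
this thread.

```python
import re

# Pattern assumed from the status lines quoted in this thread,
# not taken from the btrfs-progs source.
STATUS_RE = re.compile(
    r"(\d+) out of about (\d+) chunks balanced \((\d+) considered\)"
)

def parse_balance_status(line):
    """Return (balanced, expected, considered), or None if no match."""
    m = STATUS_RE.search(line)
    if m is None:
        return None
    return tuple(int(g) for g in m.groups())

line = "1 out of about 73 chunks balanced (6724 considered), 99% left"
balanced, expected, considered = parse_balance_status(line)
# balanced:   chunks actually relocated so far
# expected:   chunks estimated to match the filter
# considered: chunks checked against the filter so far
print(balanced, expected, considered)
```

Seen this way, a considered count far above the other two only means
that many chunks were checked and most did not match the filter.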
Balance on '.' is running
Now, 12H later, it's still there, only 1 out of 73.
gargamel:/mnt/btrfs_pool2# btrfs fi show .
Label: 'dshelf2' uuid: 0f1a0c9f-4e54-4fa7-8736-fd50818ff73d
Total devices 1 FS bytes used 12.72TiB
devid 1 size 14.55TiB used 13.81TiB path /dev/mapper/dshelf2
gargamel:/mnt/btrfs_pool2# btrfs fi df .
Data, single: total=13.57TiB, used=12.60TiB
System, DUP: total=32.00MiB, used=1.55MiB
Metadata, DUP: total=121.50GiB, used=116.53GiB
GlobalReserve, single: total=512.00MiB, used=848.00KiB
kernel: 4.16.8
Is that expected? Should I be ready to wait days possibly for this
balance to finish?
It's now been 2 days, and it's still stuck at 1%
1 out of about 73 chunks balanced (6724 considered), 99% left
First, my disclaimer. I'm not a btrfs developer, and although I've
run balance many times, I haven't really studied its output beyond the
% left. I don't know why it says "about", and I don't know if it
should ever be that far off.
In your situation, I would run "btrfs balance pause <path>", wait to
hear from a btrfs developer, and not use the volume whatsoever in the
meantime.
I would say this is probably good advice. I don't really know what's
going on here myself, though it looks like the balance got stuck: the
output hasn't changed for over 36 hours, which is extremely unusual
unless you've got an insanely slow storage array (it should only be
moving at most 3GB of data per chunk).
That said, I would question the value of repacking chunks that are
already more than half full. Anything above a 50% usage filter
generally takes a long time, and has limited value in most cases (higher
values are less likely to reduce the total number of allocated chunks).
With `-dusage=50` or less, you're guaranteed to reduce the number of
chunks if at least two match, and it isn't very time consuming for the
allocator, all because you can pack at least two matching chunks into
one 'new' chunk (new in quotes because it may re-pack them into existing
slack space on the FS). Additionally, `-dusage=50` is usually
sufficient to mitigate the typical ENOSPC issues that regular balancing
is supposed to help with.
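The packing argument above can be checked with some quick arithmetic.
This is only an idealized sketch in Python: it assumes all chunks are
the same size and that their contents can be packed perfectly, which the
real allocator does not promise. The function name is made up for
illustration.

```python
def chunks_after_repack(usages_pct, threshold_pct):
    """Return (matching, needed) for an idealized repack.

    usages_pct:    per-chunk usage as integer percentages
    threshold_pct: the value given to -dusage=
    Assumes equal-sized chunks and perfect packing (a simplification).
    """
    matching = [u for u in usages_pct if u <= threshold_pct]
    total = sum(matching)
    # Integer ceiling division: full chunks needed to hold the data.
    needed = -(-total // 100)
    return len(matching), needed

# Two chunks at the 50% limit always fit into one chunk.
print(chunks_after_repack([50, 50], 50))
# Two chunks at the 60% limit hold 120% of a chunk, so nothing is freed.
print(chunks_after_repack([60, 60], 60))
```

With a 50% filter, any two matching chunks add up to at most one full
chunk, so at least one chunk is freed. With a 60% filter that is no
longer guaranteed, which is why the higher values tend to cost more
time for less gain.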