Re: btrfs filled up dm-thin and df shows 8.4TB of data used when I'm only using 10% of that.

On 2/21/20 6:07 PM, Marc MERLIN wrote:
Ok, first I'll update the subject line

On Fri, Feb 21, 2020 at 10:45:45AM +0500, Roman Mamedov wrote:
On Thu, 20 Feb 2020 21:38:04 -0800
Marc MERLIN <marc@xxxxxxxxxxx> wrote:

I had a closer look, and even with 5.4.20, my whole lv is full now:
   LV Name                thinpool2
   Allocated pool data    99.99%
   Allocated metadata     59.88%

Oversubscribing thin storage should be done carefully and only with a very
good reason, and when you run out of something you didn't have in the first
place, it seems hard to blame Btrfs or anyone else for it.

Let's rewind.
It's a backup server. I used to have everything in a single 14TB
filesystem, I had too many snapshots, and was told to break it up into
smaller filesystems to work around btrfs' inability to scale properly
past a hundred snapshots or so (that many snapshots also blew up both
kinds of btrfs check --repair; one of them forced me to buy 16GB of RAM
to max out my server, it still ran out of RAM anyway, and now I can't
add any more).

I'm obviously not going back to the olden days of making actual partitions
and guessing wrong every time how big each partition should be, so my
only solution left was to use dm-thin and oversubscribe the entire space
to all LVs.
I then have a cronjob that warns me if I start running low on space in
the global VG pool.
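
A minimal sketch of that kind of check, for reference; the pool name is
taken from the lvs output above, while the 90% threshold and the mail
recipient are assumptions:

#!/bin/sh
# Hypothetical cron check: warn when the thin pool's data or metadata
# allocation crosses a threshold.
POOL=vgds2/thinpool2
THRESHOLD=90
data=$(lvs --noheadings -o data_percent "$POOL" | tr -d ' ' | cut -d. -f1)
meta=$(lvs --noheadings -o metadata_percent "$POOL" | tr -d ' ' | cut -d. -f1)
if [ "$data" -ge "$THRESHOLD" ] || [ "$meta" -ge "$THRESHOLD" ]; then
    echo "thin pool $POOL: data=${data}% metadata=${meta}%" \
        | mail -s "thin pool $POOL running low" root
fi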

Now, where it got confusing is that around the time I installed 5.4 with
the df problem, df filled up to 100% and I started getting the warning
mails. I ignored them because I knew about the bug.
However, I just found out that my LV really had filled up, due to a
separate issue that was actually my own fault.

Now, I triggered some real bugs in btrfs, see:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
         Total devices 1 FS bytes used 445.73GiB
         devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu

Ok, I'm using 445GB, but losing 8.4TB, sigh.
   LV Path                /dev/vgds2/ubuntu
   LV Name                ubuntu
   LV Pool name           thinpool2
   LV Size                14.00 TiB
   Mapped size            60.25%  <= this used up all the free space in my VG, so the pool is now full
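
As an aside, the gap between 'FS bytes used' (445.73GiB) and the 8.44TiB
on the devid line is the difference between bytes actually referenced and
space allocated to chunks; the per-type breakdown is visible with (path as
mounted above):

btrfs filesystem df /mnt/btrfs_pool2/backup/ubuntu
btrfs filesystem usage /mnt/btrfs_pool2/backup/ubuntu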

We talked about fstrim, let's try that:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .
.: 5.6 TiB (6116423237632 bytes) trimmed

Oh, great. Except this freed up nothing in LVM.
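
When fstrim reports trimming a lot but the pool doesn't shrink, one thing
worth checking is whether the discards actually propagate down the stack;
a rough check, using the device and pool names from above:

# Does the thin LV advertise discard support? (DISC-GRAN/DISC-MAX nonzero)
lsblk --discard /dev/mapper/vgds2-ubuntu
# Is the thin pool set to honor discards? ("ignore" would explain this)
lvs -o lv_name,discards vgds2/thinpool2
# Compare pool usage before and after the trim
lvs -o data_percent vgds2/thinpool2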

gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .
Dumping filters: flags 0x6, state 0x0, force is off
   METADATA (flags 0x2): balancing, usage=0
   SYSTEM (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': Read-only file system

Ok, right, I need to unmount/remount to clear the read-only state:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .
Dumping filters: flags 0x6, state 0x0, force is off
   METADATA (flags 0x2): balancing, usage=0
   SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -dusage=0 -v .
Dumping filters: flags 0x1, state 0x0, force is off
   DATA (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
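
For what it's worth, when usage=0 relocates nothing, the usual next step
is to raise the usage filter so partially-filled chunks get compacted and
the freed chunks can be released; a hypothetical follow-up, with arbitrary
percentages:

for pct in 10 25 50; do
    btrfs balance start -dusage=$pct .
done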
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
         Total devices 1 FS bytes used 8.42TiB
         devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu


Well, crap, see how 'FS bytes used' went from 445.73GiB to 8.42TiB after balance?

Wtf? Can you do btrfs filesystem usage on that fs? I'd like to see the breakdown. I'm super confused about what's happening there.


I ran du to make sure my data is indeed only using 445GB.

So now I'm pretty much hosed; the filesystem seems to have been damaged in interesting ways.

I'll wait until tomorrow in case someone wants something from it, and I'll delete the entire
LV and start over.
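
For completeness, recreating the thin LV in the same pool would look
roughly like this; the names and the 14T virtual size come from the lvs
output above, the rest is an assumption:

lvremove vgds2/ubuntu
lvcreate --thin --virtualsize 14T --name ubuntu vgds2/thinpool2
mkfs.btrfs -L ubuntu /dev/vgds2/ubuntu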

And now for extra points, this also damaged a second one of my filesystems on the same VG :(
[64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
[64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001


This will happen if the transaction aborts; does it still happen after you unmount and remount? Thanks,

Josef


