btrfs filled up dm-thin, and df shows 8.4TB of data used when I'm only using 10% of that.

Ok, first I'll update the subject line.

On Fri, Feb 21, 2020 at 10:45:45AM +0500, Roman Mamedov wrote:
> On Thu, 20 Feb 2020 21:38:04 -0800
> Marc MERLIN <marc@xxxxxxxxxxx> wrote:
> 
> > I had a closer look, and even with 5.4.20, my whole lv is full now:
> >   LV Name                thinpool2
> >   Allocated pool data    99.99%
> >   Allocated metadata     59.88%
> 
> Oversubscribing thin storage should be done carefully and only with a very
> good reason, and when you run out of something you didn't have in the first
> place, seems hard to blame Btrfs or anyone else for it.

Let's rewind.
This is a backup server. I used to have everything in a single 14TB
filesystem with too many snapshots, and I was told to break it up into
smaller filesystems to work around btrfs' inability to scale properly
past a hundred snapshots or so (that many snapshots also blew up both
kinds of btrfs check --repair; one of them forced me to buy 16GB of
RAM to max out my server, and it still ran out of RAM, so now I can't
add any more).

I'm obviously not going back to the olden days of making actual
partitions and guessing wrong every time about how big each partition
should be, so my only remaining option was to use dm-thin and
subscribe the entire space to all LVs.
I then have a cronjob that warns me if I start running low in the
global VG pool; a simplified sketch of that check is below.
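
Something along these lines, assuming the pool is vgds2/thinpool2
(names and threshold illustrative, not my exact script):

#!/bin/sh
# Mail a warning when the thin pool's data allocation crosses 90%.
# data_percent is a standard lvs(8) reporting field for thin pools.
THRESHOLD=90
PCT=$(lvs --noheadings -o data_percent vgds2/thinpool2 | tr -d ' ' | cut -d. -f1)
if [ "$PCT" -ge "$THRESHOLD" ]; then
    echo "thinpool2 data allocation at ${PCT}%" | mail -s "VG pool low" root
fi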

Now, where it got confusing: around the time I installed the 5.4
kernel with the df problem, df also filled up to 100% and my cronjob
started mailing me. I ignored the warnings because I knew about the
bug. However, I just found out that my LV had actually filled up due
to another problem, one that was actually my fault.

Now, I triggered some real bugs in btrfs, see:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
        Total devices 1 FS bytes used 445.73GiB
        devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu

Ok, I'm using 445GiB, but losing 8.4TiB, sigh. On the LVM side:
  LV Path                /dev/vgds2/ubuntu
  LV Name                ubuntu
  LV Pool name           thinpool2
  LV Size                14.00 TiB
  Mapped size            60.25%  <= this used up all the free space in my VG, so it's now full
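
(Aside: for a more detailed allocated vs. actually-used breakdown on
the btrfs side, there's also:

  btrfs filesystem usage .

run from the same directory; it splits out data/metadata/system
allocation and usage.)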

We talked about fstrim, let's try that:
gargamel:/mnt/btrfs_pool2/backup/ubuntu# fstrim -v .
.: 5.6 TiB (6116423237632 bytes) trimmed

Oh, great. Except this freed up nothing in LVM.
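
(Worth checking, in case the trim never made it down to the pool:
whether the thin pool passes discards through. Something like

  lvs -o+discards vgds2/thinpool2

should show 'passdown'; 'nopassdown' or 'ignore' there would explain
fstrim freeing nothing at the LVM level.)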

gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .  
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
ERROR: error during balancing '.': Read-only file system

Ok, right, I need to unmount/remount to clear the read-only state.
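(For the record that's a plain unmount/mount cycle; a btrfs that has
gone read-only after an error can't be flipped back with 'mount -o
remount,rw'. Assuming the fs is mounted at the path in the prompt:

  umount /mnt/btrfs_pool2/backup/ubuntu
  mount /dev/mapper/vgds2-ubuntu /mnt/btrfs_pool2/backup/ubuntu

...and cd back in.)
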
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -musage=0 -v .  
Dumping filters: flags 0x6, state 0x0, force is off
  METADATA (flags 0x2): balancing, usage=0
  SYSTEM (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs balance start -dusage=0 -v .  
Dumping filters: flags 0x1, state 0x0, force is off
  DATA (flags 0x2): balancing, usage=0
Done, had to relocate 0 out of 8624 chunks
gargamel:/mnt/btrfs_pool2/backup/ubuntu# btrfs fi show .
Label: 'ubuntu'  uuid: 905c90db-8081-4071-9c79-57328b8ac0d5
        Total devices 1 FS bytes used 8.42TiB
        devid    1 size 14.00TiB used 8.44TiB path /dev/mapper/vgds2-ubuntu


Well, crap. See how 'FS bytes used' went from 445.73GiB to 8.42TiB
after the balance?
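
(Note that usage=0 only relocates completely empty chunks; on a
healthy filesystem, compacting mostly-empty chunks would take a
nonzero filter, something like

  btrfs balance start -dusage=20 .

but given the numbers above, that's clearly not the issue here.)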

I ran du to make sure my data is indeed only using 445GiB.
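
(I.e. something along the lines of

  du -sh /mnt/btrfs_pool2/backup/ubuntu

or 'btrfs filesystem du -s .', which also accounts for shared
extents.)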

So now I'm pretty much hosed; the filesystem seems to have been
damaged in interesting ways.

I'll wait until tomorrow in case someone wants something from it, and I'll delete the entire
LV and start over.

And now for extra points, this also damaged a second filesystem on the
same VG :(
[64723.601630] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64723.628708] BTRFS error (device dm-17): bad tree block start, want 5782272294912 have 0
[64897.028176] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001
[64897.080355] BTRFS error (device dm-13): parent transid verify failed on 22724608 wanted 10005 found 10001

Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
 
Home page: http://marc.merlins.org/                       | PGP 7F55D5F27AAF9D08


