On Fri, Sep 20, 2019 at 8:31 AM Pete <pete@xxxxxxxxxxxxxxx> wrote:
>
> I have a btrfs that is on top of an lvm logical volume on top of
> dm-crypt on a single nvme drive (Samsung 870 Pro 512GB).
>
> I added a second logical volume to give more space to get rid of ENOSPC
> errors during balance, but to no avail. This was after I started
> getting enospc during balance. Without this additional logical device,
> before balance, I had run out of space owning to some unfortunate
> scripting interacting with lxc snapshots (non btrfs backed in the
> config, so a copy) and some copying. I was performing a balance,
> following some deletions, when trying to get things back to a better state.
>
> root@phoenix:/var/lib/lxc# btrfs balance start /var/lib/lxc
> WARNING:
>
>     Full balance without filters requested. This operation is very
>     intense and takes potentially very long. It is recommended to
>     use the balance filters to narrow down the scope of balance.
>     Use 'btrfs balance start --full-balance' option to skip this
>     warning. The operation will start in 10 seconds.
>     Use Ctrl-C to stop it.
> 10 9 8 7 6 5 4 3 2 1
> Starting balance without any filters.
> ERROR: error during balancing '/var/lib/lxc': No space left on device
> There may be more info in syslog - try dmesg | tail
> root@phoenix:/var/lib/lxc#
>
> I can still write to the filesystem.
>
>
> Kernel 5.1.21 (downgraded from 5.2.12)
>
> root@phoenix:/var/lib/lxc# btrfs --version
> btrfs-progs v5.1
>
> root@phoenix:/var/lib/lxc# btrfs fi show /var/lib/lxc
> Label: 'LXC_BTRFS' uuid: 6b0245ec-bdd4-4076-b800-2243d466b174
>     Total devices 2 FS bytes used 79.74GiB
>     devid 1 size 250.00GiB used 93.03GiB path
> /dev/mapper/nvme0_vg-lxc
>     devid 2 size 80.00GiB used 0.00B path
> /dev/mapper/nvme0_vg-tempdel
>
> root@phoenix:/var/lib/lxc# btrfs fi u /var/lib/lxc
> Overall:
>     Device size: 330.00GiB
>     Device allocated: 93.03GiB
>     Device unallocated: 236.97GiB
>     Device missing: 0.00B
>     Used: 79.74GiB
>     Free (estimated): 237.70GiB (min: 237.70GiB)
>     Data ratio: 1.00
>     Metadata ratio: 1.00
>     Global reserve: 512.00MiB (used: 0.00B)
>
> Data,single: Size:71.00GiB, Used:70.26GiB
>     /dev/mapper/nvme0_vg-lxc 71.00GiB
>
> Metadata,single: Size:22.00GiB, Used:9.48GiB
>     /dev/mapper/nvme0_vg-lxc 22.00GiB
>
> System,single: Size:32.00MiB, Used:16.00KiB
>     /dev/mapper/nvme0_vg-lxc 32.00MiB
>
> Unallocated:
>     /dev/mapper/nvme0_vg-lxc 156.97GiB
>     /dev/mapper/nvme0_vg-tempdel 80.00GiB
>
> btrfs fi df /var/lib/lxc
> Data, single: total=71.00GiB, used=70.26GiB
> System, single: total=32.00MiB, used=16.00KiB
> Metadata, single: total=22.00GiB, used=9.48GiB
> GlobalReserve, single: total=512.00MiB, used=0.00B
> root@phoenix:/var/lib/lxc#
>
>
>
> An unfiltered balance shows ENOSPC errors:
> btrfs balance start /var/lib/lxc
>
> Last bit of:
> dmesg | tail -n 100
>
>
> [ 920.915627] BTRFS info (device dm-4): found 67520 extents
> [ 922.037071] BTRFS info (device dm-4): relocating block group
> 1703106576384 flags data
> [ 924.742432] BTRFS info (device dm-4): found 57082 extents
> [ 927.245236] BTRFS info (device dm-4): found 57082 extents
> [ 928.371624] BTRFS info (device dm-4): relocating block group
> 1702032834560 flags data
> [ 931.230841] BTRFS info (device dm-4): found 60454 extents
> [ 933.373249] BTRFS info (device dm-4): found 60454 extents
> [ 934.336628] BTRFS info (device dm-4): relocating block group
> 1700959092736 flags data
> [ 937.330097] BTRFS info (device dm-4): found 67151 extents
> [ 940.296250] BTRFS info (device dm-4): found 67151 extents
> [ 941.524664] BTRFS info (device dm-4): relocating block group
> 1699885350912 flags data
> [ 944.264618] BTRFS info (device dm-4): found 54931 extents
> [ 945.910666] BTRFS info (device dm-4): found 54931 extents
> [ 946.796308] BTRFS info (device dm-4): relocating block group
> 1698811609088 flags data
> [ 949.426823] BTRFS info (device dm-4): found 55428 extents
> [ 950.880553] BTRFS info (device dm-4): found 55428 extents
> [ 951.622569] BTRFS info (device dm-4): relocating block group
> 1697737867264 flags data
> [ 955.223382] BTRFS info (device dm-4): found 52897 extents
> [ 956.544084] BTRFS info (device dm-4): found 52897 extents
> [ 957.300021] BTRFS info (device dm-4): relocating block group
> 1696664125440 flags data
> [ 959.936585] BTRFS info (device dm-4): found 48407 extents
> [ 961.421771] BTRFS info (device dm-4): found 48407 extents
> [ 962.203680] BTRFS info (device dm-4): relocating block group
> 1695590383616 flags data
> [ 964.281128] BTRFS info (device dm-4): found 28238 extents
> [ 965.325130] BTRFS info (device dm-4): found 28238 extents
> [ 965.886794] BTRFS info (device dm-4): relocating block group
> 1694516641792 flags data
> [ 968.999507] BTRFS info (device dm-4): found 46060 extents
> [ 970.447815] BTRFS info (device dm-4): found 46060 extents
> [ 971.276287] BTRFS info (device dm-4): relocating block group
> 1693442899968 flags data
> [ 974.914746] BTRFS info (device dm-4): found 55159 extents
> [ 976.914228] BTRFS info (device dm-4): found 55159 extents
> [ 977.758643] BTRFS info (device dm-4): relocating block group
> 1692369158144 flags data
> [ 980.081069] BTRFS info (device dm-4): found 36859 extents
> [ 981.630065] BTRFS info (device dm-4): found 36859 extents
> [ 982.498586] BTRFS info (device dm-4): relocating block group
> 1691295416320 flags data
> [ 984.929101] BTRFS info (device dm-4): found 50062 extents
> [ 986.440469] BTRFS info (device dm-4): found 50062 extents
> [ 987.281364] BTRFS info (device dm-4): 11 enospc errors during balance
> [ 987.281365] BTRFS info (device dm-4): balance: ended with status: -28
>
> Unfortunately I don't seem to have any more info in dmesg of the enospc
> errors:

You need to mount with enospc_debug to get more information; it might
be useful for a developer. This -28 (ENOSPC) error is one that has
mostly gone away. I don't know if the cause was ever discovered, but my
recollection is that once you're hitting it, you're better off creating
a new file system rather than chasing it.

But you could use 5.2.15 or newer, mount with enospc_debug, and do a
filtered balance. You could start with 1% increments, e.g. -dusage=1,
-dusage=2, up to 5, and then continue in 5% increments up to 70. The
idea is to avoid enospc while picking off the low-hanging fruit first
(the block groups with the most free space). At that point I would
start a full balance, with no filter. Maybe that'll get it back on
track.

I haven't ever experienced this, so this strategy is totally a spitball
method of trying to fix it. Balance rewrites some metadata as it goes,
and it is pretty complicated and not entirely deterministic, so it's
plausible that the filtered balance followed by a full balance could
fix it. But I don't understand it well enough to say for sure.

Also, I'd remove any snapshots you don't really need; it'll make the
balance less complicated and faster.

-- 
Chris Murphy
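
As a rough illustration of the stepwise filtered balance suggested
above, the sequence could be scripted along these lines. This is only a
sketch: MNT, DRY_RUN, usage_steps and filtered_balance are names made up
for the example, the mount point is the one from the quoted output, and
by default the script only prints the commands instead of running them.

```shell
#!/bin/sh
# Sketch only: stepwise filtered data balance (1..5 by 1, then 10..70 by 5).
# MNT and DRY_RUN are hypothetical names introduced for this example.
MNT="${MNT:-/var/lib/lxc}"
DRY_RUN="${DRY_RUN:-1}"    # 1 = print the commands, 0 = actually run them

# Print the usage thresholds, one per line.
usage_steps() {
    pct=1
    while [ "$pct" -le 5 ]; do
        echo "$pct"
        pct=$((pct + 1))
    done
    pct=10
    while [ "$pct" -le 70 ]; do
        echo "$pct"
        pct=$((pct + 5))
    done
}

# Run (or print) one filtered balance per threshold; stop at the first failure.
filtered_balance() {
    for step in $(usage_steps); do
        if [ "$DRY_RUN" = "1" ]; then
            echo "btrfs balance start -dusage=$step $MNT"
        else
            btrfs balance start -dusage="$step" "$MNT" || return 1
        fi
    done
}

filtered_balance
```

Run it once as-is to review the printed commands, then with DRY_RUN=0
as root to execute them. If a step fails with ENOSPC, the loop stops
there so dmesg can be checked before going further; the unfiltered
balance would still be started by hand afterwards.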
