Hello,

Recently I noticed memory pressure issues on one of our btrfs-based
backup servers, leading to OOM kills and occasional server crashes.
The main cause seems to be very large btrfs slabs.

The filesystem contains images of virtual machine disks. Daily
incremental backups are based on btrfs snapshots plus VMware Changed
Block Tracking, so only changed sectors are updated. Backup snapshots
are deleted after 30 days.

The server has 5 GB of RAM and slabtop looks as follows:

   OBJS  ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
 114778  114777  99%   12.00K  65751        2   2104032K btrfs_free_space_bitmap
4449760 4132420  92%    0.07K  79460       56    317840K btrfs_free_space
 206360   38855  18%    0.57K   3685       56    117920K radix_tree_node
  49777   47922  96%    0.59K   1912       53     61184K inode_cache
  37361   32217  86%    1.14K   1337       28     42784K btrfs_inode
  36032   35430  98%    1.00K   1126       32     36032K kmalloc-1k
  99372   95092  95%    0.19K   2367       42     18936K dentry
  49662   10794  21%    0.26K    801       62     12816K btrfs_extent_buffer
  46736   45496  97%    0.25K    731       64     11696K filp
  74872   34784  46%    0.14K   1339       56     10712K btrfs_extent_map

When I run cleanup of old snapshots, two more large btrfs slabs
temporarily appear:

6226248 5121619  82%    0.14K 111280       56    890240K btrfs_delayed_ref_head
6175584 5139772  83%    0.11K 171544       36    686176K btrfs_delayed_data_ref

Is it expected behavior that btrfs slabs (especially
btrfs_free_space_bitmap) grow to the above sizes and are not
automatically freed when the system runs out of free memory? As far as
I know, free space bitmaps are stored on disk as well, so IMHO they
should not be permanently pinned in memory. I tried
"echo 3 > /proc/sys/vm/drop_caches", but it obviously has no effect on
these slabs.

I'm also surprised that common filesystem operations can cause the OOM
killer to terminate almost all running applications. I would expect
more disk I/O and slower operations, but not application or server
crashes.

Kernel version: vanilla 4.19.87 + a custom patch that places metadata
chunks on an SSD device
Total number of snapshots: 3941 (approx. 175 snapshots created and
deleted every day)
Mount options: noatime,compress=zstd,enospc_debug,space_cache=v2,commit=5

# btrfs fi usage /backup
Overall:
    Device size:                  36.12TiB
    Device allocated:             28.50TiB
    Device unallocated:            7.62TiB
    Device missing:                  0.00B
    Used:                         24.93TiB
    Free (estimated):             11.14TiB      (min: 11.14TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:28.39TiB, Used:24.87TiB
   /dev/sdd        3.93TiB
   /dev/sde        3.89TiB
   /dev/sdf        2.84TiB
   /dev/sdg        2.01TiB
   /dev/sdh        3.93TiB
   /dev/sdi        3.93TiB
   /dev/sdj        3.93TiB
   /dev/sdk        3.93TiB

Metadata,single: Size:105.00GiB, Used:62.61GiB
   /dev/sdc      105.00GiB

System,single: Size:36.00MiB, Used:3.14MiB
   /dev/sdc       36.00MiB

Unallocated:
   /dev/sdc       14.97GiB
   /dev/sdd       69.00GiB
   /dev/sde      109.00GiB
   /dev/sdf        1.16TiB
   /dev/sdg        5.99TiB
   /dev/sdh       75.00GiB
   /dev/sdi       69.00GiB
   /dev/sdj       71.00GiB
   /dev/sdk       70.00GiB

Best regards,
Martin
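
P.S. For reference, this is how I check whether these caches are even
flagged as reclaimable to the VM (a minimal sketch, assuming SLUB with
/sys/kernel/slab exposed; on kernels with slab merging enabled the
cache directory may only exist as an alias of a merged cache):

    # print the reclaim_account flag of the biggest btrfs caches, then
    # compare the global reclaimable vs. unreclaimable slab totals
    for c in btrfs_free_space_bitmap btrfs_free_space btrfs_inode; do
        printf '%-28s reclaim_account=' "$c"
        cat "/sys/kernel/slab/$c/reclaim_account"
    done
    grep -E 'SReclaimable|SUnreclaim' /proc/meminfo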
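
P.P.S. As a stopgap I am trying to keep the delayed-ref slabs bounded
during cleanup by deleting snapshots one at a time and waiting for the
cleaner thread in between, roughly like this (the expired-snapshot path
is specific to my layout):

    # delete expired snapshots serially; -c commits the transaction
    # after each delete, and "subvolume sync" blocks until the cleaner
    # has actually dropped the subvolume
    for snap in /backup/snapshots/expired/*; do
        btrfs subvolume delete -c "$snap"
        btrfs subvolume sync /backup
    done

This trades wall-clock time for a bounded number of delayed refs in
flight, but I would still expect the kernel to throttle this on its
own rather than let the slabs grow until OOM.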
