Very large btrfs_free_space_bitmap slab

Hello,

I recently noticed memory-pressure issues on one of our btrfs-based backup servers, leading to OOM
kills and occasional server crashes. The main cause seems to be very large btrfs slab caches. The
filesystem contains virtual machine disk images; daily incremental backups are based on btrfs
snapshots and VMware Changed Block Tracking, so only changed sectors are rewritten. Backup snapshots
are deleted after 30 days. The server has 5 GB of RAM and slabtop looks as follows:

  OBJS ACTIVE  USE OBJ SIZE  SLABS OBJ/SLAB CACHE SIZE NAME
114778 114777  99%   12.00K  65751        2   2104032K btrfs_free_space_bitmap
4449760 4132420  92%    0.07K  79460       56    317840K btrfs_free_space
206360  38855  18%    0.57K   3685       56    117920K radix_tree_node
49777  47922  96%    0.59K   1912       53     61184K inode_cache
37361  32217  86%    1.14K   1337       28     42784K btrfs_inode
36032  35430  98%    1.00K   1126       32     36032K kmalloc-1k
99372  95092  95%    0.19K   2367       42     18936K dentry
49662  10794  21%    0.26K    801       62     12816K btrfs_extent_buffer
46736  45496  97%    0.25K    731       64     11696K filp
74872  34784  46%    0.14K   1339       56     10712K btrfs_extent_map
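To put the listing in perspective, the CACHE SIZE column can simply be summed; a minimal shell sketch (the heredoc holds the two largest lines from above, and on a live system the same awk could be fed from `slabtop -o`):

```shell
# Sum the CACHE SIZE column (field 7, in KiB) of slabtop output.
total_k=$(awk '{ sub(/K$/, "", $7); sum += $7 } END { print sum }' <<'EOF'
114778 114777  99%   12.00K  65751        2   2104032K btrfs_free_space_bitmap
4449760 4132420  92%    0.07K  79460       56    317840K btrfs_free_space
EOF
)
echo "${total_k} K"   # 2421872 K, i.e. ~2.3 GiB for just these two caches
```

So the two largest btrfs caches alone account for roughly half of the machine's 5 GB of RAM.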

When I run cleanup of old snapshots, two more large btrfs slabs temporarily appear:

6226248  5121619  82%    0.14K   111280       56     890240K btrfs_delayed_ref_head
6175584 5139772  83%    0.11K   171544       36     686176K btrfs_delayed_data_ref
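For what it's worth, the growth of these two caches during cleanup can be sampled from /proc/slabinfo; a minimal sketch (the SLABINFO override and the delayed_refs helper name are mine, not standard tooling):

```shell
# Sample the btrfs delayed-ref caches while snapshot cleanup runs.
# Reading the real /proc/slabinfo usually requires root; SLABINFO can
# be pointed at a saved copy for offline inspection.
SLABINFO="${SLABINFO:-/proc/slabinfo}"
delayed_refs() {
    # slabinfo columns: name <active_objs> <num_objs> ...
    awk '/^btrfs_delayed/ { printf "%s active=%s total=%s\n", $1, $2, $3 }' \
        "$SLABINFO" 2>/dev/null
}
delayed_refs || true   # prints nothing when slabinfo is not readable
```

Running this in a loop (e.g. under `watch`) during snapshot deletion should show whether the delayed-ref counts track the cleanup and drop back afterwards.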

Is it expected behavior that btrfs slabs (especially btrfs_free_space_bitmap) grow to the above
sizes and aren't automatically freed when the system runs out of free memory? As far as I know,
free-space bitmaps are stored on disk as well, so IMHO they shouldn't be permanently pinned in
memory. I tried "echo 3 > /proc/sys/vm/drop_caches", but it has no visible effect on these slabs.
I'm also surprised that common filesystem operations can cause the OOM killer to terminate almost
all running applications. I would expect more disk I/O and slower operation, but not application or
server crashes.
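One thing worth checking is whether these caches are flagged as reclaimable at all: drop_caches only shrinks slabs created with SLAB_RECLAIM_ACCOUNT, and with SLUB that flag is visible in sysfs. A minimal sketch (the SLAB_ROOT override and the reclaimable helper are mine; paths assume SLUB and a mounted sysfs, and cache names may be aliased by slab merging):

```shell
# Report the SLUB reclaim_account flag for the named caches; a cache
# reporting 0 is not touched by "echo 3 > /proc/sys/vm/drop_caches".
SLAB_ROOT="${SLAB_ROOT:-/sys/kernel/slab}"
reclaimable() {
    f="$SLAB_ROOT/$1/reclaim_account"
    if [ -r "$f" ]; then cat "$f"; else echo "unknown"; fi
}
for c in btrfs_free_space_bitmap btrfs_free_space btrfs_delayed_ref_head; do
    echo "$c: $(reclaimable "$c")"
done
```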

Kernel version: vanilla 4.19.87 + a custom patch that places metadata chunks on an SSD device
Total number of snapshots: 3941 (approx. 175 snapshots created and deleted every day)
Mount options: noatime,compress=zstd,enospc_debug,space_cache=v2,commit=5

# btrfs fi usage /backup
Overall:
    Device size:                  36.12TiB
    Device allocated:             28.50TiB
    Device unallocated:            7.62TiB
    Device missing:                  0.00B
    Used:                         24.93TiB
    Free (estimated):             11.14TiB      (min: 11.14TiB)
    Data ratio:                       1.00
    Metadata ratio:                   1.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:28.39TiB, Used:24.87TiB
   /dev/sdd        3.93TiB
   /dev/sde        3.89TiB
   /dev/sdf        2.84TiB
   /dev/sdg        2.01TiB
   /dev/sdh        3.93TiB
   /dev/sdi        3.93TiB
   /dev/sdj        3.93TiB
   /dev/sdk        3.93TiB

Metadata,single: Size:105.00GiB, Used:62.61GiB
   /dev/sdc      105.00GiB

System,single: Size:36.00MiB, Used:3.14MiB
   /dev/sdc       36.00MiB

Unallocated:
   /dev/sdc       14.97GiB
   /dev/sdd       69.00GiB
   /dev/sde      109.00GiB
   /dev/sdf        1.16TiB
   /dev/sdg        5.99TiB
   /dev/sdh       75.00GiB
   /dev/sdi       69.00GiB
   /dev/sdj       71.00GiB
   /dev/sdk       70.00GiB

Best regards.

Martin
