Hi all,
I ran into the following issue and am not sure whether it is already
known. I'd be glad to hear it matches the fingerprint of a known or
fixed bug, as I'm admittedly running an older kernel, but my searching
skills have failed me.
I have an mdraid array formatted with BTRFS: 6x12TB drives in raid0.
Only about 240GB of the 72TB was consumed when the out-of-space (OOS)
error hit.
/etc/fstab mount options:
/dev/md0 /pandata/0 btrfs defaults,space_cache=v2,noauto 0 0
uname:
Linux 4d00fa3d419078 4.12.14-lp150.11-default #1 SMP Fri May 11 08:28:30
UTC 2018 (a9fee09) x86_64 x86_64 x86_64 GNU/Linux
dmesg output:
[17939.536301] BTRFS: Transaction aborted (error -28)
[17939.536331] ------------[ cut here ]------------
[17939.542058] WARNING: CPU: 7 PID: 3372 at
../fs/btrfs/extent-tree.c:6988 __btrfs_free_extent.isra.64+0xb9d/0xd40
[btrfs]
[17939.553779] Modules linked in: binfmt_misc af_packet bonding
iscsi_ibft iscsi_boot_sysfs msr nls_iso8859_1 nls_cp437 vfat intel_rapl
fat skx_edac x86_pkg_temp_thermal btrfs intel_powerclamp coretemp xor
ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul
crc32c_intel raid0 iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcbc
dax_pmem ixgbe device_dax md_mod ptp nd_pmem pps_core mdio nd_btt
aesni_intel aes_x86_64 raid6_pq crypto_simd glue_helper cryptd i2c_i801
lpc_ich ioatdma ipmi_si pcspkr mei_me mei nfit ipmi_devintf shpchp dca
wmi ipmi_msghandler libnvdimm acpi_pad button joydev hid_generic usbhid
ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt
fb_sys_fops xhci_pci ttm xhci_hcd nvme drm ahci
drm_panel_orientation_quirks nvme_core usbcore libahci sg dm_multipath
dm_mod
[17939.631713] scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[17939.638341] CPU: 7 PID: 3372 Comm: btrfs-transacti Not tainted
4.12.14-lp150.11-default #1 openSUSE Leap 15.0 (unreleased)
[17939.650466] Hardware name: Supermicro SYS-F629P3-RTB/X11DPFR-S, BIOS
3.0c_PI021_2e 11/26/2019
[17939.660095] task: ffff88083b975680 task.stack: ffffc9000a238000
[17939.667128] RIP: 0010:__btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.674653] RSP: 0018:ffffc9000a23bc78 EFLAGS: 00010296
[17939.680953] RAX: 0000000000000026 RBX: 0000000000000000 RCX:
0000000000000000
[17939.689172] RDX: ffff88085c1dfd40 RSI: ffff88085c1d7a68 RDI:
ffff88085c1d7a68
[17939.697386] RBP: 00000012b9a5c000 R08: 0000000000000511 R09:
0000000000000007
[17939.705602] R10: 0000000000000001 R11: 0000000000000001 R12:
ffff8808530ae000
[17939.713803] R13: 00000000ffffffe4 R14: ffff8802edf64870 R15:
ffff8801368c0230
[17939.722017] FS: 0000000000000000(0000) GS:ffff88085c1c0000(0000)
knlGS:0000000000000000
[17939.731203] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17939.738051] CR2: 00007f12998bea08 CR3: 000000000200a003 CR4:
00000000007606e0
[17939.746292] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[17939.754525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[17939.762735] PKRU: 55555554
[17939.766521] Call Trace:
[17939.770075] __btrfs_run_delayed_refs+0x5b9/0x1300 [btrfs]
[17939.776682] btrfs_run_delayed_refs+0x68/0x250 [btrfs]
[17939.782948] btrfs_commit_transaction+0x2df/0x900 [btrfs]
[17939.789462] ? wait_woken+0x80/0x80
[17939.794087] transaction_kthread+0x186/0x1a0 [btrfs]
[17939.800201] ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[17939.806983] kthread+0x11a/0x130
[17939.811308] ? kthread_create_on_node+0x40/0x40
[17939.816939] ret_from_fork+0x1f/0x40
[17939.821591] Code: 00 00 48 c7 c6 c0 07 8e a0 4c 89 f7 41 bd ea ff ff
ff e8 4d d0 09 00 e9 a0 f5 ff ff 44 89 ee 48 c7 c7 18 71 8e a0 e8 d9 95
96 e0 <0f> 0b e9 73 f5 ff ff 49 8b 46 60 f0 0f ba a8 30 17 00 00 02 72
[17939.842686] ---[ end trace 179787a3004a4525 ]---
[17939.848482] BTRFS: error (device md0) in __btrfs_free_extent:6988:
errno=-28 No space left
[17939.857923] BTRFS info (device md0): forced readonly
[17939.864081] BTRFS: error (device md0) in btrfs_run_delayed_refs:3016:
errno=-28 No space left
[17939.873811] BTRFS warning (device md0): Skipping commit of aborted
transaction.
[17939.882319] BTRFS: error (device md0) in cleanup_transaction:1876:
errno=-28 No space left
[17940.192941] BTRFS error (device md0): pending csums is 334954496
Immediately after the above, fsyncs for a running application began to
return "fileio: no more space", and the mount went RO.
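For anyone decoding the trace: error -28 is ENOSPC, and R13 in the
register dump appears to carry the same value as a 32-bit two's
complement number (0xffffffe4). A minimal sketch to confirm that,
assuming a Linux host with Python 3; the helper name to_signed32 is
mine, not anything from the kernel:

```python
import errno
import os


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a signed two's-complement int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


# R13 as printed in the register dump above.
r13 = 0x00000000FFFFFFE4
err = to_signed32(r13)

# Print the signed value, its symbolic errno name, and the libc message.
print(err, errno.errorcode[-err], os.strerror(-err))
```

This only shows that the register value and the "errno=-28" lines agree;
it says nothing about why the allocator returned ENOSPC.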
btrfs check output:
4d00fa3d419078:~ # btrfs check -p /dev/md0
Checking filesystem on /dev/md0
UUID: 2a71b152-ade6-4be6-9b2f-8db1e736455a
checking extents [O]
checking free space cache [o]
checking fs roots [.]
checking csums
checking root refs
found 242851065856 bytes used, no error found
total csum bytes: 234919228
total tree bytes: 2293776384
total fs tree bytes: 910114816
total extent tree bytes: 998359040
btree space waste bytes: 440673068
file data blocks allocated: 450663858176
referenced 236223201280
A remount following btrfs check worked just fine.
btrfs fi usage reports:
# btrfs fi usage /pandata/0/
Overall:
    Device size:                  65.48TiB
    Device allocated:            276.02GiB
    Device unallocated:           65.21TiB
    Device missing:                  0.00B
    Used:                        227.67GiB
    Free (estimated):             65.26TiB      (min: 32.65TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:268.00GiB, Used:223.57GiB
   /dev/md0      268.00GiB

Metadata,DUP: Size:4.00GiB, Used:2.05GiB
   /dev/md0        8.00GiB

System,DUP: Size:8.00MiB, Used:48.00KiB
   /dev/md0       16.00MiB

Unallocated:
   /dev/md0       65.21TiB
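As a sanity check on those numbers, the sketch below recomputes the
totals from the per-profile figures. All values are copied from the
usage output above; the variable names are mine, and subtracting the
global reserve from free metadata is only a rough approximation of
headroom, not how the kernel actually accounts for reservations.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
TIB = 1024 ** 4

# Per-profile figures from "btrfs fi usage" (logical sizes).
data_size, data_used = 268.00 * GIB, 223.57 * GIB
meta_size, meta_used = 4.00 * GIB, 2.05 * GIB
sys_size = 8.00 * MIB
global_reserve = 512.00 * MIB

# DUP profiles occupy twice their logical size on the device.
allocated = data_size + 2 * meta_size + 2 * sys_size
used = data_used + 2 * meta_used

# Free space inside the already-allocated metadata chunks.
meta_free = meta_size - meta_used
meta_headroom = meta_free - global_reserve  # rough approximation only

# Raw capacity of 6 x 12TB (decimal) drives, ignoring md overhead.
raw_tib = 6 * 12e12 / TIB

print(f"allocated:      {allocated / GIB:.2f} GiB")
print(f"used:           {used / GIB:.2f} GiB")
print(f"metadata free:  {meta_free / GIB:.2f} GiB")
print(f"minus reserve:  {meta_headroom / GIB:.2f} GiB")
print(f"raw capacity:   {raw_tib:.2f} TiB")
```

The recomputed totals line up with the "Overall" section, and on these
figures neither the metadata chunks nor the device as a whole look
anywhere near full.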
I suspect this is a free space cache issue: a bug that falsely reports
up the chain that there is no more space, which then locks the FS into
RO mode. But why it doesn't show up on check or remount is unclear to me.
Any and all thoughts are greatly appreciated,
ellis