FS Remounted RO due to false-positive for OOS?

Hi all,

I encountered the following issue and wasn't sure whether it is already known. I'd be glad to hear that it matches the fingerprint of a known or fixed bug, as I'm admittedly running an older kernel, but my searching has turned up nothing.

I have an mdraid array formatted with BTRFS: 6x12TB drives in raid0. Only about 240GB of the 72TB was consumed when the out-of-space (OOS) error hit.

/etc/fstab mount options:

/dev/md0        /pandata/0      btrfs   defaults,space_cache=v2,noauto  0 0

uname:

Linux 4d00fa3d419078 4.12.14-lp150.11-default #1 SMP Fri May 11 08:28:30 UTC 2018 (a9fee09) x86_64 x86_64 x86_64 GNU/Linux

dmesg output:

[17939.536301] BTRFS: Transaction aborted (error -28)
[17939.536331] ------------[ cut here ]------------
[17939.542058] WARNING: CPU: 7 PID: 3372 at ../fs/btrfs/extent-tree.c:6988 __btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.553779] Modules linked in: binfmt_misc af_packet bonding iscsi_ibft iscsi_boot_sysfs msr nls_iso8859_1 nls_cp437 vfat intel_rapl fat skx_edac x86_pkg_temp_thermal btrfs intel_powerclamp coretemp xor ipmi_ssif kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul crc32c_intel raid0 iTCO_wdt iTCO_vendor_support ghash_clmulni_intel pcbc dax_pmem ixgbe device_dax md_mod ptp nd_pmem pps_core mdio nd_btt aesni_intel aes_x86_64 raid6_pq crypto_simd glue_helper cryptd i2c_i801 lpc_ich ioatdma ipmi_si pcspkr mei_me mei nfit ipmi_devintf shpchp dca wmi ipmi_msghandler libnvdimm acpi_pad button joydev hid_generic usbhid ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops xhci_pci ttm xhci_hcd nvme drm ahci drm_panel_orientation_quirks nvme_core usbcore libahci sg dm_multipath dm_mod
[17939.631713]  scsi_dh_rdac scsi_dh_emc scsi_dh_alua efivarfs
[17939.638341] CPU: 7 PID: 3372 Comm: btrfs-transacti Not tainted 4.12.14-lp150.11-default #1 openSUSE Leap 15.0 (unreleased)
[17939.650466] Hardware name: Supermicro SYS-F629P3-RTB/X11DPFR-S, BIOS 3.0c_PI021_2e 11/26/2019
[17939.660095] task: ffff88083b975680 task.stack: ffffc9000a238000
[17939.667128] RIP: 0010:__btrfs_free_extent.isra.64+0xb9d/0xd40 [btrfs]
[17939.674653] RSP: 0018:ffffc9000a23bc78 EFLAGS: 00010296
[17939.680953] RAX: 0000000000000026 RBX: 0000000000000000 RCX: 0000000000000000
[17939.689172] RDX: ffff88085c1dfd40 RSI: ffff88085c1d7a68 RDI: ffff88085c1d7a68
[17939.697386] RBP: 00000012b9a5c000 R08: 0000000000000511 R09: 0000000000000007
[17939.705602] R10: 0000000000000001 R11: 0000000000000001 R12: ffff8808530ae000
[17939.713803] R13: 00000000ffffffe4 R14: ffff8802edf64870 R15: ffff8801368c0230
[17939.722017] FS: 0000000000000000(0000) GS:ffff88085c1c0000(0000) knlGS:0000000000000000
[17939.731203] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[17939.738051] CR2: 00007f12998bea08 CR3: 000000000200a003 CR4: 00000000007606e0
[17939.746292] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[17939.754525] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[17939.762735] PKRU: 55555554
[17939.766521] Call Trace:
[17939.770075]  __btrfs_run_delayed_refs+0x5b9/0x1300 [btrfs]
[17939.776682]  btrfs_run_delayed_refs+0x68/0x250 [btrfs]
[17939.782948]  btrfs_commit_transaction+0x2df/0x900 [btrfs]
[17939.789462]  ? wait_woken+0x80/0x80
[17939.794087]  transaction_kthread+0x186/0x1a0 [btrfs]
[17939.800201]  ? btrfs_cleanup_transaction+0x4e0/0x4e0 [btrfs]
[17939.806983]  kthread+0x11a/0x130
[17939.811308]  ? kthread_create_on_node+0x40/0x40
[17939.816939]  ret_from_fork+0x1f/0x40
[17939.821591] Code: 00 00 48 c7 c6 c0 07 8e a0 4c 89 f7 41 bd ea ff ff ff e8 4d d0 09 00 e9 a0 f5 ff ff 44 89 ee 48 c7 c7 18 71 8e a0 e8 d9 95 96 e0 <0f> 0b e9 73 f5 ff ff 49 8b 46 60 f0 0f ba a8 30 17 00 00 02 72
[17939.842686] ---[ end trace 179787a3004a4525 ]---
[17939.848482] BTRFS: error (device md0) in __btrfs_free_extent:6988: errno=-28 No space left
[17939.857923] BTRFS info (device md0): forced readonly
[17939.864081] BTRFS: error (device md0) in btrfs_run_delayed_refs:3016: errno=-28 No space left
[17939.873811] BTRFS warning (device md0): Skipping commit of aborted transaction.
[17939.882319] BTRFS: error (device md0) in cleanup_transaction:1876: errno=-28 No space left
[17940.192941] BTRFS error (device md0): pending csums is 334954496
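
As a reading aid for the trace: error -28 is ENOSPC, and the same value appears in R13 (00000000ffffffe4) when its low 32 bits are read as a signed integer. Below is a minimal Python sketch of that decoding. The helper name as_signed32 is made up for illustration and is not anything from the kernel or btrfs-progs.

```python
import errno
import os


def as_signed32(value: int) -> int:
    """Interpret the low 32 bits of value as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


r13 = 0x00000000FFFFFFE4  # R13 as shown in the register dump above
code = as_signed32(r13)

# Print the signed value, its symbolic errno name, and the libc message text.
print(code, errno.errorcode[-code], os.strerror(-code))
```

This only confirms that the abort really is an out-of-space error as seen by the allocator. It says nothing about why space was judged to be exhausted.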

Immediately after the above, fsyncs from a running application began to return "fileio: no more space", and the mount went RO.

btrfs check output:

4d00fa3d419078:~ # btrfs check -p /dev/md0
Checking filesystem on /dev/md0
UUID: 2a71b152-ade6-4be6-9b2f-8db1e736455a
checking extents [O]
checking free space cache [o]
checking fs roots [.]
checking csums
checking root refs
found 242851065856 bytes used, no error found
total csum bytes: 234919228
total tree bytes: 2293776384
total fs tree bytes: 910114816
total extent tree bytes: 998359040
btree space waste bytes: 440673068
file data blocks allocated: 450663858176
 referenced 236223201280

A remount following btrfs check worked just fine.

btrfs fi usage reports:

# btrfs fi usage /pandata/0/
Overall:
    Device size:                  65.48TiB
    Device allocated:            276.02GiB
    Device unallocated:           65.21TiB
    Device missing:                  0.00B
    Used:                        227.67GiB
    Free (estimated):             65.26TiB      (min: 32.65TiB)
    Data ratio:                       1.00
    Metadata ratio:                   2.00
    Global reserve:              512.00MiB      (used: 0.00B)

Data,single: Size:268.00GiB, Used:223.57GiB
   /dev/md0      268.00GiB

Metadata,DUP: Size:4.00GiB, Used:2.05GiB
   /dev/md0        8.00GiB

System,DUP: Size:8.00MiB, Used:48.00KiB
   /dev/md0       16.00MiB

Unallocated:
   /dev/md0       65.21TiB
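
To sanity-check the figures above, here is a minimal Python sketch that re-adds the per-profile allocations and computes how full each chunk type is. The numbers are copied by hand from the output. The summarize helper and the dictionary layout are illustrative assumptions, not part of btrfs-progs.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
KIB = 1024

# (logical size, bytes used, raw bytes allocated on /dev/md0), copied from above.
chunks = {
    "Data,single": (268.00 * GIB, 223.57 * GIB, 268.00 * GIB),
    "Metadata,DUP": (4.00 * GIB, 2.05 * GIB, 8.00 * GIB),
    "System,DUP": (8.00 * MIB, 48.00 * KIB, 16.00 * MIB),
}


def summarize(chunks):
    """Return total raw allocation and the used/size fraction per chunk type."""
    total_raw = sum(raw for _size, _used, raw in chunks.values())
    fill = {name: used / size for name, (size, used, _raw) in chunks.items()}
    return total_raw, fill


total_raw, fill = summarize(chunks)
print(f"Device allocated (recomputed): {total_raw / GIB:.2f} GiB")
for name, fraction in fill.items():
    print(f"{name}: {fraction:.1%} of allocated chunks in use")
```

The recomputed total agrees with the reported "Device allocated" value, and the metadata chunks are only about half full, with 65.21TiB still unallocated for new chunks.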

I suspect this is a free space cache issue: a bug that falsely reports up the chain that there is no more space, which then forces the FS into RO mode. Why it doesn't recur on check or remount is unclear to me.

Any and all thoughts are greatly appreciated,

ellis


