Hello btrfs-aholics,
I've been hitting repeated "kernel BUG" occurrences over the past
few days while trying to balance a raid5 filesystem after adding a new drive.
It occurs on both 4.2.0 and 4.1.7, using 4.2 userspace tools.
The raid5 setup was 2x4T drives (created 3 days ago to upgrade smoothly
from mdadm/ext4 to btrfs), then I added a 3rd drive and tried to
balance.
Metadata is in raid1.
root@nas:~# uname -a
Linux nas 4.1.7-040107-generic #201509131330 SMP Sun Sep 13 17:32:28 UTC
2015 x86_64 x86_64 x86_64 GNU/Linux
(and also
Linux version 4.2.0-7-generic (buildd@lgw01-60) (gcc version 5.2.1
20150825 (Ubuntu 5.2.1-15ubuntu5) ) #7-Ubuntu SMP Tue Sep 1 16:43:10 UTC
2015 (Ubuntu 4.2.0-7.7-generic 4.2.0)
)
root@nas:~# btrfs --version
btrfs-progs v4.2
root@nas:~# btrfs fi show
Label: 'tank' uuid: 6bec1608-d9c0-453e-87eb-8b8663c9010d
Total devices 3 FS bytes used 2.66TiB
devid 1 size 2.73TiB used 2.50TiB path
/dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2STUCVR
devid 2 size 2.73TiB used 2.50TiB path
/dev/mapper/luks-WDC_WD30EFRX-68EUZN0_WD-WCC4N2DVRDXF
devid 4 size 2.73TiB used 190.03GiB path
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
btrfs-progs v4.2
root@nas:~# btrfs fi df /tank/
Data, RAID5: total=2.67TiB, used=2.65TiB
System, RAID1: total=32.00MiB, used=384.00KiB
Metadata, RAID1: total=6.00GiB, used=4.38GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
root@nas:~# btrfs fi usage /tank/
WARNING: RAID56 detected, not implemented
Overall:
Device size: 8.19TiB
Device allocated: 12.06GiB
Device unallocated: 8.17TiB
Device missing: 0.00B
Used: 8.76GiB
Free (estimated): 0.00B (min: 8.00EiB)
Data ratio: 0.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Data,RAID5: Size:2.67TiB, Used:2.65TiB
/dev/dm-1 2.49TiB
/dev/dm-2 2.49TiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
184.00GiB
Metadata,RAID1: Size:6.00GiB, Used:4.38GiB
/dev/dm-1 3.00GiB
/dev/dm-2 3.00GiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
6.00GiB
System,RAID1: Size:32.00MiB, Used:384.00KiB
/dev/dm-2 32.00MiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
32.00MiB
Unallocated:
/dev/dm-1 239.52GiB
/dev/dm-2 239.49GiB
/dev/mapper/luks-WDC_WD30EZRX-00MMMB0_WD-WCAWZ3013164
2.54TiB
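The imbalance that the balance is meant to fix can be read off the "btrfs fi show" output above. As a minimal sketch (assuming Python 3; `allocation_by_device` and `DEVID_RE` are hypothetical helper names, not part of btrfs-progs), the allocated fraction per device can be computed like this:

```python
import re

# Binary unit multipliers as printed by btrfs-progs (KiB, MiB, ...).
UNITS = {"KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4, "PiB": 1024**5}

# Hypothetical pattern for the "devid N size X used Y" lines of `btrfs fi show`.
DEVID_RE = re.compile(
    r"devid\s+(\d+)\s+size\s+([\d.]+)([KMGTP]iB)\s+used\s+([\d.]+)([KMGTP]iB)"
)

def allocation_by_device(fi_show_text):
    """Map devid -> fraction of the device that is allocated to chunks."""
    result = {}
    for devid, size, size_unit, used, used_unit in DEVID_RE.findall(fi_show_text):
        size_bytes = float(size) * UNITS[size_unit]
        used_bytes = float(used) * UNITS[used_unit]
        result[int(devid)] = used_bytes / size_bytes
    return result
```

With the figures above, devid 1 and 2 come out at roughly 92% allocated, while the newly added devid 4 is at roughly 7%.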
Each drive has LUKS configured on it (directly on /dev/sdX, no
partition), and the resulting mapped device is used directly as a btrfs
device.
root@nas:~# time btrfs balance start /tank
Segmentation fault
real 750m55.550s
with the following kernel BUG in the log:
nas kernel: [17863.907793] ------------[ cut here ]------------
nas kernel: [17863.907833] kernel BUG at
/build/linux-4dBub_/linux-4.2.0/fs/btrfs/extent-tree.c:1833!
nas kernel: [17863.907857] invalid opcode: 0000 [#1] SMP
nas kernel: [17863.907877] Modules linked in: xts gf128mul drbg
ansi_cprng xt_multiport xt_comment xt_conntrack xt_nat xt_tcpudp
nfnetlink_queue nfnetlink_log nfne
nas kernel: [17863.908264] CPU: 1 PID: 17379 Comm: btrfs Not tainted
4.2.0-7-generic #7-Ubuntu
nas kernel: [17863.908281] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [17863.908297] task: ffff880036184c80 ti: ffff8800507f4000
task.ti: ffff8800507f4000
nas kernel: [17863.908314] RIP: 0010:[<ffffffffc0311ab6>]
[<ffffffffc0311ab6>] insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.908349] RSP: 0018:ffff8800507f7698 EFLAGS: 00010293
nas kernel: [17863.908362] RAX: 0000000000000000 RBX: 0000000000000001
RCX: 0000000000000001
nas kernel: [17863.908378] RDX: ffff880000000000 RSI: 0000000000000001
RDI: 0000000000000000
nas kernel: [17863.908394] RBP: ffff8800507f7718 R08: 0000000000004000
R09: ffff8800507f7598
nas kernel: [17863.908410] R10: 0000000000000000 R11: 0000000000000003
R12: ffff8800c5c65000
nas kernel: [17863.908427] R13: 00000307b70ac000 R14: 0000000000000000
R15: ffff880108d5c630
nas kernel: [17863.908443] FS: 00007f9300a7d900(0000)
GS:ffff88011fb00000(0000) knlGS:0000000000000000
nas kernel: [17863.908461] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
nas kernel: [17863.908475] CR2: 00007f0a351c6000 CR3: 0000000118c0d000
CR4: 00000000000406e0
nas kernel: [17863.908491] Stack:
nas kernel: [17863.908496] 00000307b70ac000 0000000000000d0b
0000000000000001 0000000000000000
nas kernel: [17863.908516] 0000030600000001 ffffffff811cf4ca
0000000000000000 ffffffffc030550a
nas kernel: [17863.908535] 0000000000270026 00000000000035d7
ffff88001fdd95c0 ffff8800927ae000
nas kernel: [17863.908555] Call Trace:
nas kernel: [17863.908564] [<ffffffff811cf4ca>] ?
kmem_cache_alloc+0x1ca/0x200
nas kernel: [17863.908582] [<ffffffffc030550a>] ?
btrfs_alloc_path+0x1a/0x20 [btrfs]
nas kernel: [17863.908601] [<ffffffffc0311f98>]
__btrfs_inc_extent_ref.isra.52+0x98/0x250 [btrfs]
nas kernel: [17863.908623] [<ffffffffc031757a>]
__btrfs_run_delayed_refs+0xc4a/0x1050 [btrfs]
nas kernel: [17863.908643] [<ffffffffc030f980>] ?
add_pinned_bytes+0x70/0x80 [btrfs]
nas kernel: [17863.908662] [<ffffffffc0318087>] ?
walk_up_proc+0xd7/0x4a0 [btrfs]
nas kernel: [17863.908681] [<ffffffffc031a5be>]
btrfs_run_delayed_refs.part.73+0x6e/0x270 [btrfs]
nas kernel: [17863.908702] [<ffffffffc031a7d5>]
btrfs_run_delayed_refs+0x15/0x20 [btrfs]
nas kernel: [17863.908723] [<ffffffffc032e38a>]
btrfs_should_end_transaction+0x5a/0x60 [btrfs]
nas kernel: [17863.908744] [<ffffffffc0318dad>]
btrfs_drop_snapshot+0x43d/0x820 [btrfs]
nas kernel: [17863.908765] [<ffffffffc0328c00>] ?
btrfs_get_fs_root+0x30/0x80 [btrfs]
nas kernel: [17863.908787] [<ffffffffc03813c2>]
merge_reloc_roots+0xd2/0x240 [btrfs]
nas kernel: [17863.908808] [<ffffffffc038178a>]
relocate_block_group+0x25a/0x690 [btrfs]
nas kernel: [17863.908829] [<ffffffffc0381d8a>]
btrfs_relocate_block_group+0x1ca/0x2c0 [btrfs]
nas kernel: [17863.909470] [<ffffffffc03564de>]
btrfs_relocate_chunk.isra.39+0x3e/0xb0 [btrfs]
nas kernel: [17863.910108] [<ffffffffc0357847>]
__btrfs_balance+0x4c7/0x8b0 [btrfs]
nas kernel: [17863.910748] [<ffffffffc0357ec0>]
btrfs_balance+0x290/0x610 [btrfs]
nas kernel: [17863.911406] [<ffffffffc0364014>] ?
btrfs_ioctl_balance+0x274/0x3c0 [btrfs]
nas kernel: [17863.912065] [<ffffffffc0363f09>]
btrfs_ioctl_balance+0x169/0x3c0 [btrfs]
nas kernel: [17863.912734] [<ffffffffc03658d8>]
btrfs_ioctl+0x548/0x26d0 [btrfs]
nas kernel: [17863.913398] [<ffffffff811c5f12>] ?
alloc_pages_vma+0xc2/0x230
nas kernel: [17863.914014] [<ffffffff81185d6b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [17863.914651] [<ffffffff811a6d25>] ?
handle_mm_fault+0xbc5/0x16a0
nas kernel: [17863.915260] [<ffffffff811aa4dd>] ?
__vma_link_rb+0xfd/0x110
nas kernel: [17863.915841] [<ffffffff811aa5a9>] ? vma_link+0xb9/0xc0
nas kernel: [17863.916427] [<ffffffff811fffd5>]
do_vfs_ioctl+0x285/0x470
nas kernel: [17863.916970] [<ffffffff810630a4>] ?
__do_page_fault+0x1b4/0x400
nas kernel: [17863.917528] [<ffffffff81200239>] SyS_ioctl+0x79/0x90
nas kernel: [17863.918037] [<ffffffff817b6cf2>]
entry_SYSCALL_64_fastpath+0x16/0x75
nas kernel: [17863.918564] Code: 45 10 49 89 d9 48 8b 55 c8 4c 89 34 24
4c 89 e9 4c 89 fe 4c 89 e7 48 89 44 24 10 8b 45 28 89 44 24 08 e8 fe d6
ff ff 31 c0 eb bb <
nas kernel: [17863.919683] RIP [<ffffffffc0311ab6>]
insert_inline_extent_backref+0xc6/0xd0 [btrfs]
nas kernel: [17863.920202] RSP <ffff8800507f7698>
nas kernel: [17863.922890] ---[ end trace f9b514d72fc0a628 ]---
I downgraded to 4.1.7 just in case, and got a similar failure after a
couple of hours. First a WARNING:
nas kernel: [47155.229661] ------------[ cut here ]------------
nas kernel: [47155.229670] WARNING: CPU: 1 PID: 9145 at
/home/kernel/COD/linux/fs/btrfs/delayed-ref.c:475
update_existing_ref+0x18b/0x1e0 [btrfs]()
nas kernel: [47155.229671] Modules linked in: ufs qnx4 hfsplus hfs minix
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment
xt_conntrack xt_nat xt_t
nas kernel: [47155.229704] CPU: 1 PID: 9145 Comm: btrfs Tainted: P
W OE 4.1.7-040107-generic #201509131330
nas kernel: [47155.229705] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [47155.229706] ffffffffc0381b30 ffff880103eff658
ffffffff817d0ee3 0000000000000000
nas kernel: [47155.229707] 0000000000000000 ffff880103eff698
ffffffff81079c3a 0000000000001000
nas kernel: [47155.229708] ffff88009c3806e0 ffff88009a96a428
ffff88009a96a3c0 ffff8800a3064420
nas kernel: [47155.229710] Call Trace:
nas kernel: [47155.229713] [<ffffffff817d0ee3>] dump_stack+0x45/0x57
nas kernel: [47155.229714] [<ffffffff81079c3a>]
warn_slowpath_common+0x8a/0xc0
nas kernel: [47155.229715] [<ffffffff81079d2a>]
warn_slowpath_null+0x1a/0x20
nas kernel: [47155.229723] [<ffffffffc0349cdb>]
update_existing_ref+0x18b/0x1e0 [btrfs]
nas kernel: [47155.229730] [<ffffffffc034a0cb>]
add_delayed_tree_ref+0xeb/0x1a0 [btrfs]
nas kernel: [47155.229737] [<ffffffffc034accc>]
btrfs_add_delayed_tree_ref+0x10c/0x180 [btrfs]
nas kernel: [47155.229744] [<ffffffffc02e6610>]
btrfs_free_extent+0xe0/0x140 [btrfs]
nas kernel: [47155.229750] [<ffffffffc02d3735>] ?
btrfs_release_path+0x25/0xb0 [btrfs]
nas kernel: [47155.229757] [<ffffffffc02e6958>]
do_walk_down+0x2e8/0x940 [btrfs]
nas kernel: [47155.229763] [<ffffffffc02e3b82>] ?
walk_down_proc+0x2e2/0x310 [btrfs]
nas kernel: [47155.229771] [<ffffffffc02fc68d>] ?
join_transaction.isra.14+0xfd/0x410 [btrfs]
nas kernel: [47155.229777] [<ffffffffc02e7076>]
walk_down_tree+0xc6/0x100 [btrfs]
nas kernel: [47155.229784] [<ffffffffc02eaa4a>]
btrfs_drop_snapshot+0x41a/0x880 [btrfs]
nas kernel: [47155.229792] [<ffffffffc034cb00>] ?
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47155.229800] [<ffffffffc0351d49>]
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47155.229807] [<ffffffffc0352119>]
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47155.229814] [<ffffffffc03526f6>]
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47155.229822] [<ffffffffc0325cbe>]
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47155.229830] [<ffffffffc03270a4>]
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47155.229838] [<ffffffffc032781a>]
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47155.229846] [<ffffffffc033086b>] ?
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47155.229853] [<ffffffffc0330734>]
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47155.229860] [<ffffffffc03355f7>]
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47155.229862] [<ffffffff811d2ad5>] ?
alloc_pages_vma+0xb5/0x200
nas kernel: [47155.229864] [<ffffffff81191a3b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47155.229865] [<ffffffff811b280c>] ?
handle_mm_fault+0xbac/0x17e0
nas kernel: [47155.229866] [<ffffffff811b6a08>] ?
__vma_link_rb+0xc8/0xf0
nas kernel: [47155.229867] [<ffffffff8120ce68>]
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47155.229869] [<ffffffff81066f76>] ?
__do_page_fault+0x1b6/0x450
nas kernel: [47155.229870] [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47155.229871] [<ffffffff81067240>] ?
do_page_fault+0x30/0x80
nas kernel: [47155.229873] [<ffffffff817d8ab2>]
system_call_fastpath+0x16/0x75
nas kernel: [47155.229874] ---[ end trace e4064ae1c7878a22 ]---
and 2 seconds later, a kernel BUG:
nas kernel: [47157.228137] ------------[ cut here ]------------
nas kernel: [47157.228190] kernel BUG at
/home/kernel/COD/linux/fs/btrfs/extent-tree.c:2248!
nas kernel: [47157.228259] invalid opcode: 0000 [#1] SMP
nas kernel: [47157.228301] Modules linked in: ufs qnx4 hfsplus hfs minix
ntfs msdos jfs xfs libcrc32c xts gf128mul xt_multiport xt_comment
xt_conntrack xt_nat xt_t
nas kernel: [47157.229656] CPU: 0 PID: 9145 Comm: btrfs Tainted: P
W OE 4.1.7-040107-generic #201509131330
nas kernel: [47157.229741] Hardware name: ASUS All Series/H87I-PLUS,
BIOS 1005 01/06/2014
nas kernel: [47157.229807] task: ffff88011a8cd080 ti: ffff880103efc000
task.ti: ffff880103efc000
nas kernel: [47157.229875] RIP: 0010:[<ffffffffc02e8251>]
[<ffffffffc02e8251>] __btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.229998] RSP: 0018:ffff880103eff7c8 EFLAGS: 00010202
nas kernel: [47157.230048] RAX: 0000000000000001 RBX: 0000000000000000
RCX: 00000000000001e1
nas kernel: [47157.230113] RDX: ffff8800c61ad000 RSI: ffff8800c6adaed0
RDI: ffff8800c6adaec8
nas kernel: [47157.230179] RBP: ffff880103eff8f8 R08: 0000000000000000
R09: 00000001802e002c
nas kernel: [47157.230244] R10: ffffffffc02e75d3 R11: 0000000000000d0a
R12: ffff880056f0c9f8
nas kernel: [47157.230310] R13: 000003cdf0f80000 R14: ffff8800c6adae60
R15: 0000000000000000
nas kernel: [47157.230377] FS: 00007f5f63146900(0000)
GS:ffff88011fa00000(0000) knlGS:0000000000000000
nas kernel: [47157.230451] CS: 0010 DS: 0000 ES: 0000 CR0:
0000000080050033
nas kernel: [47157.230504] CR2: 00007f6126ad5000 CR3: 00000000041be000
CR4: 00000000000406f0
nas kernel: [47157.230569] Stack:
nas kernel: [47157.230590] 0000000000000001 0000000000000000
0000042000000001 0000000000000001
nas kernel: [47157.230669] 0000000000000000 0000000000000cf6
ffff88009a930480 00000000000020ae
nas kernel: [47157.230748] 0000000203eff838 0000000000004000
ffff88009a930480 ffff88009a930480
nas kernel: [47157.230827] Call Trace:
nas kernel: [47157.230882] [<ffffffffc02ec483>]
btrfs_run_delayed_refs.part.66+0x73/0x270 [btrfs]
nas kernel: [47157.230975] [<ffffffffc02ec697>]
btrfs_run_delayed_refs+0x17/0x20 [btrfs]
nas kernel: [47157.231065] [<ffffffffc02fd169>]
btrfs_should_end_transaction+0x49/0x60 [btrfs]
nas kernel: [47157.231155] [<ffffffffc02eaaa2>]
btrfs_drop_snapshot+0x472/0x880 [btrfs]
nas kernel: [47157.231251] [<ffffffffc034cb00>] ?
should_ignore_root.part.15+0x50/0x50 [btrfs]
nas kernel: [47157.231347] [<ffffffffc0351d49>]
merge_reloc_roots+0xd9/0x240 [btrfs]
nas kernel: [47157.231433] [<ffffffffc0352119>]
relocate_block_group+0x269/0x670 [btrfs]
nas kernel: [47157.231521] [<ffffffffc03526f6>]
btrfs_relocate_block_group+0x1d6/0x2e0 [btrfs]
nas kernel: [47157.231618] [<ffffffffc0325cbe>]
btrfs_relocate_chunk.isra.38+0x3e/0xc0 [btrfs]
nas kernel: [47157.231714] [<ffffffffc03270a4>]
__btrfs_balance+0x4e4/0x8b0 [btrfs]
nas kernel: [47157.231799] [<ffffffffc032781a>]
btrfs_balance+0x3aa/0x680 [btrfs]
nas kernel: [47157.231885] [<ffffffffc033086b>] ?
btrfs_ioctl_balance+0x29b/0x520 [btrfs]
nas kernel: [47157.231974] [<ffffffffc0330734>]
btrfs_ioctl_balance+0x164/0x520 [btrfs]
nas kernel: [47157.232062] [<ffffffffc03355f7>]
btrfs_ioctl+0x597/0x2b30 [btrfs]
nas kernel: [47157.232125] [<ffffffff811d2ad5>] ?
alloc_pages_vma+0xb5/0x200
nas kernel: [47157.232183] [<ffffffff81191a3b>] ?
lru_cache_add_active_or_unevictable+0x2b/0xa0
nas kernel: [47157.232253] [<ffffffff811b280c>] ?
handle_mm_fault+0xbac/0x17e0
nas kernel: [47157.232311] [<ffffffff811b6a08>] ?
__vma_link_rb+0xc8/0xf0
nas kernel: [47157.232365] [<ffffffff8120ce68>]
do_vfs_ioctl+0x2f8/0x510
nas kernel: [47157.232421] [<ffffffff81066f76>] ?
__do_page_fault+0x1b6/0x450
nas kernel: [47157.232477] [<ffffffff8120d101>] SyS_ioctl+0x81/0xa0
nas kernel: [47157.232527] [<ffffffff81067240>] ?
do_page_fault+0x30/0x80
nas kernel: [47157.232584] [<ffffffff817d8ab2>]
system_call_fastpath+0x16/0x75
nas kernel: [47157.232640] Code: 48 c7 c7 68 e4 37 c0 e8 de 1a d9 c0 e9
55 f0 ff ff 0f 0b be ba 00 00 00 48 c7 c7 68 e4 37 c0 e8 c6 1a d9 c0 e9
4d f1 ff ff 0f 0b <
nas kernel: [47157.232977] RIP [<ffffffffc02e8251>]
__btrfs_run_delayed_refs+0x11a1/0x1230 [btrfs]
nas kernel: [47157.233072] RSP <ffff880103eff7c8>
nas kernel: [47157.256409] ---[ end trace e4064ae1c7878a23 ]---
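To compare the failure sites across kernels without reading the full traces, the BUG and WARNING headers can be extracted from a saved log. This is only a sketch, assuming Python 3; `crash_sites` and `SITE_RE` are hypothetical names, not part of any kernel or btrfs tooling:

```python
import re

# Matches "kernel BUG at <file>:<line>!" and "WARNING: ... at <file>:<line>".
# \s+ after "at" tolerates the line wrapping introduced by the mail client;
# [^\n]*? keeps the rest of the header on the same line as the keyword.
SITE_RE = re.compile(r"(kernel BUG|WARNING)\b[^\n]*?\bat\s+(\S+?):(\d+)")

def crash_sites(log_text):
    """Return (kind, source_file, line_number) for each BUG/WARNING header."""
    return [(kind, path, int(line)) for kind, path, line in SITE_RE.findall(log_text)]
```

Fed the traces in this report, it should pick out extent-tree.c:1833 on 4.2.0, then delayed-ref.c:475 (WARNING) and extent-tree.c:2248 (BUG) on 4.1.7.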
When it happens, the system is obviously unstable and I can't umount or
reboot (without the SysRq keys, that is).
When I do reboot, the filesystem is still mountable and seems OK
(I haven't tried a scrub yet). This is reproducible on my side, and I'm
willing to help you debug this!
I can attach the complete dmesg if necessary.
If you need me to try more things or dump more information to help
with debugging, just ask!
Thanks,
Stéphane.