On Thu, Oct 04, 2018 at 11:24:37PM +0200, Hans van Kranenburg wrote:
> This patch set contains an additional fix for a newly exposed bug after
> the previous attempt to fix a chunk allocator bug for new DUP chunks:
>
> https://lore.kernel.org/linux-btrfs/782f6000-30c0-0085-abd2-74ec5827c903@xxxxxxxxxx/T/#m609ccb5d32998e8ba5cfa9901c1ab56a38a6f374
>
> The DUP fix is "fix more DUP stripe size handling". I did that one
> before starting to change more things so it can be applied to earlier
> LTS kernels.
>
> Besides that patch, which is fixing the bug in a way that is least
> intrusive, I added a bunch of other patches to help getting the chunk
> allocator code in a state that is a bit less error-prone and
> bug-attracting.
>
> When running this and trying the reproduction scenario, I can now see
> that the created DUP device extent is 827326464 bytes long, which is
> good.
>
> I wrote and tested this on top of linus 4.19-rc5. I still need to create
> a list of related use cases and test more things to at least walk
> through a bunch of obvious use cases to see if there's nothing exploding
> too quickly with these changes. However, I'm quite confident about it,
> so I'm sharing all of it already.
>
> Any feedback and review is appreciated. Be gentle and keep in mind that
> I'm still very much in a learning stage regarding kernel development.

The patches look good, thanks. Problem is explained, preparatory work is
separated from the fix itself.

> The stable patches handling workflow is not 100% clear to me yet. I
> guess I have to add a Fixes: in the DUP patch which points to the
> previous commit 92e222df7b.

Almost nobody does it right, no worries. If you can identify a single
patch that introduces a bug then it's for Fixes:, otherwise a CC: stable
with version where it makes sense & applies is enough. I do that check
myself regardless of what's in the patch.

I ran the patches in a VM and hit a division-by-zero in test
fstests/btrfs/011, stacktrace below.
First guess is that it's caused by patch 3/6.

[ 3116.065595] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 1 transid 5 /dev/vdb
[ 3116.071274] BTRFS: device fsid e3bd8db5-304f-4b1a-8488-7791ea94088f devid 2 transid 5 /dev/vdc
[ 3116.087086] BTRFS info (device vdb): disk space caching is enabled
[ 3116.088644] BTRFS info (device vdb): has skinny extents
[ 3116.089796] BTRFS info (device vdb): flagging fs with big metadata feature
[ 3116.093971] BTRFS info (device vdb): checking UUID tree
[ 3125.853755] BTRFS info (device vdb): dev_replace from /dev/vdb (devid 1) to /dev/vdd started
[ 3125.860269] divide error: 0000 [#1] PREEMPT SMP
[ 3125.861264] CPU: 1 PID: 6477 Comm: btrfs Not tainted 4.19.0-rc7-default+ #288
[ 3125.862841] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014
[ 3125.865385] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
[ 3125.870541] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
[ 3125.871862] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
[ 3125.873587] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
[ 3125.874677] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
[ 3125.875816] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
[ 3125.876742] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 3125.877657] FS: 00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
[ 3125.878862] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3125.880080] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
[ 3125.881485] Call Trace:
[ 3125.882105] do_chunk_alloc+0x266/0x3e0 [btrfs]
[ 3125.882841] btrfs_inc_block_group_ro+0x10e/0x160 [btrfs]
[ 3125.883875] scrub_enumerate_chunks+0x18b/0x5d0 [btrfs]
[ 3125.884658] ? is_module_address+0x11/0x30
[ 3125.885271] ? wait_for_completion+0x160/0x190
[ 3125.885928] btrfs_scrub_dev+0x1b8/0x5a0 [btrfs]
[ 3125.887767] ? start_transaction+0xa1/0x470 [btrfs]
[ 3125.888648] btrfs_dev_replace_start.cold.19+0x155/0x17e [btrfs]
[ 3125.889459] btrfs_dev_replace_by_ioctl+0x35/0x60 [btrfs]
[ 3125.890251] btrfs_ioctl+0x2a94/0x31d0 [btrfs]
[ 3125.890885] ? do_sigaction+0x7c/0x210
[ 3125.891731] ? do_vfs_ioctl+0xa2/0x6b0
[ 3125.892652] do_vfs_ioctl+0xa2/0x6b0
[ 3125.893642] ? do_sigaction+0x1a7/0x210
[ 3125.894665] ksys_ioctl+0x3a/0x70
[ 3125.895523] __x64_sys_ioctl+0x16/0x20
[ 3125.896339] do_syscall_64+0x5a/0x1a0
[ 3125.896949] entry_SYSCALL_64_after_hwframe+0x49/0xbe
[ 3125.897638] RIP: 0033:0x7f6de28ecaa7
[ 3125.901313] RSP: 002b:00007ffe963da9c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 3125.902486] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f6de28ecaa7
[ 3125.903538] RDX: 00007ffe963dae00 RSI: 00000000ca289435 RDI: 0000000000000003
[ 3125.904878] RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
[ 3125.905788] R10: 0000000000000008 R11: 0000000000000246 R12: 00007ffe963de26f
[ 3125.906700] R13: 0000000000000001 R14: 0000000000000004 R15: 000055fceeceb2a0
[ 3125.907954] Modules linked in: btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq loop
[ 3125.909342] ---[ end trace 5492bb467d3be2da ]---
[ 3125.910031] RIP: 0010:__btrfs_alloc_chunk+0x368/0xa70 [btrfs]
[ 3125.913600] RSP: 0018:ffffa4ea0409fa48 EFLAGS: 00010206
[ 3125.914595] RAX: 0000000004000000 RBX: ffff94e074374508 RCX: 0000000000000002
[ 3125.916209] RDX: 0000000000000000 RSI: ffff94e017818c80 RDI: 0000000002000000
[ 3125.917701] RBP: 0000000080800000 R08: 0000000000000000 R09: 0000000000000002
[ 3125.919209] R10: 0000000300000000 R11: 0000000080900000 R12: 0000000000000000
[ 3125.920782] R13: 0000000000000001 R14: 0000000000000001 R15: 0000000000000002
[ 3125.922413] FS: 00007f6de34208c0(0000) GS:ffff94e07d600000(0000) knlGS:0000000000000000
[ 3125.924264] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3125.925627] CR2: 00007ffe963d5ce8 CR3: 000000007659b000 CR4: 00000000000006e0
