On Sun, 2019-09-15 at 14:56 +0800, Qu Wenruo wrote:
>
> On 2019/9/15 上午4:52, Cebtenzzre wrote:
> > I have been able to trigger a use-after-free in btrfs on a stock Arch
> > Linux kernel, versions 5.2.9 and 5.2.11. I also reproduced it on
> > kernel.org mainline 5.3-rc8, resulting in this KASAN report:
> >
> >
> > [49286.511157] BTRFS info (device sdi1): balance: start -dvrange=2221681934336..2221681934337
> > [49286.515651] BTRFS info (device sdi1): relocating block group 2221681934336 flags data|raid0
> > [49294.092536] ==================================================================
> > [49294.092637] BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
>
> It would be much nicer if you could provide the code context by using
> 1) gdb:
> $gdb fs/btrfs/btrfs.ko
> list *(btrfs_init_reloc_root+0x2bf)
>
> 2) faddr2line scripts:
> <in linux kernel source code directory>
> $ ./scripts/faddr2line fs/btrfs/btrfs.kobtrfs_init_reloc_root+0x2bf
Unfortunately, I didn't have debug info on that build of 5.3-rc8. I did
have it enabled (though with CONFIG_DEBUG_INFO_REDUCED) on a build of
5.2.11, which gave me this report:
[10083.021120] BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
[10083.025073] BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
[10090.396218] ==================================================================
[10090.396266] BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
[10090.396270] Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579
[10090.396277] CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P OE 5.2.11-arch1-1-kasan #4
[10090.396279] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
[10090.396303] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[10090.396306] Call Trace:
[10090.396312] dump_stack+0x7b/0xba
[10090.396317] print_address_description+0x6c/0x22e
[10090.396341] ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
[10090.396344] __kasan_report.cold+0x1b/0x3b
[10090.396368] ? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
[10090.396372] kasan_report+0x12/0x17
[10090.396374] __asan_report_store8_noabort+0x17/0x20
[10090.396397] btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
[10090.396418] record_root_in_trans+0x2a0/0x370 [btrfs]
[10090.396439] btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
[10090.396459] start_transaction+0x1ab/0xe90 [btrfs]
[10090.396485] btrfs_join_transaction+0x1d/0x20 [btrfs]
[10090.396522] btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
[10090.396528] ? lock_repin_lock+0x400/0x400
[10090.396532] ? __kmem_cache_shutdown.cold+0x140/0x1ad
[10090.396571] ? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
[10090.396611] finish_ordered_fn+0x15/0x20 [btrfs]
[10090.396648] normal_work_helper+0x1bd/0xca0 [btrfs]
[10090.396652] ? process_one_work+0x819/0x1720
[10090.396657] ? kasan_check_read+0x11/0x20
[10090.396696] btrfs_endio_write_helper+0x12/0x20 [btrfs]
[10090.396700] process_one_work+0x8c9/0x1720
[10090.396709] ? pwq_dec_nr_in_flight+0x2f0/0x2f0
[10090.396712] ? worker_thread+0x1d9/0x1030
[10090.396723] worker_thread+0x98/0x1030
[10090.396736] kthread+0x2bb/0x3b0
[10090.396739] ? process_one_work+0x1720/0x1720
[10090.396742] ? kthread_park+0x120/0x120
[10090.396749] ret_from_fork+0x35/0x40
[10090.396763] Allocated by task 369692:
[10090.396769] __kasan_kmalloc.part.0+0x44/0xc0
[10090.396772] __kasan_kmalloc.constprop.0+0xba/0xc0
[10090.396775] kasan_kmalloc+0x9/0x10
[10090.396778] kmem_cache_alloc_trace+0x138/0x260
[10090.396812] btrfs_read_tree_root+0x92/0x360 [btrfs]
[10090.396846] btrfs_read_fs_root+0x10/0xb0 [btrfs]
[10090.396882] create_reloc_root+0x47d/0xa10 [btrfs]
[10090.396919] btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
[10090.396952] record_root_in_trans+0x2a0/0x370 [btrfs]
[10090.396985] btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
[10090.397019] start_transaction+0x1ab/0xe90 [btrfs]
[10090.397052] btrfs_start_transaction+0x1e/0x20 [btrfs]
[10090.397086] __btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
[10090.397120] btrfs_prealloc_file_range+0x13/0x20 [btrfs]
[10090.397156] prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
[10090.397191] relocate_file_extent_cluster+0x193/0xc30 [btrfs]
[10090.397227] relocate_data_extent+0x1f8/0x490 [btrfs]
[10090.397263] relocate_block_group+0x600/0x1060 [btrfs]
[10090.397298] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
[10090.397334] btrfs_relocate_chunk+0x9e/0x180 [btrfs]
[10090.397371] btrfs_balance+0x14e4/0x2fc0 [btrfs]
[10090.397407] btrfs_ioctl_balance+0x47f/0x640 [btrfs]
[10090.397443] btrfs_ioctl+0x119d/0x8380 [btrfs]
[10090.397446] do_vfs_ioctl+0x9f5/0x1060
[10090.397449] ksys_ioctl+0x67/0x90
[10090.397452] __x64_sys_ioctl+0x73/0xb0
[10090.397456] do_syscall_64+0xa5/0x370
[10090.397459] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10090.397465] Freed by task 369692:
[10090.397470] __kasan_slab_free+0x14f/0x210
[10090.397472] kasan_slab_free+0xe/0x10
[10090.397475] kfree+0xd8/0x270
[10090.397508] btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
[10090.397544] clean_dirty_subvols+0x227/0x340 [btrfs]
[10090.397580] relocate_block_group+0x972/0x1060 [btrfs]
[10090.397616] btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
[10090.397652] btrfs_relocate_chunk+0x9e/0x180 [btrfs]
[10090.397688] btrfs_balance+0x14e4/0x2fc0 [btrfs]
[10090.397724] btrfs_ioctl_balance+0x47f/0x640 [btrfs]
[10090.397760] btrfs_ioctl+0x119d/0x8380 [btrfs]
[10090.397763] do_vfs_ioctl+0x9f5/0x1060
[10090.397766] ksys_ioctl+0x67/0x90
[10090.397769] __x64_sys_ioctl+0x73/0xb0
[10090.397772] do_syscall_64+0xa5/0x370
[10090.397775] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10090.397782] The buggy address belongs to the object at ffff88856f671100
which belongs to the cache kmalloc-4k of size 4096
[10090.397787] The buggy address is located 1552 bytes inside of
4096-byte region [ffff88856f671100, ffff88856f672100)
[10090.397791] The buggy address belongs to the page:
[10090.397797] page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
[10090.397802] flags: 0x2ffff0000010200(slab|head)
[10090.397808] raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
[10090.397812] raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
[10090.397814] page dumped because: kasan: bad access detected
[10090.397819] Memory state around the buggy address:
[10090.397824] ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[10090.397829] ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[10090.397833] >ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[10090.397836] ^
[10090.397841] ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[10090.397845] ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[10090.397849] ==================================================================
[10096.840443] BTRFS info (device sdi1): 1 enospc errors during balance
[10096.840447] BTRFS info (device sdi1): balance: ended with status: -28
So, I will be using the offsets from the above report instead of the
report from 5.3-rc8 that you are quoting.
> My config results something doesn't make sense at all.
>
> > [49294.092645] Write of size 8 at addr ffff8885b4901440 by task kworker/u24:6/10894
> >
> > [49294.092657] CPU: 8 PID: 10894 Comm: kworker/u24:6 Tainted: P OE 5.3.0-rc8-rc-kasan #2
> > [49294.092661] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
> > [49294.092726] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> > [49294.092730] Call Trace:
> > [49294.092743] dump_stack+0x71/0xa0
> > [49294.092751] print_address_description+0x67/0x323
> > [49294.092817] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
> > [49294.092879] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
> > [49294.092884] __kasan_report.cold+0x1a/0x3d
> > [49294.092945] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
> > [49294.092951] kasan_report+0xe/0x12
> > [49294.093012] btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
> We need code contex of this,
That would be +0x2cd on my 5.2.11 build, which according to GDB is here:
(gdb) list *(btrfs_init_reloc_root+0x2cd)
0x1dcbdd is in btrfs_init_reloc_root (fs/btrfs/relocation.c:1456).
1451 reloc_root = create_reloc_root(trans, root, root->root_key.objectid);
1452 if (clear_rsv)
1453 trans->block_rsv = rsv;
1454
1455 ret = __add_reloc_root(reloc_root);
1456 BUG_ON(ret < 0);
1457 root->reloc_root = reloc_root;
1458 return 0;
1459 }
1460
But according to faddr2line, it is at relocation.c:1438.
$ ./scripts/faddr2line fs/btrfs/btrfs.ko btrfs_init_reloc_root+0x2cd
btrfs_init_reloc_root+0x2cd/0x340:
btrfs_init_reloc_root at /path/to/linux/fs/btrfs/relocation.c:1438
That is here:
(gdb) list relocation.c:1438
1433 int clear_rsv = 0;
1434 int ret;
1435
1436 if (root->reloc_root) {
1437 reloc_root = root->reloc_root;
1438 reloc_root->last_trans = trans->transid;
1439 return 0;
1440 }
1441
1442 if (!rc || !rc->create_reloc_tree ||
> > [49294.093066] record_root_in_trans+0x2ba/0x3a0 [btrfs]
> > [49294.093119] btrfs_record_root_in_trans+0xd2/0x150 [btrfs]
> > [49294.093170] start_transaction+0x1c3/0xea0 [btrfs]
> > [49294.093226] btrfs_finish_ordered_io+0x811/0x1610 [btrfs]
> > [49294.093233] ? syscall_return_via_sysret+0xf/0x7f
> > [49294.093238] ? syscall_return_via_sysret+0xf/0x7f
> > [49294.093243] ? __switch_to_asm+0x40/0x70
> > [49294.093248] ? __switch_to_asm+0x34/0x70
> > [49294.093300] ? btrfs_unlink_subvol+0xa30/0xa30 [btrfs]
> > [49294.093307] ? finish_task_switch+0x1a1/0x760
> > [49294.093312] ? __switch_to+0x457/0xe90
> > [49294.093317] ? __switch_to_asm+0x34/0x70
> > [49294.093378] normal_work_helper+0x15a/0xb90 [btrfs]
> > [49294.093387] process_one_work+0x706/0x1200
> > [49294.093394] worker_thread+0x92/0xfb0
> > [49294.093401] ? __kthread_parkme+0x85/0x100
> > [49294.093406] ? process_one_work+0x1200/0x1200
> > [49294.093410] kthread+0x2ba/0x3b0
> > [49294.093414] ? kthread_park+0x130/0x130
> > [49294.093420] ret_from_fork+0x35/0x40
> >
> > [49294.093431] Allocated by task 11689:
> > [49294.093441] __kasan_kmalloc.part.0+0x3c/0xa0
> > [49294.093493] btrfs_read_tree_root+0x8f/0x350 [btrfs]
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> This,
GDB points here:
(gdb) list *(btrfs_read_tree_root+0x92)
0xb07c2 is in btrfs_read_tree_root (./include/linux/slab.h:742).
737 * @size: how many bytes of memory are required.
738 * @flags: the type of memory to allocate (see kmalloc).
739 */
740 static inline void *kzalloc(size_t size, gfp_t flags)
741 {
742 return kmalloc(size, flags | __GFP_ZERO);
743 }
744
745 /**
746 * kzalloc_node - allocate zeroed memory from a particular memory node.
Whereas faddr2line points at slab.h:547, here:
(gdb) list slab.h:547
542 index = kmalloc_index(size);
543
544 if (!index)
545 return ZERO_SIZE_PTR;
546
547 return kmem_cache_alloc_trace(
548 kmalloc_caches[kmalloc_type(flags)][index],
549 flags, size);
550 #endif
551 }
btrfs_alloc_root calls kzalloc, so it looks like this is the inlined
result of that. It is called here:
(gdb) list disk-io.c:1434
1429
1430 path = btrfs_alloc_path();
1431 if (!path)
1432 return ERR_PTR(-ENOMEM);
1433
1434 root = btrfs_alloc_root(fs_info, GFP_NOFS);
1435 if (!root) {
1436 ret = -ENOMEM;
1437 goto alloc_fail;
1438 }
> > [49294.093542] btrfs_read_fs_root+0xc/0xc0 [btrfs]
> > [49294.093601] create_reloc_root+0x445/0x920 [btrfs]
> > [49294.093660] btrfs_init_reloc_root+0x1da/0x330 [btrfs]
> > [49294.093709] record_root_in_trans+0x2ba/0x3a0 [btrfs]
> > [49294.093758] btrfs_record_root_in_trans+0xd2/0x150 [btrfs]
> > [49294.093806] start_transaction+0x1c3/0xea0 [btrfs]
> > [49294.093856] __btrfs_prealloc_file_range+0x1c2/0xa50 [btrfs]
> > [49294.093907] btrfs_prealloc_file_range+0x10/0x20 [btrfs]
> > [49294.093966] prealloc_file_extent_cluster+0x24e/0x4a0 [btrfs]
> > [49294.094025] relocate_file_extent_cluster+0x193/0xc90 [btrfs]
> > [49294.094083] relocate_data_extent+0x1f2/0x460 [btrfs]
> > [49294.094142] relocate_block_group+0x5a5/0xf50 [btrfs]
> > [49294.094200] btrfs_relocate_block_group+0x38f/0x990 [btrfs]
> > [49294.094258] btrfs_relocate_chunk+0x5c/0x100 [btrfs]
> > [49294.094315] btrfs_balance+0x1292/0x2f00 [btrfs]
> > [49294.094373] btrfs_ioctl_balance+0x4c2/0x6a0 [btrfs]
> > [49294.094430] btrfs_ioctl+0xe56/0x82d0 [btrfs]
> > [49294.094434] do_vfs_ioctl+0x99f/0xf10
> > [49294.094437] ksys_ioctl+0x5e/0x90
> > [49294.094440] __x64_sys_ioctl+0x6f/0xb0
> > [49294.094446] do_syscall_64+0xa0/0x370
> > [49294.094451] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [49294.094457] Freed by task 11689:
> > [49294.094464] __kasan_slab_free+0x144/0x1f0
> > [49294.094468] kfree+0x95/0x2a0
> > [49294.094516] btrfs_drop_snapshot+0x1529/0x1c40 [btrfs]
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> And this.
(gdb) list *(btrfs_drop_snapshot+0x154c)
0x7643c is in btrfs_drop_snapshot (fs/btrfs/disk-io.h:110).
105 }
106
107 static inline void btrfs_put_fs_root(struct btrfs_root *root)
108 {
109 if (refcount_dec_and_test(&root->refs))
110 kfree(root);
111 }
112
113 void btrfs_mark_buffer_dirty(struct extent_buffer *buf);
114 int btrfs_buffer_uptodate(struct extent_buffer *buf, u64 parent_transid,
faddr2line agrees with GDB on this one.
There is one direct call to btrfs_put_fs_root in btrfs_drop_snapshot,
here:
(gdb) list extent-tree.c:9446
9441 if (test_bit(BTRFS_ROOT_IN_RADIX, &root->state)) {
9442 btrfs_add_dropped_root(trans, root);
9443 } else {
9444 free_extent_buffer(root->node);
9445 free_extent_buffer(root->commit_root);
9446 btrfs_put_fs_root(root);
9447 }
9448 root_dropped = true;
9449 out_end_trans:
9450 btrfs_end_transaction_throttle(trans);
> > [49294.094573] clean_dirty_subvols+0x23f/0x380 [btrfs]
> > [49294.094632] relocate_block_group+0x95b/0xf50 [btrfs]
> > [49294.094691] btrfs_relocate_block_group+0x38f/0x990 [btrfs]
> > [49294.094748] btrfs_relocate_chunk+0x5c/0x100 [btrfs]
> > [49294.094805] btrfs_balance+0x1292/0x2f00 [btrfs]
> > [49294.094863] btrfs_ioctl_balance+0x4c2/0x6a0 [btrfs]
> > [49294.094920] btrfs_ioctl+0xe56/0x82d0 [btrfs]
> > [49294.094923] do_vfs_ioctl+0x99f/0xf10
> > [49294.094926] ksys_ioctl+0x5e/0x90
> > [49294.094929] __x64_sys_ioctl+0x6f/0xb0
> > [49294.094934] do_syscall_64+0xa0/0x370
> > [49294.094939] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > [49294.094946] The buggy address belongs to the object at ffff8885b4901100
> > which belongs to the cache kmalloc-2k of size 2048
> > [49294.094953] The buggy address is located 832 bytes inside of
> > 2048-byte region [ffff8885b4901100, ffff8885b4901900)
> > [49294.094957] The buggy address belongs to the page:
> > [49294.094962] page:ffffea0016d24000 refcount:1 mapcount:0 mapping:ffff88864400e800 index:0x0 compound_mapcount: 0
> > [49294.094968] flags: 0x2ffff0000010200(slab|head)
> > [49294.094976] raw: 02ffff0000010200 dead000000000100 dead000000000122 ffff88864400e800
> > [49294.094981] raw: 0000000000000000 00000000800f000f 00000001ffffffff 0000000000000000
> > [49294.094983] page dumped because: kasan: bad access detected
> >
> > [49294.094987] Memory state around the buggy address:
> > [49294.094992] ffff8885b4901300: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [49294.094997] ffff8885b4901380: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [49294.095002] >ffff8885b4901400: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [49294.095006] ^
> > [49294.095010] ffff8885b4901480: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [49294.095015] ffff8885b4901500: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
> > [49294.095018] ==================================================================
> > [49301.893672] BTRFS info (device sdi1): 1 enospc errors during balance
> > [49301.893675] BTRFS info (device sdi1): balance: ended with status: -28
> >
> >
> > Without KASAN, I would eventually get an oops like this:
> >
> > [...]
> >
> > I only noticed this bug because I keep an eye on dmesg. In one instance,
> > I ignored it for a few hours, then my graphics driver crashed because of
> > memory allocation failure and/or heap corruption. Aside from that, I
> > have seen no obvious effects.
> >
> > I have hit this bug at least 5 times over the last two weeks. Every
> > time, it has been caused by a balance on various volumes (typically to
> > balance a single block group). I was able to trigger it somewhat
> > reliably by attempting a balance on a volume with a size of 596.18GiB
> > and 1.68GiB of estimated free space, but that stopped working
> > eventually.
>
> Is it always that particular fs?
> Have you ever hit it on another fs?
> Furthermore, did you hit it in v5.1?
I usually hit this on a raid0 data volume, but I got the same KASAN
report once while attemtping to balance my raid1 system volume.
I got the data volume into a state where it would consistently trigger
the bug (2.05GiB of free space), and did a few git bisects.
On v5.0, the balance correctly fails with ENOSPC.
As of commit d2311e69857815ae2f728b48e6730f833a617092 ("btrfs:
relocation: Delay reloc tree deletion after merge_reloc_roots"), first
appearing in v5.1-rc1, I get "kernel BUG at fs/btrfs/relocation.c:1413!"
at create_reloc_root+0x6a1/0x920 when attempting a balance.
As of commit 30d40577e322b670551ad7e2faa9570b6e23eb2b ("btrfs: reloc:
Also queue orphan reloc tree for cleanup to avoid BUG_ON()"), first
appearing in v5.2-rc3, I get the KASAN report instead of the BUG_ON.
> I guess there is a chance my previous change of reloc tree lifespan
> screwed up something, but it's introduced in v5.1...
>
> Thanks,
> Qu
>
--
Cebtenzzre <cebtenzzre@xxxxxxxxx>