On 2019/9/18 上午8:14, Qu Wenruo wrote:
> [...]
>>
>>> My config results something doesn't make sense at all.
>>>
>>>> [49294.092645] Write of size 8 at addr ffff8885b4901440 by task kworker/u24:6/10894
>>>>
>>>> [49294.092657] CPU: 8 PID: 10894 Comm: kworker/u24:6 Tainted: P OE 5.3.0-rc8-rc-kasan #2
>>>> [49294.092661] Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
>>>> [49294.092726] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
>>>> [49294.092730] Call Trace:
>>>> [49294.092743] dump_stack+0x71/0xa0
>>>> [49294.092751] print_address_description+0x67/0x323
>>>> [49294.092817] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
>>>> [49294.092879] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
>>>> [49294.092884] __kasan_report.cold+0x1a/0x3d
>>>> [49294.092945] ? btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
>>>> [49294.092951] kasan_report+0xe/0x12
>>>> [49294.093012] btrfs_init_reloc_root+0x2bf/0x330 [btrfs]
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> We need code contex of this,
>>
>> That would be +0x2cd on my 5.2.11 build, which according to GDB is here:
>>
>> (gdb) list *(btrfs_init_reloc_root+0x2cd)
>> 0x1dcbdd is in btrfs_init_reloc_root (fs/btrfs/relocation.c:1456).
>> 1451 reloc_root = create_reloc_root(trans, root, root->root_key.objectid);
>> 1452 if (clear_rsv)
>> 1453 trans->block_rsv = rsv;
>> 1454
>> 1455 ret = __add_reloc_root(reloc_root);
>> 1456 BUG_ON(ret < 0);
>> 1457 root->reloc_root = reloc_root;
>> 1458 return 0;
>> 1459 }
>> 1460
>>
>> But according to faddr2line, it is at relocation.c:1438.
>>
>> $ ./scripts/faddr2line fs/btrfs/btrfs.ko btrfs_init_reloc_root+0x2cd
>> btrfs_init_reloc_root+0x2cd/0x340:
>> btrfs_init_reloc_root at /path/to/linux/fs/btrfs/relocation.c:1438
>>
>> That is here:
>>
>> (gdb) list relocation.c:1438
>> 1433 int clear_rsv = 0;
>> 1434 int ret;
>> 1435
>> 1436 if (root->reloc_root) {
>> 1437 reloc_root = root->reloc_root;
>> 1438 reloc_root->last_trans = trans->transid;
This should the correct line.
According to your backtrace, the problem seesms to be caused by some
race, and of course it's caused by my reloc tree lifespan commit.
However my test environment is really bad at catching race related problems.
So would you please try this attached patch to see if it solves the
kasan bug?
Thanks,
Qu
> [...]
From 158a6fb4b5e0143f2058dcc6e17fe5e61245ede4 Mon Sep 17 00:00:00 2001
From: Qu Wenruo <wqu@xxxxxxxx>
Date: Wed, 18 Sep 2019 13:00:25 +0800
Subject: [PATCH] btrfs: relocation: Fix KASAN report about use-after-free due
to dead reloc tree cleanup race
[BUG]
One user reported a reproduciable KASAN report about use-after-free:
BTRFS info (device sdi1): balance: start -dvrange=1256811659264..1256811659265
BTRFS info (device sdi1): relocating block group 1256811659264 flags data|raid0
==================================================================
BUG: KASAN: use-after-free in btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
Write of size 8 at addr ffff88856f671710 by task kworker/u24:10/261579
CPU: 2 PID: 261579 Comm: kworker/u24:10 Tainted: P OE 5.2.11-arch1-1-kasan #4
Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./X99 Extreme4, BIOS P3.80 04/06/2018
Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
Call Trace:
dump_stack+0x7b/0xba
print_address_description+0x6c/0x22e
? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
__kasan_report.cold+0x1b/0x3b
? btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
kasan_report+0x12/0x17
__asan_report_store8_noabort+0x17/0x20
btrfs_init_reloc_root+0x2cd/0x340 [btrfs]
record_root_in_trans+0x2a0/0x370 [btrfs]
btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
start_transaction+0x1ab/0xe90 [btrfs]
btrfs_join_transaction+0x1d/0x20 [btrfs]
btrfs_finish_ordered_io+0x7bf/0x18a0 [btrfs]
? lock_repin_lock+0x400/0x400
? __kmem_cache_shutdown.cold+0x140/0x1ad
? btrfs_unlink_subvol+0x9b0/0x9b0 [btrfs]
finish_ordered_fn+0x15/0x20 [btrfs]
normal_work_helper+0x1bd/0xca0 [btrfs]
? process_one_work+0x819/0x1720
? kasan_check_read+0x11/0x20
btrfs_endio_write_helper+0x12/0x20 [btrfs]
process_one_work+0x8c9/0x1720
? pwq_dec_nr_in_flight+0x2f0/0x2f0
? worker_thread+0x1d9/0x1030
worker_thread+0x98/0x1030
kthread+0x2bb/0x3b0
? process_one_work+0x1720/0x1720
? kthread_park+0x120/0x120
ret_from_fork+0x35/0x40
Allocated by task 369692:
__kasan_kmalloc.part.0+0x44/0xc0
__kasan_kmalloc.constprop.0+0xba/0xc0
kasan_kmalloc+0x9/0x10
kmem_cache_alloc_trace+0x138/0x260
btrfs_read_tree_root+0x92/0x360 [btrfs]
btrfs_read_fs_root+0x10/0xb0 [btrfs]
create_reloc_root+0x47d/0xa10 [btrfs]
btrfs_init_reloc_root+0x1e2/0x340 [btrfs]
record_root_in_trans+0x2a0/0x370 [btrfs]
btrfs_record_root_in_trans+0xf4/0x140 [btrfs]
start_transaction+0x1ab/0xe90 [btrfs]
btrfs_start_transaction+0x1e/0x20 [btrfs]
__btrfs_prealloc_file_range+0x1c2/0xa00 [btrfs]
btrfs_prealloc_file_range+0x13/0x20 [btrfs]
prealloc_file_extent_cluster+0x29f/0x570 [btrfs]
relocate_file_extent_cluster+0x193/0xc30 [btrfs]
relocate_data_extent+0x1f8/0x490 [btrfs]
relocate_block_group+0x600/0x1060 [btrfs]
btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
btrfs_relocate_chunk+0x9e/0x180 [btrfs]
btrfs_balance+0x14e4/0x2fc0 [btrfs]
btrfs_ioctl_balance+0x47f/0x640 [btrfs]
btrfs_ioctl+0x119d/0x8380 [btrfs]
do_vfs_ioctl+0x9f5/0x1060
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x73/0xb0
do_syscall_64+0xa5/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Freed by task 369692:
__kasan_slab_free+0x14f/0x210
kasan_slab_free+0xe/0x10
kfree+0xd8/0x270
btrfs_drop_snapshot+0x154c/0x1eb0 [btrfs]
clean_dirty_subvols+0x227/0x340 [btrfs]
relocate_block_group+0x972/0x1060 [btrfs]
btrfs_relocate_block_group+0x3a0/0xa00 [btrfs]
btrfs_relocate_chunk+0x9e/0x180 [btrfs]
btrfs_balance+0x14e4/0x2fc0 [btrfs]
btrfs_ioctl_balance+0x47f/0x640 [btrfs]
btrfs_ioctl+0x119d/0x8380 [btrfs]
do_vfs_ioctl+0x9f5/0x1060
ksys_ioctl+0x67/0x90
__x64_sys_ioctl+0x73/0xb0
do_syscall_64+0xa5/0x370
entry_SYSCALL_64_after_hwframe+0x44/0xa9
The buggy address belongs to the object at ffff88856f671100
which belongs to the cache kmalloc-4k of size 4096
The buggy address is located 1552 bytes inside of
4096-byte region [ffff88856f671100, ffff88856f672100)
The buggy address belongs to the page:
page:ffffea0015bd9c00 refcount:1 mapcount:0 mapping:ffff88864400e600 index:0x0 compound_mapcount: 0
flags: 0x2ffff0000010200(slab|head)
raw: 02ffff0000010200 dead000000000100 dead000000000200 ffff88864400e600
raw: 0000000000000000 0000000000070007 00000001ffffffff 0000000000000000
page dumped because: kasan: bad access detected
Memory state around the buggy address:
ffff88856f671600: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88856f671680: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>ffff88856f671700: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
^
ffff88856f671780: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
ffff88856f671800: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
==================================================================
BTRFS info (device sdi1): 1 enospc errors during balance
BTRFS info (device sdi1): balance: ended with status: -28
[CAUSE]
The problem happens when finish_ordered_io() get called with balance
still running, while the reloc root of that subvolume is already dead.
(tree swap already done, but tree not yet deleted for possible qgroup
usage)
That means root->reloc_root still exists, but that reloc_root can be
under btrfs_drop_snapshot(), thus we shouldn't access it.
The following race could cause the use-after-free problem:
CPU1 | CPU2
--------------------------------------------------------------------------
| relocate_block_group()
| |- unset_reloc_control(rc)
| |- btrfs_commit_transaction()
btrfs_finish_ordered_io() | |- clean_dirty_subvols()
|- btrfs_join_transaction() | |
|- record_root_in_trans() | |
|- btrfs_init_reloc_root() | |
|- if (root->reloc_root) | |
| | |- root->reloc_root = NULL
| | |- btrfs_drop_snapshot(reloc_root);
|- reloc_root->last_trans|
= trans->transid |
^^^^^^^^^^^^^^^^^^^^^^
Use after free
[FIX]
Fix it by the following modifications:
- Test if the root has dead reloc tree before accessing root->reloc_root
If the root has BTRFS_ROOT_DEAD_RELOC_TREE, then we don't need to
create or update root->reloc_tree
- Clear the BTRFS_ROOT_DEAD_RELOC_TREE flag until we have fully dropped
reloc tree
To co-operate with above modification, so as long as
BTRFS_ROOT_DEAD_RELOC_TREE is still set, we won't try to re-create
reloc tree at record_root_in_trans().
Reported-by: Cebtenzzre <cebtenzzre@xxxxxxxxx>
Fixes: d2311e698578 ("btrfs: relocation: Delay reloc tree deletion after merge_reloc_roots")
Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
---
fs/btrfs/relocation.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index 7f219851fa23..d60993a8b2ae 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -1434,16 +1434,19 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
int clear_rsv = 0;
int ret;
- if (root->reloc_root) {
+ if (!test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state) &&
+ root->reloc_root) {
reloc_root = root->reloc_root;
reloc_root->last_trans = trans->transid;
return 0;
}
if (!rc || !rc->create_reloc_tree ||
- root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+ root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID ||
+ test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
return 0;
+
if (!trans->reloc_reserved) {
rsv = trans->block_rsv;
trans->block_rsv = rc->block_rsv;
@@ -2186,7 +2189,6 @@ static int clean_dirty_subvols(struct reloc_control *rc)
/* Merged subvolume, cleanup its reloc root */
struct btrfs_root *reloc_root = root->reloc_root;
- clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
list_del_init(&root->reloc_dirty_list);
root->reloc_root = NULL;
if (reloc_root) {
@@ -2195,6 +2197,7 @@ static int clean_dirty_subvols(struct reloc_control *rc)
if (ret2 < 0 && !ret)
ret = ret2;
}
+ clear_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state);
btrfs_put_fs_root(root);
} else {
/* Orphan reloc tree, just clean it up */
--
2.23.0
Attachment:
signature.asc
Description: OpenPGP digital signature
