On Wed, Sep 19, 2018 at 02:59:58PM +0800, Qu Wenruo wrote: > In the following case, we could trigger a use-after-free bug: > > CPU0 | CPU1 > ------------------------------------------------------------------------- > btrfs_remove_delayed_node | btrfs_get_delayed_node > |- delayed_node = | |- node = btrfs_inode->delayed_node; > | btrfs_inode->delayed_node | | > |- btrfs_release_delaedy_node() | | > |- ref_count_dev_and_test() | | > |- kmem_cache_free() | | > Now delayed node is freed | | > | |- refcount_inc(&node->refs) > > In that case sine delayed_node is using kmem cache, such use-after-free > bug won't directly cause problem, but could leads to corrupted data > structure of other kmem cache user. > > Fix it by adding btrfs_inode::delayed_node_lock to protect such > operation. > > Reported-by: sunny.s.zhang <sunny.s.zhang@xxxxxxxxxx> > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > --- > Please don't merge this patch yet. > > The patch caused random slow down for a lot of quick test cases. > Old tests can be executed in 1s or so now randomly needs near 20s. > > It looks like the spin_lock() with root->inode_lock hold is causing the > problem but I can't see what's going wrong. > As the operation done with @delayed_node_lock hold is literatly tiny. > > Any comment on this is welcomed. I found the original report and discussion, so the resume is that it's possibly caused by the refcount_t rework and the missing fix ec35e48b2869599 . As in time of evict there can be no active operation running and the crash happened inside atime update. I take the bug in refcounting as a plausible explanation so your patch does not seem to be necessary, unless there's a reproducer on a recent kernel.
