On Mon, Oct 22, 2018 at 7:07 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote: > > On Mon, Oct 22, 2018 at 10:09:46AM +0100, fdmanana@xxxxxxxxxx wrote: > > From: Filipe Manana <fdmanana@xxxxxxxx> > > > > When we are writing out a free space cache, during the transaction commit > > phase, we can end up in a deadlock which results in a stack trace like the > > following: > > > > schedule+0x28/0x80 > > btrfs_tree_read_lock+0x8e/0x120 [btrfs] > > ? finish_wait+0x80/0x80 > > btrfs_read_lock_root_node+0x2f/0x40 [btrfs] > > btrfs_search_slot+0xf6/0x9f0 [btrfs] > > ? evict_refill_and_join+0xd0/0xd0 [btrfs] > > ? inode_insert5+0x119/0x190 > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_iget+0x113/0x690 [btrfs] > > __lookup_free_space_inode+0xd8/0x150 [btrfs] > > lookup_free_space_inode+0x5b/0xb0 [btrfs] > > load_free_space_cache+0x7c/0x170 [btrfs] > > ? cache_block_group+0x72/0x3b0 [btrfs] > > cache_block_group+0x1b3/0x3b0 [btrfs] > > ? finish_wait+0x80/0x80 > > find_free_extent+0x799/0x1010 [btrfs] > > btrfs_reserve_extent+0x9b/0x180 [btrfs] > > btrfs_alloc_tree_block+0x1b3/0x4f0 [btrfs] > > __btrfs_cow_block+0x11d/0x500 [btrfs] > > btrfs_cow_block+0xdc/0x180 [btrfs] > > btrfs_search_slot+0x3bd/0x9f0 [btrfs] > > btrfs_lookup_inode+0x3a/0xc0 [btrfs] > > ? kmem_cache_alloc+0x166/0x1d0 > > btrfs_update_inode_item+0x46/0x100 [btrfs] > > cache_save_setup+0xe4/0x3a0 [btrfs] > > btrfs_start_dirty_block_groups+0x1be/0x480 [btrfs] > > btrfs_commit_transaction+0xcb/0x8b0 [btrfs] > > > > At cache_save_setup() we need to update the inode item of a block group's > > cache which is located in the tree root (fs_info->tree_root), which means > > that it may result in COWing a leaf from that tree. If that happens we > > need to find a free metadata extent and while looking for one, if we find > > a block group which was not cached yet we attempt to load its cache by > > calling cache_block_group(). However this function will try to load the > > inode of the free space cache, which requires finding the matching inode > > item in the tree root - if that inode item is located in the same leaf as > > the inode item of the space cache we are updating at cache_save_setup(), > > we end up in a deadlock, since we try to obtain a read lock on the same > > extent buffer that we previously write locked. > > > > So fix this by skipping the loading of free space caches of any block > > groups that are not yet cached (rare cases) if we are updating the inode > > of a free space cache. This is a rare case and its downside is failure to > > find a free extent (return -ENOSPC) when all the already cached block > > groups have no free extents. > > > > Actually isn't this a problem for anything that tries to allocate an extent > while in the tree_root? Like we snapshot or make a subvolume or anything? Indeed. Initially I considered making it more generic (like the recent fix for deadlock when cowing from extent/chunk/device tree) but I totally forgot about the other cases like you mentioned. > We > should just disallow if root == tree_root. But even then we only need to do > this if we're using SPACE_CACHE, using the ye-olde caching or the free space > tree are both ok. Let's just limit it to those cases. Thanks, Yep, makes all sense. Thanks! V2 sent out. > > Josef
