On Wed, Jul 19, 2017 at 11:25:51PM -0400, Jeff Mahoney wrote: > If we have a block group that is all of the following: > 1) uncached in memory > 2) is read-only > 3) has a disk cache state that indicates we need to recreate the cache > > AND the file system has enough free space fragmentation such that the > request for an extent of a given size can't be honored; > > AND have a single CPU core; > > AND it's the block group with the highest starting offset such that > there are no opportunities (like reading from disk) for the loop to > yield the CPU; > > We can end up with a lockup. What a nice corner case. Thanks for tracking it down. > The root cause is simple. Once we're in the position that we've read in > all of the other block groups directly and none of those block groups > can honor the request, there are no more opportunities to sleep. We end > up trying to start a caching thread which never gets run if we only have > one core. This *should* present as a hung task waiting on the caching > thread to make some progress, but it doesn't. Instead, it degrades into > a busy loop because of the placement of the read-only check. > > During the first pass through the loop, block_group->cached will be set > to BTRFS_CACHE_STARTED and have_caching_bg will be set. Then we hit the > read-only check and short circuit the loop. We're not yet in > LOOP_CACHING_WAIT, so we skip that loop back before going through the > loop again for other raid groups. > > Then we move to LOOP_CACHING_WAIT state. > > During the this pass through the loop, ->cached will still be > BTRFS_CACHE_STARTED, which means it's not cached, so we'll enter > cache_block_group, do a lot of nothing, and return, and also set > have_caching_bg again. Then we hit the read-only check and short circuit > the loop. The same thing happens as before except now we DO trigger > the LOOP_CACHING_WAIT && have_caching_bg check and loop back up to the > top. We do this forever. > > There are two fixes in this patch since they address the same underlying > bug. > > The first is to add a cond_resched to the end of the loop to ensure > that the caching thread always has an opportunity to run. This will > fix the soft lockup issue, but find_free_extent will still loop doing > nothing until the thread has completed. I've looked if there are similar block group traversal patterns that could use cond_resched but did not find any. Mostly simple operations, nothing complex as the one in find_free_extent. > The second is to move the read-only check to the top of the loop. We're > never going to return an allocation within a read-only block group so > we may as well skip it early. The check for ->cached == BTRFS_CACHE_ERROR > would cause the same problem except that BTRFS_CACHE_ERROR is considered > a "done" state and we won't re-set have_caching_bg again. > > Many thanks to Stephan Kulow <coolo@xxxxxxx> for his excellent help in > the testing process. > > Signed-off-by: Jeff Mahoney <jeffm@xxxxxxxx> Regarding the change itself, Reviewed-by: David Sterba <dsterba@xxxxxxxx> but not much chance I could see all the consequences without the detailed changelog. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html
