On Wed, Dec 18, 2019 at 11:38:18AM -0500, Josef Bacik wrote: > On 12/15/19 12:12 PM, Aditya Pakki wrote: > > alloc_extent_state_atomic() allocates extents via GFP_ATOMIC flag > > and cannot fail. There are multiple invocations of BUG_ON on the > > return value to check for failure. The patch replaces certain > > invocations of BUG_ON by returning the error upstream. > > > > Signed-off-by: Aditya Pakki <pakki001@xxxxxxx> > > I already tried this a few months ago and gave up. There are a few things if > you want to tackle something like this > > 1) use bpf's error injection thing to make sure you handle every path that can > error out. This is the script I wrote to do just that > > https://github.com/josefbacik/debug-scripts/blob/master/error-injection-stress.py > > 2) We actually can't fail here. We would need to go back and make _all_ callers > of lock_extent_bits() handle the allocation error. This is theoretically > possible, but a giant pain in the ass. In general we can make allocations here > and we need to be able to make them. > > 3) We should probably mark this path with __GFP_NOFAIL because again, this is > locking and we need locking to succeed. NOFAIL can introduce loops that could lead to deadlocks, if not used carefully. __set_extent_bit is not just locking, so if one thread wants to set bits, allocate, wait, allocator goes to write some memory eg. set_extent_bit on some range alloc state (NOFAIL) allocator wants to flush dome dirty data ------------------------------> set_extent_bit alloc state (NOFAIL) (wait)
