On Tue, Jan 21, 2020 at 02:34:52PM -0500, Josef Bacik wrote:
> Thankfully the stars have to align just right to hit this. First you
> have to end up in the fixup worker, which is tricky by itself (my
> reproducer does DIO reads into a MMAP'ed region, so not a common
> operation). Then you have to have less than a page size of free data
> space and 0 unallocated space so you go down the "commit the transaction
> to free up pinned space" path. This was accomplished by a random
> balance that was running on the host. Then you get this deadlock.
>
> I'm still in the process of trying to force the deadlock to happen on
> demand, but I've hit other issues. I can still trigger the fixup worker
> path itself, so this patch has been tested in that regard and the normal
> case is fine.
>
> Fixes: 87826df0ec36 ("btrfs: delalloc for page dirtied out-of-band in fixup worker")
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
> v2->v3:
> - Use a delayed iput in the fixup worker. I *think* we can deadlock if we do
>   the final iput and need to flush space, which may trigger the fixup worker
>   which is busy doing our iput. Err on the side of caution and use a delayed
>   iput.
Sounds serious, so I've turned this into a comment.