Re: [PATCH] btrfs: flush write bio if we loop in extent_write_cache_pages

On Thu, Jan 23, 2020 at 03:33:02PM -0500, Josef Bacik wrote:
> There is a deadlock with range_cyclic that has existed forever.  If
> we loop around with a bio already built, we could deadlock with a writer
> who holds the lock on a page we're attempting to write but is itself
> waiting on a page in our bio to be written out.  The task traces are as follows:
> 
> PID: 1329874  TASK: ffff889ebcdf3800  CPU: 33  COMMAND: "kworker/u113:5"
>  #0 [ffffc900297bb658] __schedule at ffffffff81a4c33f
>  #1 [ffffc900297bb6e0] schedule at ffffffff81a4c6e3
>  #2 [ffffc900297bb6f8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc900297bb708] __lock_page at ffffffff811f145b
>  #4 [ffffc900297bb798] __process_pages_contig at ffffffff814bc502
>  #5 [ffffc900297bb8c8] lock_delalloc_pages at ffffffff814bc684
>  #6 [ffffc900297bb900] find_lock_delalloc_range at ffffffff814be9ff
>  #7 [ffffc900297bb9a0] writepage_delalloc at ffffffff814bebd0
>  #8 [ffffc900297bba18] __extent_writepage at ffffffff814bfbf2
>  #9 [ffffc900297bba98] extent_write_cache_pages at ffffffff814bffbd
> 
> PID: 2167901  TASK: ffff889dc6a59c00  CPU: 14  COMMAND: "aio-dio-invalid"
>  #0 [ffffc9003b50bb18] __schedule at ffffffff81a4c33f
>  #1 [ffffc9003b50bba0] schedule at ffffffff81a4c6e3
>  #2 [ffffc9003b50bbb8] io_schedule at ffffffff81a4ca42
>  #3 [ffffc9003b50bbc8] wait_on_page_bit at ffffffff811f24d6
>  #4 [ffffc9003b50bc60] prepare_pages at ffffffff814b05a7
>  #5 [ffffc9003b50bcd8] btrfs_buffered_write at ffffffff814b1359
>  #6 [ffffc9003b50bdb0] btrfs_file_write_iter at ffffffff814b5933
>  #7 [ffffc9003b50be38] new_sync_write at ffffffff8128f6a8
>  #8 [ffffc9003b50bec8] vfs_write at ffffffff81292b9d
>  #9 [ffffc9003b50bf00] ksys_pwrite64 at ffffffff81293032
> 
> I used drgn to find the respective pages we were stuck on:
> 
> page_entry.page 0xffffea00fbfc7500 index 8148 bit 15 pid 2167901
> page_entry.page 0xffffea00f9bb7400 index 7680 bit 0 pid 1329874
> 
> As you can see, the kworker is waiting for bit 0 (PG_locked) on index
> 7680, and aio-dio-invalid is waiting for bit 15 (PG_writeback) on index
> 8148.  aio-dio-invalid holds the lock on index 7680, and the kworker's epd
> looks like the following:
> 
> crash> struct extent_page_data ffffc900297bbbb0
> struct extent_page_data {
>   bio = 0xffff889f747ed830,
>   tree = 0xffff889eed6ba448,
>   extent_locked = 0,
>   sync_io = 0
> }
> 
> and, using drgn, I walked the bio pages looking for page
> 0xffffea00fbfc7500, which is the page whose writeback aio-dio-invalid is waiting on:
> 
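> from drgn import Object
> # `prog` is the Program global provided by the drgn CLI.
> bio = Object(prog, 'struct bio', address=0xffff889f747ed830)
> # Compare each bio_vec's page against the page aio-dio-invalid waits on.
> for i in range(bio.bi_vcnt.value_()):
>     bv = bio.bi_io_vec[i]
>     if bv.bv_page.value_() == 0xffffea00fbfc7500:
>         print("FOUND IT")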
> 
> which validated what I suspected.
> 
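> The fix for this is simple: flush the epd before we loop back around to
> the beginning of the file during writeout, so the bio we have built up is
> submitted before we try to lock any pages at the start of the file.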
> 
> Fixes: b293f02e1423 ("Btrfs: Add writepages support")
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Added to misc-next, thanks.
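For readers following the analysis, here is a rough Python sketch (matching the language of the drgn snippet) of the two ideas in the report. It is a toy model under stated assumptions, not btrfs code: the helpers `find_cycle`, `cyclic_writeout` and `wrapped_lock_with_unsent_io` are hypothetical, and `START = 8000` is an invented writeback starting index chosen to fall between the two pages from the report. The first part expresses the stuck tasks as a wait-for cycle; the second models a range_cyclic pass in which pages queued in a not-yet-submitted bio stay under writeback, and shows that flushing before wrapping to index 0 removes the dangerous ordering.

```python
# Hypothetical toy model of the deadlock and the fix; it only tracks the
# ordering of page locks and bio submissions, not real kernel behaviour.

def find_cycle(edges, start):
    """Follow each node's single outgoing wait edge; return the cycle, or None."""
    path, seen, node = [], {}, start
    while node in edges:
        if node in seen:
            return path[seen[node]:]
        seen[node] = len(path)
        path.append(node)
        node = edges[node]
    return None


def cyclic_writeout(dirty, start, flush_before_wrap):
    """Log lock/submit events for a range_cyclic pass beginning at `start`."""
    events, pending_bio = [], []

    def flush():
        if pending_bio:
            events.append(("submit", tuple(pending_bio)))
            pending_bio.clear()

    ordered = sorted(dirty)
    first_pass = [i for i in ordered if i >= start]
    wrapped = [i for i in ordered if i < start]

    for idx in first_pass:
        events.append(("lock", idx))
        pending_bio.append(idx)  # under writeback, but the bio is not sent yet
    if flush_before_wrap:
        flush()  # the fix: submit queued pages before looping back to index 0
    for idx in wrapped:
        events.append(("lock", idx))
        pending_bio.append(idx)
    flush()
    return events


def wrapped_lock_with_unsent_io(events, start):
    """True if a wrapped-around page is locked while earlier pages sit in an unsent bio."""
    unsent = set()
    for kind, arg in events:
        if kind == "submit":
            unsent.difference_update(arg)
            continue
        if arg < start and unsent:
            return True
        unsent.add(arg)
    return False


# Wait-for graph built from the drgn findings above.
wait_for = {
    "kworker/u113:5": "page 7680 (PG_locked)",
    "page 7680 (PG_locked)": "aio-dio-invalid",      # aio-dio-invalid holds the lock
    "aio-dio-invalid": "page 8148 (PG_writeback)",
    "page 8148 (PG_writeback)": "kworker/u113:5",    # cannot clear until the kworker submits its bio
}
print(" -> ".join(find_cycle(wait_for, "kworker/u113:5")))

START = 8000  # hypothetical writeback starting index
print(wrapped_lock_with_unsent_io(cyclic_writeout({7680, 8148}, START, False), START))  # True
print(wrapped_lock_with_unsent_io(cyclic_writeout({7680, 8148}, START, True), START))   # False
```

Without the flush, the kworker goes to lock page 7680 while page 8148 is still sitting in its unsubmitted bio, which is exactly the cycle printed first. With the flush, page 8148's bio is submitted before the wrap, so its writeback can complete regardless of who holds page 7680.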


