On 19 Dec 2018, at 5:33, ethanlien wrote:

> Martin Raiber wrote on 2018-12-17 22:00:
>
>>>> I had lockups with this patch as well. If you put e.g. a loop
>>>> device on top of a btrfs file, loop sets PF_LESS_THROTTLE to
>>>> avoid a feedback loop causing delays. The task balancing dirty
>>>> pages in btrfs_finish_ordered_io doesn't have the flag and
>>>> causes slow-downs. In my case it managed to cause a feedback
>>>> loop where it queues other btrfs_finish_ordered_io and gets
>>>> stuck completely.
>>>
>>> The data writepage endio will queue a work for
>>> btrfs_finish_ordered_io() in a separate workqueue and clear the
>>> page's writeback, so throttling in btrfs_finish_ordered_io()
>>> should not slow down the flusher thread. One suspicious point:
>>> while a caller is waiting for a range of ordered extents to
>>> complete, it will be blocked until
>>> balance_dirty_pages_ratelimited() makes some progress, since we
>>> finish ordered extents in btrfs_finish_ordered_io().
>>> Do you have call stack information for the stuck processes, or
>>> are you using fsync/sync frequently? If that is the case, maybe
>>> we should pull this out and balance dirty metadata pages
>>> somewhere else.
>>
>> Yeah, like:
>>
>> [875317.071433] Call Trace:
>> [875317.071438]  ? __schedule+0x306/0x7f0
>> [875317.071442]  schedule+0x32/0x80
>> [875317.071447]  btrfs_start_ordered_extent+0xed/0x120
>> [875317.071450]  ? remove_wait_queue+0x60/0x60
>> [875317.071454]  btrfs_wait_ordered_range+0xa0/0x100
>> [875317.071457]  btrfs_sync_file+0x1d6/0x400
>> [875317.071461]  ? do_fsync+0x38/0x60
>> [875317.071463]  ? btrfs_fdatawrite_range+0x50/0x50
>> [875317.071465]  do_fsync+0x38/0x60
>> [875317.071468]  __x64_sys_fsync+0x10/0x20
>> [875317.071470]  do_syscall_64+0x55/0x100
>> [875317.071473]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>
>> so I guess the problem is that calling balance_dirty_pages causes
>> fsyncs to the same btrfs (via my unusual setup of loop+fuse)? Those
>> fsyncs are deadlocked because they are called indirectly from
>> btrfs_finish_ordered_io... It is an unusual setup, which is why I
>> did not post it to the mailing list initially.
>
> To me this does not look like a real deadlock. The fsync call
> involves two steps: (1) flush the dirty data pages, and (2) update
> the corresponding metadata to point to the flushed data. Since
> step 1 consumes dirty pages and step 2 produces more dirty pages,
> this patch leaves step 1 unchanged and blocks step 2 in
> btrfs_finish_ordered_io(), which seems reasonable for an OOM fix.
> The problem is that if other processes keep writing new data, the
> fsync call may have to wait a long time for its metadata update,
> even though its own dirty data was flushed long ago.
>
> Back to the deadlock problem: what Chris found is really a
> deadlock, and it can be fixed by adding a check for the free space
> inode.

I think we should have a better understanding of your original OOM
problem before we keep the balance_dirty_pages(). This isn't a great
place to throttle, and while it's also not a great place to make a
huge burst of dirty pages, I'd like to make sure we're really fixing
the right problem against today's kernel.

-chris
