Re: Deadlocks due to per-process plugging

Jan Kara <jack@xxxxxxx> writes:

>   Hello,
>   we've recently hit a deadlock in our QA runs which is caused by the
> per-process plugging code. The problem is as follows:
>   process A					process B (kjournald)
>   generic_file_aio_write()
>     blk_start_plug(&plug);
>     ...
>     somewhere in here we allocate memory and
>     direct reclaim submits buffer X for IO
>     ...
>     ext3_write_begin()
>       ext3_journal_start()
>         we need more space in a journal
>         so we want to checkpoint old transactions,
>         we block waiting for kjournald to commit
>         a currently running transaction.
> 						journal_commit_transaction()
> 						  wait for IO on buffer X
> 						  to complete as it is part
> 						  of the current transaction
>   => deadlock since A waits for B and B waits for A to do unplug.
> BTW: I don't think this is really ext3/ext4 specific. I think other
> filesystems can get into problems as well when direct reclaim submits some
> IO and the process subsequently blocks without submitting the IO.

So, I thought schedule would do the flush.  Checking the code:

asmlinkage void __sched schedule(void)
{
        struct task_struct *tsk = current;

        sched_submit_work(tsk);
        __schedule();
}


And sched_submit_work looks like this:

static inline void sched_submit_work(struct task_struct *tsk)
{
        if (!tsk->state || tsk_is_pi_blocked(tsk))
                return;
        /*
         * If we are going to sleep and we have plugged IO queued,
         * make sure to submit it to avoid deadlocks.
         */
        if (blk_needs_flush_plug(tsk))
                blk_schedule_flush_plug(tsk);
}

This eventually ends in a call to blk_run_queue_async(q) after
submitting the I/O from the plug list.  Right?  So is the question
really why doesn't the kblockd workqueue get scheduled?

