Re: [PATCH 12/13] Make generic_make_request handle arbitrarily large bios

On Fri, May 18, 2012 at 06:05:50PM +1000, NeilBrown wrote:
> 
> Hi Kent,
>  there is lots of good stuff in this series and I certainly hope a lot of it
>  can eventually go upstream.
> 
>  However there is a part of this patch that I don't think is safe:
> 
> 
> > +static void __bio_submit_split(struct closure *cl)
> > +{
> > +	struct bio_split_hook *s = container_of(cl, struct bio_split_hook, cl);
> > +	struct bio *bio = s->bio, *n;
> > +
> > +	do {
> > +		n = bio_split(bio, bio_max_sectors(bio),
> > +			      GFP_NOIO, s->q->bio_split);
> > +		if (!n)
> > +			continue_at(cl, __bio_submit_split, bio_split_wq);
> > +
> > +		closure_get(cl);
> > +		generic_make_request(n);
> > +	} while (n != bio);
> > +
> > +	continue_at(cl, bio_submit_split_done, NULL);
> > +}
> 
> Firstly a small point:  Can bio_split ever return NULL here?  I don't
> think it can, so there is no need to test.
> But if it can, then calling generic_make_request(NULL) doesn't seem like a
> good idea.
> 
> More significantly, though:
>   This is called from generic_make_request which can be called recursively and
> enforces a tail-recursion semantic.
> If that generic_make_request was a recursive call, then the
> generic_make_request in __bio_submit_split will not start the request, but
> will queue the bio for later handling.  If we then call bio_split again, we
> could try to allocate from a mempool while we are holding one entry
> allocated from that pool captive.  This can deadlock.
> 
> i.e. if the original bio is so large that it needs to be split into 3 pieces,
> then we will try to allocate the second piece before the first piece has a
> chance to be released.  If this happens in enough threads to exhaust the pool
> (4 I think), it will deadlock.
> 
> I realise this sounds like a very unlikely case, but of course such cases do happen.
> 
> One possible approach might be to count how many splits will be required,
> then have an interface to mempools so you can allocate them all at once,
> or block and wait.

Yeah, I actually thought of that (I probably should've documented it,
though).
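
For reference, the reason blocking there is unsafe: when
generic_make_request() is called recursively it doesn't process the bio at
all, it just queues it on current->bio_list and returns. The upstream logic
is roughly this (a simplified sketch from memory, not verbatim):

	void generic_make_request(struct bio *bio)
	{
		struct bio_list bio_list_on_stack;

		if (current->bio_list) {
			/* make_request is active: queue the bio for later */
			bio_list_add(current->bio_list, bio);
			return;
		}

		/* outermost call: process bios until the list drains */
		bio_list_init(&bio_list_on_stack);
		current->bio_list = &bio_list_on_stack;
		do {
			struct request_queue *q = bdev_get_queue(bio->bi_bdev);

			q->make_request_fn(q, bio);
			bio = bio_list_pop(current->bio_list);
		} while (bio);
		current->bio_list = NULL;
	}

So nothing submitted from inside a make_request_fn makes progress until we
unwind back to the outermost call - which is exactly why blocking on the
mempool there can deadlock.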

That's why the code is checking whether bio_split() returned NULL: my version
of bio_split() checks whether current->bio_list is non-NULL, and if it is, it
doesn't pass __GFP_WAIT to bio_alloc_bioset().
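
i.e. something like this inside bio_split() - a sketch of the idea, not the
exact patch:

	gfp_t gfp = gfp_mask;

	/*
	 * If we're inside generic_make_request(), bios we've already
	 * submitted are only queued on current->bio_list - they can't
	 * complete and release their mempool entries until we return.
	 * So don't block on the mempool; fail the allocation instead.
	 */
	if (current->bio_list)
		gfp &= ~__GFP_WAIT;

	split = bio_alloc_bioset(gfp, bio_segments(bio), bs);
	if (!split)
		return NULL;	/* caller punts to a workqueue and retries */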

(I was debating whether bio_split() should do that itself or leave it up
to the caller. I tend to think it's too easy to accidentally screw up -
and not notice it - so it should be enforced by generic code. Possibly
the check should be moved to bio_alloc_bioset(), though.)

If we do get a memory allocation failure, then we just punt to a workqueue -
continue_at() runs __bio_submit_split() from the bio_split workqueue - where
we can safely use the mempool.
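
So the whole path ends up looking roughly like this (a sketch of the flow,
not literal code):

	__bio_submit_split()	/* running under generic_make_request() */
	  bio_split()		/* sees current->bio_list, masks __GFP_WAIT */
	    -> NULL		/* mempool empty, can't block here */
	  continue_at(cl, __bio_submit_split, bio_split_wq)
	__bio_submit_split()	/* re-run from the workqueue */
	  bio_split()		/* current->bio_list is NULL, so __GFP_WAIT
				 * is allowed and we can wait on the mempool */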

