On Thu, May 24, 2012 at 01:34:02PM +1000, NeilBrown wrote:
> On Wed, 23 May 2012 15:26:21 +0800 Shaohua Li <shli@xxxxxxxxxx> wrote:
>
> > In raid1, all write requests are dispatched in raid1d thread. In fast storage,
> > the raid1d thread is a bottleneck, because it dispatches request too slow. Also
> > raid1d thread migrates freely, which makes request completion cpu not match
> > with submission cpu even driver/block layer has such capability. This will
> > cause bad cache issue.
> >
> > If bitmap support is enabled, write requests can only be dispatched after dirty
> > bitmap is flushed out. After bitmap is flushed, how write requests are
> > dispatched doesn't impact correctness. A natural idea is to distribute request
> > dispatch to several threads. With this patch, requests are added to a percpu
> > list first. After bitmap is flushed, then the percpu list requests will
> > dispatched in a workqueue. In this way, above bottleneck is removed.
> >
> > In a 4k randwrite test with a 2 disks setup, below patch can provide 10% ~ 50%
> > performance improvements depending on numa binding.
>
> Those numbers are quite impressive so there is certainly room for improvement
> here. I'm not sure that I'm entirely comfortable with the approach though.
>
> Passing the request to a per-cpu thread does make sense, but a generic
> per-cpu thread feels dangerous as we don't know what else might be queued to
> that thread and because of the potential for deadlocks between memory
> allocation and generic_make_request (as mentioned in previous email) I find
> it hard to convince myself that this approach is entirely safe.
No problem, we can use a separate workqueue.
> I wonder if we might take a very different approach and try to do everything
> in the one process. i.e. don't hand tasks off to other threads at all - at
> least in the common case.
> So we could change plugger_unplug (in md.c) so that:
>
> - if current->bio_list is NULL, (meaning all requests have been submitted
> and there is no risk of deadlock) we call bitmap_unplug and then submit
> all the queued writes.
> - if current->bio_list is not NULL, then we just wakeup the md thread to
> do the work.
The current->bio_list check does make sense. I'm going to do the check for the
1 and 3 patches.
But I believe we can't call generic_make_request in unplug. For example,
schedule()->unplug->generic_make_request->get_request_wait()->schedule(). At
least this will cause some request not dispatch. And last time I did similar
experiment for raid0 (request is added to a per-disk plug_cb list, not directly
dispatch) to reduce lock contention, I found nasty oops.
Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[ATA RAID]
[Linux SCSI Target Infrastructure]
[Managing RAID on Linux]
[Linux IDE]
[Linux SCSI]
[Linux Hams]
[Device-Mapper]
[Kernel]
[Linux Books]
[Linux Admin]
[Linux Net]
[GFS]
[RPM]
[git]
[Photos]
[Yosemite Photos]
[Yosemite News]
[AMD 64]
[Linux Networking]