On Tue, Jul 10, 2012 at 05:27:59AM -0600, Liu Bo wrote:
> While testing with my buffer read fio jobs[1], I find that btrfs does not
> perform well enough.
>
> Here is a scenario in fio jobs:
>
> We have 4 threads, "t1 t2 t3 t4", starting to buffer read a same file,
> and all of them will race on add_to_page_cache_lru(), and if one thread
> successfully puts its page into the page cache, it takes the responsibility
> to read the page's data.
>
> And what's more, reading a page needs a period of time to finish, in which
> other threads can slide in and process rest pages:
>
> t1 t2 t3 t4
> add Page1
> read Page1 add Page2
> | read Page2 add Page3
> | | read Page3 add Page4
> | | | read Page4
> -----|------------|-----------|-----------|--------
> v v v v
> bio bio bio bio
>
> Now we have four bios, each of which holds only one page since we need to
> maintain consecutive pages in bio. Thus, we can end up with far more bios
> than we need.
>
> Here we're going to
> a) delay the real read-page section and
> b) try to put more pages into page cache.
>
> With that said, we can make each bio hold more pages and reduce the number
> of bios we need.
>
> Here is some numbers taken from fio results:
> w/o patch w patch
> ------------- -------- ---------------
> READ: 745MB/s +32% 987MB/s
>
> [1]:
> [global]
> group_reporting
> thread
> numjobs=4
> bs=32k
> rw=read
> ioengine=sync
> directory=/mnt/btrfs/
>
> [READ]
> filename=foobar
> size=2000M
> invalidate=1
>
> Signed-off-by: Liu Bo <liubo2009@xxxxxxxxxxxxxx>
> ---
> fs/btrfs/extent_io.c | 37 +++++++++++++++++++++++++++++++++++--
> 1 files changed, 35 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 01c21b6..8f9c18d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -3549,6 +3549,11 @@ int extent_writepages(struct extent_io_tree *tree,
> return ret;
> }
>
> +struct pagelst {
> + struct page *page;
> + struct list_head lst;
> +};
> +
> int extent_readpages(struct extent_io_tree *tree,
> struct address_space *mapping,
> struct list_head *pages, unsigned nr_pages,
> @@ -3557,19 +3562,47 @@ int extent_readpages(struct extent_io_tree *tree,
> struct bio *bio = NULL;
> unsigned page_idx;
> unsigned long bio_flags = 0;
> + LIST_HEAD(page_pool);
> + struct pagelst *pagelst = NULL;
>
> for (page_idx = 0; page_idx < nr_pages; page_idx++) {
> struct page *page = list_entry(pages->prev, struct page, lru);
>
> prefetchw(&page->flags);
> list_del(&page->lru);
> +
> + if (!pagelst)
> + pagelst = kmalloc(sizeof(*pagelst), GFP_NOFS);
> +
> + if (!pagelst) {
> + page_cache_release(page);
> + continue;
> + }
I'd rather not fail here if we can't make an allocation, since it's just a
optimization, just continue on like we normally would. Thanks,
Josef
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html