Hi David,
On Wed, Aug 03, 2016 at 12:33:01PM -0700, Liu Bo wrote:
> So we can read a btree block via readahead or intentional read,
> and we can end up with a memory leak when something happens as
> follows,
> 1) readahead starts to read block A but does not wait for read
> completion,
> 2) btree_readpage_end_io_hook finds that block A is corrupted,
> and it needs to clear all block A's pages' uptodate bit.
> 3) meanwhile an intentional read kicks in and checks block A's
> pages' uptodate to decide which page needs to be read.
> 4) some pages have the uptodate bit set during 3)'s check, so
> 3) doesn't count them for eb->io_pages, but they are later
> cleared by 2) so we have to readpage on those pages; we get
> the wrong eb->io_pages, which results in a memory leak of
> this block.
>
> This fixes the problem by first locking all pages and
> then checking the pages' uptodate bit.

t1(readahead)                       t2(readahead endio)                           t3(the following read)
read_extent_buffer_pages            end_bio_extent_readpage
  for pg in eb:                       for page 0,1,2 in eb:
    if pg is NOT uptodate:              btree_readpage_end_io_hook(pg)
      num_reads++                       if uptodate:
  eb->io_pages = num_reads                SetPageUptodate(pg)                     _______________
  for pg in eb:                       for page 3 in eb:                           read_extent_buffer_pages
    if pg is NOT uptodate:              btree_readpage_end_io_hook(pg)              for pg in eb:
      __extent_read_full_page(pg)         sanity check reports something wrong        if pg is NOT uptodate:
                                          clear_extent_buffer_uptodate(eb)              num_reads++
                                            for pg in eb:                           eb->io_pages = num_reads
                                              ClearPageUptodate(page)             _______________
                                                                                    for pg in eb:
                                                                                      if pg is NOT uptodate:
                                                                                        __extent_read_full_page(pg)

So t3's eb->io_pages is not consistent with the number of pages it's actually reading (it is smaller), and during endio, atomic_dec_and_test(&eb->io_pages) ends up with a negative number, so we're not able to free the eb.

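To make the accounting concrete, here is a minimal userspace sketch that replays t3 single-threaded. It is not kernel code: struct fake_eb, fake_clear_uptodate(), pages_needing_read() and simulate_t3() are hypothetical names made up for illustration, and the "patched" case simply assumes that holding all page locks first orders t2's clear before t3's counting pass.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_PAGES 4

/* Hypothetical stand-in for an extent buffer: only the fields the race touches. */
struct fake_eb {
	bool uptodate[NUM_PAGES];	/* models each page's uptodate bit */
	int io_pages;			/* models eb->io_pages */
};

/* Models clear_extent_buffer_uptodate(): endio clears every page's bit. */
static void fake_clear_uptodate(struct fake_eb *eb)
{
	for (int i = 0; i < NUM_PAGES; i++)
		eb->uptodate[i] = false;
}

/* Number of pages that currently need a read (not uptodate). */
static int pages_needing_read(const struct fake_eb *eb)
{
	int n = 0;

	for (int i = 0; i < NUM_PAGES; i++)
		if (!eb->uptodate[i])
			n++;
	return n;
}

/*
 * Replays t3 and returns io_pages after every submitted read completes.
 * clear_in_window == true:  t2's clear lands between t3's two passes (unpatched).
 * clear_in_window == false: t2's clear is ordered before t3's counting pass,
 * which is what locking all pages first is assumed to guarantee.
 */
static int simulate_t3(bool clear_in_window)
{
	/* State when t3 starts: pages 0-2 were already marked uptodate by t2. */
	struct fake_eb eb = { .uptodate = { true, true, true, false }, .io_pages = 0 };
	int submitted;

	if (!clear_in_window)
		fake_clear_uptodate(&eb);

	eb.io_pages = pages_needing_read(&eb);	/* counting pass */

	if (clear_in_window)
		fake_clear_uptodate(&eb);

	submitted = pages_needing_read(&eb);	/* submission pass */
	assert(submitted <= NUM_PAGES);

	for (int i = 0; i < submitted; i++)
		eb.io_pages--;			/* one decrement per read completion */

	return eb.io_pages;
}
```

In this model simulate_t3(true) returns -3 (one page counted, four reads completed), while simulate_t3(false) returns 0, i.e. the counter reaches zero exactly at the last completion.
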
Thanks,
-liubo
>
> Signed-off-by: Liu Bo <bo.li.liu@xxxxxxxxxx>
> ---
> fs/btrfs/extent_io.c | 9 +++++++++
> 1 file changed, 9 insertions(+)
>
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index bd29b9b..a77050e 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -5215,11 +5215,20 @@ int read_extent_buffer_pages(struct extent_io_tree *tree,
>  			lock_page(page);
>  		}
>  		locked_pages++;
> +	}
> +	/*
> +	 * We need to firstly lock all pages to make sure that
> +	 * the uptodate bit of our pages won't be affected by
> +	 * clear_extent_buffer_uptodate().
> +	 */
> +	for (i = start_i; i < num_pages; i++) {
> +		page = eb->pages[i];
>  		if (!PageUptodate(page)) {
>  			num_reads++;
>  			all_uptodate = 0;
>  		}
>  	}
> +
>  	if (all_uptodate) {
>  		if (start_i == 0)
>  			set_bit(EXTENT_BUFFER_UPTODATE, &eb->bflags);
> --
> 2.5.5
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html