On 04/24/2012 01:33 AM, Josef Bacik wrote:
> We can deadlock waiting for pages to end writeback because we are doing an
> allocation while hold a tree lock since the ordered extent stuff will
> require tree locks. A quick easy way to fix this is to end page writeback
> before we do our ordered io stuff, which works fine since we don't really
> need the page for this to work. Eventually we want to make this work happen
> as soon as the io is completed and then push the ordered extent stuff off to
> a worker thread, but at this stage we need this deadlock fixed with changing
> as little as possible. Thanks,
>
Hi Josef,
I'm ok with the patch, but could you show us more details about the deadlock between allocation and endio?
thanks,
liubo
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxx>
> ---
> fs/btrfs/compression.c | 3 +--
> fs/btrfs/extent_io.c | 27 +++++++++++----------------
> fs/btrfs/ordered-data.c | 4 +++-
> 3 files changed, 15 insertions(+), 19 deletions(-)
>
> diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> index d11afa67..6a456d4 100644
> --- a/fs/btrfs/compression.c
> +++ b/fs/btrfs/compression.c
> @@ -286,13 +286,12 @@ static void end_compressed_bio_write(struct bio *bio, int err)
> inode = cb->inode;
> tree = &BTRFS_I(inode)->io_tree;
> cb->compressed_pages[0]->mapping = cb->inode->i_mapping;
> + end_compressed_writeback(inode, cb->start, cb->len);
> tree->ops->writepage_end_io_hook(cb->compressed_pages[0],
> cb->start,
> cb->start + cb->len - 1,
> NULL, 1);
> cb->compressed_pages[0]->mapping = NULL;
> -
> - end_compressed_writeback(inode, cb->start, cb->len);
> /* note, our inode could be gone now */
>
> /*
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 7c501d3..308c1c2 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -1806,16 +1806,6 @@ static void check_page_locked(struct extent_io_tree *tree, struct page *page)
> }
>
> /*
> - * helper function to end page writeback if all the extents
> - * in the tree for that page are done with writeback
> - */
> -static void check_page_writeback(struct extent_io_tree *tree,
> - struct page *page)
> -{
> - end_page_writeback(page);
> -}
> -
> -/*
> * When IO fails, either with EIO or csum verification fails, we
> * try other mirrors that might have a good copy of the data. This
> * io_failure_record is used to record state as we go through all the
> @@ -2206,6 +2196,13 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
> struct extent_io_tree *tree;
> int ret;
>
> + /*
> + * Don't want the page disappearing out from under us after we clear
> + * writeback.
> + */
> + page_cache_get(page);
> + end_page_writeback(page);
> +
> tree = &BTRFS_I(page->mapping->host)->io_tree;
>
> if (tree->ops && tree->ops->writepage_end_io_hook) {
> @@ -2220,8 +2217,10 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
> ret = tree->ops->writepage_io_failed_hook(NULL, page,
> start, end, NULL);
> /* Writeback already completed */
> - if (ret == 0)
> + if (ret == 0) {
> + page_cache_release(page);
> return 1;
> + }
> }
>
> if (!uptodate) {
> @@ -2229,6 +2228,7 @@ int end_extent_writepage(struct page *page, int err, u64 start, u64 end)
> ClearPageUptodate(page);
> SetPageError(page);
> }
> + page_cache_release(page);
> return 0;
> }
>
> @@ -2267,11 +2267,6 @@ static void end_bio_extent_writepage(struct bio *bio, int err)
>
> if (end_extent_writepage(page, err, start, end))
> continue;
> -
> - if (whole_page)
> - end_page_writeback(page);
> - else
> - check_page_writeback(tree, page);
> } while (bvec >= bio->bi_io_vec);
>
> bio_put(bio);
> diff --git a/fs/btrfs/ordered-data.c b/fs/btrfs/ordered-data.c
> index bbf6d0d..acfb360 100644
> --- a/fs/btrfs/ordered-data.c
> +++ b/fs/btrfs/ordered-data.c
> @@ -196,7 +196,7 @@ static int __btrfs_add_ordered_extent(struct inode *inode, u64 file_offset,
> entry->len = len;
> entry->disk_len = disk_len;
> entry->bytes_left = len;
> - entry->inode = inode;
> + entry->inode = igrab(inode);
> entry->compress_type = compress_type;
> if (type != BTRFS_ORDERED_IO_DONE && type != BTRFS_ORDERED_COMPLETE)
> set_bit(type, &entry->flags);
> @@ -399,6 +399,8 @@ void btrfs_put_ordered_extent(struct btrfs_ordered_extent *entry)
> trace_btrfs_ordered_extent_put(entry->inode, entry);
>
> if (atomic_dec_and_test(&entry->refs)) {
> + if (entry->inode)
> + btrfs_add_delayed_iput(entry->inode);
> while (!list_empty(&entry->list)) {
> cur = entry->list.next;
> sum = list_entry(cur, struct btrfs_ordered_sum, list);
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html