On Wed, Feb 05, 2020 at 11:38:27PM +0900, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
>
> Directly use the page cache for reading the super-blocks from disk or
> invalidating an on-disk super-block. We have to use the page cache to
> avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
> superblock via pagecache in scan_one_device").
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>
> ---
> Changes to v3:
> - Use read_cache_pages() and write_one_page() for IO (hch)
> - Changed subject (David)
> - Dropped Josef's R-b due to change
>
> Changes to v2:
> - open-code kunmap() + put_page() (David)
> - fix double kunmap() (David)
> - don't use bi_set_op_attrs() (David)
>
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecache instead of alloc_pages() (Nikolay, David)
> ---
> fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
> fs/btrfs/disk-io.h | 4 +--
> fs/btrfs/volumes.c | 57 +++++++++++++++++----------------
> fs/btrfs/volumes.h | 2 --
> 4 files changed, 76 insertions(+), 65 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 28622de9e642..bc14ef1aadda 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -2617,11 +2617,12 @@ int __cold open_ctree(struct super_block *sb,
> u64 features;
> u16 csum_type;
> struct btrfs_key location;
> - struct buffer_head *bh;
> struct btrfs_super_block *disk_super;
> struct btrfs_fs_info *fs_info = btrfs_sb(sb);
> struct btrfs_root *tree_root;
> struct btrfs_root *chunk_root;
> + struct page *super_page;
> + u8 *superblock;
I thought you agreed to turn this into a struct btrfs_super_block
pointer?
> bytenr = btrfs_sb_offset(copy_num);
> if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
> return -EINVAL;
>
> - bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> - /*
> - * If we fail to read from the underlying devices, as of now
> - * the best option we have is to mark it EIO.
> - */
> - if (!bh)
> - return -EIO;
> + gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
> + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
> + if (IS_ERR_OR_NULL(page))
> + return -ENOMEM;
Why do you need __GFP_NOFAIL given that failures are handled properly
here? Also, I think you can use GFP_NOFS directly here instead of
mapping_gfp_constraint().
>
> - super = (struct btrfs_super_block *)bh->b_data;
> + super = kmap(page);
> if (btrfs_super_bytenr(super) != bytenr ||
> btrfs_super_magic(super) != BTRFS_MAGIC) {
> - brelse(bh);
> + kunmap(page);
> + put_page(page);
> return -EINVAL;
> }
> + kunmap(page);
Also, last time I wondered why we can't leave the page mapped for the
caller and return the virtual address instead. That would keep the
callers a little cleaner. Note that you don't need to pass the
struct page in that case, as the unmap helper can use kmap_to_page()
(and I think a helper for the unmap and put would be really nice anyway).
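
To illustrate the shape I have in mind, here is a rough, untested
userspace sketch. Everything in it is a stand-in: struct page, kmap(),
kunmap(), put_page() and kmap_to_page() are toy versions of the kernel
interfaces, and toy_super, toy_read_super() and toy_release_super() are
hypothetical names, not something from the patch. It only shows that the
caller can get the mapped super block back and release it later without
ever holding the struct page.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_MAGIC 0x1234ABCDu	/* made-up value, not BTRFS_MAGIC */

/* Hypothetical, heavily reduced stand-in for struct btrfs_super_block. */
struct toy_super {
	uint64_t bytenr;
	uint64_t magic;
};

/* Toy stand-in for the kernel's struct page, with visible counters. */
struct page {
	int refcount;
	int mapcount;
	union {
		unsigned char bytes[TOY_PAGE_SIZE];
		struct toy_super super;
	} u;
};

/* Toy versions of the kernel helpers; they only do bookkeeping. */
static void *kmap(struct page *page) { page->mapcount++; return &page->u; }
static void kunmap(struct page *page) { page->mapcount--; }
static void put_page(struct page *page) { page->refcount--; }

/* Toy kmap_to_page(): recover the page from the address kmap() returned. */
static struct page *kmap_to_page(void *vaddr)
{
	return (struct page *)((unsigned char *)vaddr -
			       offsetof(struct page, u));
}

/* Hypothetical helper: unmap and drop the reference in one place. */
static void toy_release_super(struct toy_super *super)
{
	struct page *page = kmap_to_page(super);

	assert(page->mapcount > 0 && page->refcount > 0);
	kunmap(page);
	put_page(page);
}

/*
 * Hypothetical reader: returns the still-mapped super block, or NULL
 * with *err set. The refcount bump stands in for the reference that a
 * page cache read would hand back.
 */
static struct toy_super *toy_read_super(struct page *page, uint64_t bytenr,
					int *err)
{
	struct toy_super *super;

	page->refcount++;
	super = kmap(page);
	if (super->bytenr != bytenr || super->magic != TOY_MAGIC) {
		toy_release_super(super);
		*err = -EINVAL;
		return NULL;
	}
	*err = 0;
	return super;
}
```

A caller would then pair toy_read_super() with a single
toy_release_super() call on every exit path, instead of open-coding
kunmap() and put_page() each time.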