On 7.02.20 at 9:20, Johannes Thumshirn wrote:
> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
> some drawbacks, like not being able to propagate errors from the lower
> layers.
>
> Directly use the page cache for reading the super-blocks from disk or
> invalidating an on-disk super-block. We have to use the page cache to
> avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
> superblock via pagecache in scan_one_device").
>
> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>
> ---
> Changes to v4:
> - Remove mapping_gfp_constraint() and GFP_NOFAIL (hch)
>
> Changes to v3:
> - Use read_cache_pages() and write_one_page() for IO (hch)
> - Changed subject (David)
> - Dropped Josef's R-b due to change
>
> Changes to v2:
> - open-code kunmap() + put_page() (David)
> - fix double kunmap() (David)
> - don't use bi_set_op_attrs() (David)
>
> Changes to v1:
> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
> - switch to using pagecache instead of alloc_pages() (Nikolay, David)
> ---
> fs/btrfs/disk-io.c | 76 +++++++++++++++++++++++++---------------------
> fs/btrfs/disk-io.h | 4 +--
> fs/btrfs/volumes.c | 57 ++++++++++++++++++----------------
> fs/btrfs/volumes.h | 2 --
> 4 files changed, 74 insertions(+), 65 deletions(-)
>
<snip>
>
> @@ -3355,40 +3363,38 @@ static void btrfs_end_buffer_write_sync(struct buffer_head *bh, int uptodate)
> }
>
> int btrfs_read_dev_one_super(struct block_device *bdev, int copy_num,
> - struct buffer_head **bh_ret)
> + struct page **super_page)
> {
> - struct buffer_head *bh;
> struct btrfs_super_block *super;
> + struct page *page;
> u64 bytenr;
> + struct address_space *mapping = bdev->bd_inode->i_mapping;
>
> bytenr = btrfs_sb_offset(copy_num);
> if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
> return -EINVAL;
You don't use page_offset(bytenr) here, but you do in
btrfs_scratch_superblocks(). I'm aware it could be omitted entirely,
since the superblock is always aligned to 4k. But for the sake of
consistency, either omit it everywhere or use it everywhere.
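
To illustrate the alignment point, here is a minimal userspace sketch, not
kernel code. The sketch_* names are hypothetical stand-ins, and the constants
(64K primary offset, 16K base shifted by 12 bits per mirror, 4K pages) are my
assumption of what fs/btrfs uses. Under those assumptions, the in-page offset
of every superblock copy is zero:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, mirroring what fs/btrfs is believed to define. */
#define SKETCH_PAGE_SIZE          4096ULL
#define SKETCH_SUPER_INFO_OFFSET  (64ULL * 1024)
#define SKETCH_SUPER_MIRROR_MAX   3
#define SKETCH_SUPER_MIRROR_SHIFT 12

/* Hypothetical stand-in for btrfs_sb_offset(): byte offset of copy 'mirror'. */
static uint64_t sketch_sb_offset(int mirror)
{
	uint64_t start = 16ULL * 1024;

	assert(mirror >= 0 && mirror < SKETCH_SUPER_MIRROR_MAX);
	if (mirror)
		return start << (SKETCH_SUPER_MIRROR_SHIFT * mirror);
	return SKETCH_SUPER_INFO_OFFSET;
}

/* Hypothetical stand-in for the in-page offset computation. */
static uint64_t sketch_offset_in_page(uint64_t bytenr)
{
	return bytenr & (SKETCH_PAGE_SIZE - 1);
}
```

With 4K pages the offset is always zero, so the helper is redundant in both
functions. That would not hold for a page size larger than the 64K primary
offset, which is one argument for keeping it everywhere instead.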
>
> - bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
> - /*
> - * If we fail to read from the underlying devices, as of now
> - * the best option we have is to mark it EIO.
> - */
> - if (!bh)
> - return -EIO;
> + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS);
> + if (IS_ERR_OR_NULL(page))
> + return -ENOMEM;
read_cache_page_gfp() can also return an error from ->readpage, so
-ENOMEM is not the only possible failure here. Also, looking at
do_read_cache_page(), it doesn't seem like it can return a NULL
pointer.
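
A rough userspace sketch of the error handling I have in mind, assuming the
read helper only ever returns a valid pointer or an encoded errno. All
sketch_* names are hypothetical; the three small helpers imitate the kernel's
ERR_PTR()/PTR_ERR()/IS_ERR() idiom and are not the real implementations:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define SKETCH_MAX_ERRNO 4095

struct sketch_page { int dummy; };

/* Userspace imitations of ERR_PTR(), PTR_ERR() and IS_ERR(). */
static inline void *sketch_err_ptr(long error)
{
	return (void *)error;
}

static inline long sketch_ptr_err(const void *ptr)
{
	return (long)ptr;
}

static inline int sketch_is_err(const void *ptr)
{
	return (unsigned long)ptr >= (unsigned long)-SKETCH_MAX_ERRNO;
}

/* Hypothetical reader: returns 'backing', or an encoded -fail_errno. */
static struct sketch_page *sketch_read_page(struct sketch_page *backing,
					    int fail_errno)
{
	if (fail_errno)
		return sketch_err_ptr(-fail_errno);
	return backing;
}

/* Propagate the reader's error code instead of hard-coding -ENOMEM. */
static int sketch_read_one_super(struct sketch_page *backing, int fail_errno,
				 struct sketch_page **out)
{
	struct sketch_page *page = sketch_read_page(backing, fail_errno);

	assert(out != NULL);
	if (sketch_is_err(page))
		return (int)sketch_ptr_err(page);

	*out = page;
	return 0;
}
```

In the patch this would amount to checking IS_ERR(page) and returning
PTR_ERR(page), so that an -EIO from the lower layers reaches the caller.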
>
> - super = (struct btrfs_super_block *)bh->b_data;
> + super = kmap(page);
> if (btrfs_super_bytenr(super) != bytenr ||
> btrfs_super_magic(super) != BTRFS_MAGIC) {
> - brelse(bh);
> + kunmap(page);
> + put_page(page);
> return -EINVAL;
> }
> + kunmap(page);
>
> - *bh_ret = bh;
> + *super_page = page;
> return 0;
> }
>
>
> -struct buffer_head *btrfs_read_dev_super(struct block_device *bdev)
> +int btrfs_read_dev_super(struct block_device *bdev, struct page **page)
> {
> - struct buffer_head *bh;
> - struct buffer_head *latest = NULL;
> + struct page *latest = NULL;
> struct btrfs_super_block *super;
> int i;
> u64 transid = 0;
<snip>