On 05/02/2020 17:53, Christoph Hellwig wrote:
> On Wed, Feb 05, 2020 at 11:38:27PM +0900, Johannes Thumshirn wrote:
>> Super-block reading in BTRFS is done using buffer_heads. Buffer_heads have
>> some drawbacks, like not being able to propagate errors from the lower
>> layers.
>>
>> Directly use the page cache for reading the super-blocks from disk or
>> invalidating an on-disk super-block. We have to use the page cache in order
>> to avoid races between mkfs and udev. See also 6f60cbd3ae44 ("btrfs: access
>> superblock via pagecache in scan_one_device").
>>
>> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@xxxxxxx>
>>
>> ---
>> Changes to v3:
>> - Use read_cache_pages() and write_one_page() for IO (hch)
>> - Changed subject (David)
>> - Dropped Josef's R-b due to change
>>
>> Changes to v2:
>> - open-code kunmap() + put_page() (David)
>> - fix double kunmap() (David)
>> - don't use bi_set_op_attrs() (David)
>>
>> Changes to v1:
>> - move 'super_page' into for-loop in btrfs_scratch_superblocks() (Nikolay)
>> - switch to using pagecache instead of alloc_pages() (Nikolay, David)
>> ---
>> fs/btrfs/disk-io.c | 78 +++++++++++++++++++++++++---------------------
>> fs/btrfs/disk-io.h | 4 +--
>> fs/btrfs/volumes.c | 57 +++++++++++++++++----------------
>> fs/btrfs/volumes.h | 2 --
>> 4 files changed, 76 insertions(+), 65 deletions(-)
>>
>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>> index 28622de9e642..bc14ef1aadda 100644
>> --- a/fs/btrfs/disk-io.c
>> +++ b/fs/btrfs/disk-io.c
>> @@ -2617,11 +2617,12 @@ int __cold open_ctree(struct super_block *sb,
>> u64 features;
>> u16 csum_type;
>> struct btrfs_key location;
>> - struct buffer_head *bh;
>> struct btrfs_super_block *disk_super;
>> struct btrfs_fs_info *fs_info = btrfs_sb(sb);
>> struct btrfs_root *tree_root;
>> struct btrfs_root *chunk_root;
>> + struct page *super_page;
>> + u8 *superblock;
>
> I thought you agreed to turn this into a struct btrfs_super_block
> pointer?

As stated in the cover letter, I lost track of the TODOs ;-)

>> bytenr = btrfs_sb_offset(copy_num);
>> if (bytenr + BTRFS_SUPER_INFO_SIZE >= i_size_read(bdev->bd_inode))
>> return -EINVAL;
>>
>> - bh = __bread(bdev, bytenr / BTRFS_BDEV_BLOCKSIZE, BTRFS_SUPER_INFO_SIZE);
>> - /*
>> - * If we fail to read from the underlying devices, as of now
>> - * the best option we have is to mark it EIO.
>> - */
>> - if (!bh)
>> - return -EIO;
>> + gfp_mask = mapping_gfp_constraint(mapping, ~__GFP_FS) | __GFP_NOFAIL;
>> + page = read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, gfp_mask);
>> + if (IS_ERR_OR_NULL(page))
>> + return -ENOMEM;
>
> Why do you need the __GFP_NOFAIL given that failures are handled
> properly here? Also I think instead of using mapping_gfp_constraint you
> can use GFP_NOFS directly here.
OK
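
For the record, the error handling I have in mind is roughly the following. This is only a userspace model, not the actual patch: ERR_PTR(), IS_ERR() and PTR_ERR() are re-implemented locally, and fake_read_page() is a hypothetical stand-in for read_cache_page_gfp(mapping, bytenr >> PAGE_SHIFT, GFP_NOFS). As far as I can tell read_cache_page_gfp() hands back an error pointer on failure, so the real errno can be propagated instead of a blanket -ENOMEM:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace re-implementation of the kernel's error-pointer helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-MAX_ERRNO);
}

/* Minimal model of a page; the kernel's struct page looks nothing like this. */
struct page {
	unsigned char data[4096];
};

/*
 * Hypothetical stand-in for read_cache_page_gfp(): returns 'backing' on
 * success, or an error pointer carrying -fail_with when fail_with != 0.
 */
static struct page *fake_read_page(struct page *backing, int fail_with)
{
	if (fail_with)
		return ERR_PTR(-fail_with);
	return backing;
}

/* Propagate whatever errno the lower layer reported. */
static int read_super_page(struct page *backing, int fail_with,
			   struct page **out)
{
	struct page *page = fake_read_page(backing, fail_with);

	if (IS_ERR(page))
		return (int)PTR_ERR(page);

	/* In this model the success path never yields NULL. */
	assert(page != NULL);
	*out = page;
	return 0;
}
```

With that, an -EIO from the block layer reaches the caller as -EIO, which was the point of getting rid of the buffer_heads in the first place.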
>>
>> - super = (struct btrfs_super_block *)bh->b_data;
>> + super = kmap(page);
>> if (btrfs_super_bytenr(super) != bytenr ||
>> btrfs_super_magic(super) != BTRFS_MAGIC) {
>> - brelse(bh);
>> + kunmap(page);
>> + put_page(page);
>> return -EINVAL;
>> }
>> + kunmap(page);
>
> Also last time I wondered why we can't leave the page mapped for the
> caller and also return the virtual address? That would keep the
> callers a little cleaner. Note that you don't need to pass the
> struct page in that case as the unmap helper can use kmap_to_page (and
> I think a helper would be really nice for the unmap and put anyway).
>
There's btrfs_release_disk_super(), but David didn't like its use in v2
of this series. With a 'struct btrfs_super_block' instead of a
'struct page', though, I think he could be OK with it.
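
To check that I understand the suggestion, here is a userspace sketch of the read/release pairing. Everything prefixed with model_ is hypothetical: model_kmap() and model_kmap_to_page() only imitate kmap() and kmap_to_page(), the two counters stand in for the mapping and the page reference, and struct model_super carries just the two fields the sanity check needs. The real helper would presumably return an ERR_PTR() rather than NULL on a bad super block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096
#define MODEL_MAGIC 0x4D5F53665248425FULL /* "_BHRfS_M" */

/* Simplified, hypothetical super block: only bytenr and magic. */
struct model_super {
	uint64_t bytenr;
	uint64_t magic;
};

/* Model of a page cache page holding a super block. */
struct model_page {
	union {
		unsigned char bytes[MODEL_PAGE_SIZE];
		struct model_super super;
	} u;
	int refcount;	/* models get_page()/put_page() */
	int mapped;	/* models kmap()/kunmap() */
};

/* Imitates kmap(): returns the virtual address of the page contents. */
static void *model_kmap(struct model_page *page)
{
	page->mapped++;
	return &page->u;
}

/* Imitates kmap_to_page(): recovers the page from a mapped address. */
static struct model_page *model_kmap_to_page(void *addr)
{
	return (struct model_page *)((unsigned char *)addr -
				     offsetof(struct model_page, u));
}

/* Unmap and drop the reference; the caller only needs the super pointer. */
static void model_release_disk_super(struct model_super *super)
{
	struct model_page *page = model_kmap_to_page(super);

	assert(page->mapped > 0 && page->refcount > 0);
	page->mapped--;		/* kunmap() */
	page->refcount--;	/* put_page() */
}

/* Returns the super block still mapped, or NULL if it fails the checks. */
static struct model_super *model_read_disk_super(struct model_page *page,
						 uint64_t bytenr)
{
	struct model_super *super;

	page->refcount++;	/* reference the page cache read would return */
	super = model_kmap(page);
	if (super->bytenr != bytenr || super->magic != MODEL_MAGIC) {
		model_release_disk_super(super);
		return NULL;
	}
	return super;
}
```

The callers then never see the page at all, and every exit path ends in a single release call.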