Re: [PATCH v3 3/3] btrfs: Introduce new incompat feature, BG_TREE, to speed up mount time

On Thu, Oct 10, 2019 at 10:39:28AM +0800, Qu Wenruo wrote:
> The overall idea of the new BG_TREE is pretty simple:
> Put BLOCK_GROUP_ITEMS into a separate tree.
> 
> This brings one obvious enhancement:
> - Reduce mount time of large fs
> 
> Although the kernel could accept BLOCK_GROUP_ITEMS in either tree
> (extent root or bg root), I'll leave that in-kernel conversion, as an
> alternative to offline conversion, as a next step if there is enough
> interest in it.
> 
> So for now, if an existing fs wants to take advantage of the BG_TREE
> feature, btrfs-progs will provide an offline conversion tool.
> 
> [[Benchmark]]
> Physical device:	NVMe SSD
> VM device:		VirtIO block device, backed by sparse file
> Nodesize:		4K  (to bump up tree height)
> Extent data size:	4M
> Fs size used:		1T
> 
> All file extents on disk are 4M in size, preallocated to reduce space usage
> (as the VM uses a loopback block device backed by a sparse file)
> 
> Without patchset:
> Use ftrace function graph:
> 
>  7)               |  open_ctree [btrfs]() {
>  7)               |    btrfs_read_block_groups [btrfs]() {
>  7) @ 805851.8 us |    }
>  7) @ 911890.2 us |  }
> 
>  btrfs_read_block_groups() takes 88% of the total mount time.
> 
> With patchset, using the -O bg-tree mkfs option:
> 
>  6)               |  open_ctree [btrfs]() {
>  6)               |    btrfs_read_block_groups [btrfs]() {
>  6) * 91204.69 us |    }
>  6) @ 192039.5 us |  }
> 
>   open_ctree() time is only 21% of original mount time.
>   And btrfs_read_block_groups() only takes 47% of total open_ctree()
>   execution time.
> 
> The reason is pretty obvious when considering how many tree blocks need
> to be read from disk:
> - Original extent tree:
>   nodes:	55
>   leaves:	1025
>   total:	1080
> - Block group tree:
>   nodes:	1
>   leaves:	13
>   total:	14
> 
> Not to mention that tree block readahead works pretty well for the bg
> tree, as we will read every item.
> Readahead for the extent tree, by contrast, is just a disaster, as block
> groups are scattered across the whole extent tree.
> 
> The reduction of mount time is already obvious even on super fast NVMe
> disk with memory cache.
> It would be even more obvious if the fs is on spinning rust.
> 
> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

You need to add

fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;

to btrfs_init_global_block_rsv, otherwise bad things will happen.  Thanks,

Josef
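
To illustrate where that assignment would sit, here is a minimal userspace sketch, not the actual kernel function. The structures are hypothetical stand-ins reduced to the fields used here, init_global_block_rsv_sketch() is a made-up name, and both the NULL check on bg_root (for filesystems without BG_TREE) and the handling of extent_root are assumptions rather than anything stated in the patch.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the kernel types; only the fields used below. */
struct btrfs_block_rsv {
	int placeholder;
};

struct btrfs_root {
	struct btrfs_block_rsv *block_rsv;
};

struct btrfs_fs_info {
	struct btrfs_root *extent_root;
	struct btrfs_root *bg_root;	/* assumed NULL when BG_TREE is not enabled */
	struct btrfs_block_rsv delayed_refs_rsv;
};

/*
 * Simplified model of btrfs_init_global_block_rsv(): point the new bg root
 * at the delayed refs reserve, as suggested in the reply above.
 */
void init_global_block_rsv_sketch(struct btrfs_fs_info *fs_info)
{
	/* Assumption: the extent root already uses the delayed refs reserve. */
	fs_info->extent_root->block_rsv = &fs_info->delayed_refs_rsv;

	/* The suggested line, guarded here in case the bg root is absent. */
	if (fs_info->bg_root)
		fs_info->bg_root->block_rsv = &fs_info->delayed_refs_rsv;
}
```

In this sketch, calling init_global_block_rsv_sketch() on an fs_info whose bg_root is set leaves bg_root->block_rsv pointing at &fs_info->delayed_refs_rsv, and a NULL bg_root is simply skipped.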


