On 2020-07-09 03:46, Robbie Ko wrote:
Holger Hoffstätte wrote on 2020/7/8 at 10:57 PM:
On 2020-07-08 16:04, David Sterba wrote:
On Wed, Jul 08, 2020 at 10:19:22AM +0800, Robbie Ko wrote:
David Sterba wrote on 2020/7/8 at 3:25 AM:
On Tue, Jul 07, 2020 at 11:59:44AM +0800, robbieko wrote:
From: Robbie Ko <robbieko@xxxxxxxxxxxx>
When mounting, we always need to read the whole chunk tree.
When there are too many chunk items, most of the mount time is
spent in btrfs_read_chunk_tree, because we only read one
leaf at a time.
It is unreasonable to limit the readahead mechanism to a
range of 64k, so we have removed that limit.
In addition, we added reada_maximum_size to customize the
readahead size. The default is 64k to maintain the
original behavior.
So we fix this by using the readahead mechanism, and set the readahead
max size to ULLONG_MAX, which reads all the leaves after the
key in the node when reading a level 1 node.
The readahead of the chunk tree is a special case as we know we will need
the whole tree; in all other cases the search readahead is
supposed to read only one leaf.
If, in most cases, readahead only needs to read one leaf, then
reada_maximum_size should be nodesize instead of 64k, or using
reada_maximum_nr (default: 1) seems better.
For that reason I don't want to touch the current path readahead logic
at all, and would rather do the chunk tree readahead in one go instead of
per search.
I don't see why we shouldn't change readahead: limiting the current
readahead to a 64k range of logical addresses is very unreasonable.
There is a good chance that the logical address of the next leaf
node will not fall within that 64k range, so the existing readahead
is almost useless.
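The 64k argument above can be illustrated with a minimal userspace sketch. It is not the kernel's readahead code: the helper names (within_window, count_readahead_candidates), the READA_WINDOW macro and the sample block pointers are all hypothetical. It only models the idea that a sibling leaf is read ahead when its logical address lies within a fixed distance of the target leaf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the 64k readahead window discussed in the thread. */
#define READA_WINDOW (64 * 1024ULL)

/* Return nonzero if blockptr lies within `window` bytes of target. */
static int within_window(uint64_t target, uint64_t blockptr, uint64_t window)
{
	uint64_t dist = blockptr > target ? blockptr - target : target - blockptr;

	return dist <= window;
}

/*
 * Count the leaves after `slot` in a level 1 node that would qualify for
 * readahead, given the logical addresses of all leaves in that node.
 */
static size_t count_readahead_candidates(const uint64_t *blockptrs, size_t nr,
					 size_t slot, uint64_t window)
{
	size_t i, hits = 0;

	assert(slot < nr);
	for (i = slot + 1; i < nr; i++)
		if (within_window(blockptrs[slot], blockptrs[i], window))
			hits++;
	return hits;
}
```

With leaves at logical addresses that are megabytes apart, a 64k window selects almost none of the following leaves, while an unlimited window (UINT64_MAX here, standing in for the ULLONG_MAX setting described in the patch) selects all of them.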
I see, and it seems that the assumption about layout and the chances of
successfully reading blocks ahead is not valid. The logic of readahead could
be improved but that would need more performance evaluation.
FWIW I gave this a try and see the following numbers, averaged over multiple
mount/unmount cycles on spinning rust:
without patch : ~2.7s
with patch : ~4.5s
..ahem..
I have the following two questions for you.
1. What is the version you are using?
5.7.8 + a few select patches from 5.8.
2. Can you please measure the time of btrfs_read_chunk_tree alone?
No perf on this system & not enough time right now, sorry.
But it shouldn't matter either way, see below.
I think the problem you are having is that btrfs_read_block_groups is
slowing down because it is using the wrong READA flag, which is causing
a lot of useless IOs when reading the block groups.
This can be fixed with the following commit.
btrfs: block-group: don't set the wrong READA flag for btrfs_read_block_groups()
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v5.8-rc4&id=83fe9e12b0558eae519351cff00da1e06bc054d2
Ah yes, that was missing. However it doesn't seem to improve things
that much either; with 83fe9e12 but with or without your patch I now get
~2.8..~2.9s mount time. Probably because I don't have that many
metadata block groups (only 4GB).
From a conceptual perspective it is probably much easier just to
merge the bgtree patchset, since that does the right thing without
upsetting the overall readahead apple cart.
thanks,
Holger