Re: mount btrfs takes 30 minutes, btrfs check runs out of memory

At 07/18/2016 04:53 PM, John Ettedgui wrote:
On Mon, Jul 18, 2016 at 1:42 AM Qu Wenruo <quwenruo@xxxxxxxxxxxxxx
<mailto:quwenruo@xxxxxxxxxxxxxx>> wrote:


    > So following that, another partition got its mounting time reduced by
    > about 70% by running a manual defrag (I kept compression on and used
    > -clzo for this defragmentation).
    > So maybe a manual defrag is really the best thing to do so far.

    Seems to be the case.

    For further investigation, it would be quite nice for you to upload the
    output of a "btrfs-debug-tree -t 2" dump of your fs, both before and
    after. It doesn't contain anything sensitive (no file names or
    relations, only extent allocation info), so it should be quite safe
    to upload.

What do you mean by before and after?
Before defragmentation?

Yes, to compare the extent size and verify my assumption.

But I'm afraid you don't have any fs with that slow mount time any more.
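If you do still have before/after dumps, here is a minimal sketch for comparing extent sizes between them. It assumes extent-tree leaf items in the dump look like `item 3 key (12582912 EXTENT_ITEM 131072) itemoff 16121 itemsize 53`, where the last key field is the on-disk extent length in bytes (for a compressed extent that is the compressed size). Without skinny_metadata, tree blocks also show up as EXTENT_ITEMs of nodesize, so treat the numbers as a rough comparison only. The script name and output fields are made up for illustration.

```python
import re
import sys
from collections import Counter

# Assumed shape of an extent-tree item line in a "btrfs-debug-tree -t 2" dump:
#   item 3 key (12582912 EXTENT_ITEM 131072) itemoff 16121 itemsize 53
# The third key field of an EXTENT_ITEM is taken as the extent length in bytes.
EXTENT_KEY = re.compile(r"key \((\d+) EXTENT_ITEM (\d+)\)")


def extent_size_histogram(dump_text: str) -> Counter:
    """Count EXTENT_ITEMs in a dump, bucketed by extent length."""
    sizes = Counter()
    for match in EXTENT_KEY.finditer(dump_text):
        sizes[int(match.group(2))] += 1
    return sizes


def summarize(sizes: Counter) -> dict:
    """Number of extents, mean length, and how many are <= 128K."""
    total = sum(sizes.values())
    total_bytes = sum(size * count for size, count in sizes.items())
    return {
        "extents": total,
        "mean_bytes": total_bytes / total if total else 0.0,
        "at_most_128k": sum(c for s, c in sizes.items() if s <= 128 * 1024),
    }


if __name__ == "__main__":
    # Hypothetical usage: python extent_sizes.py before.txt after.txt
    for path in sys.argv[1:]:
        with open(path) as f:
            print(path, summarize(extent_size_histogram(f.read())))
```

A lower extent count and a higher mean size in the "after" dump would be consistent with the assumption that defrag mainly reduced the number of extent items.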


    I'm really surprised by the mount time reduction, especially
    considering that in the compression case the max extent size is
    limited to 128K, so IMHO defrag shouldn't help much.

Is the 128K limit for the whole FS or only for files that btrfs deemed
worth compressing? If it's the latter, that could explain why defrag helped.

The latter. But the 128K limit applies to the raw (uncompressed) size, not the compressed size.

So regardless of the compressed size, any extent whose uncompressed size is larger than 128K will be split.

The main reason I'm surprised by the mount time reduction is that, given the sectorsize (4K for x86_64 and x86), the number of fragments shouldn't increase too much.
The smallest extent size is determined by the sectorsize (4K on most archs).
The compressed extent upper limit is 128K, and 4K -> 128K is only 32 times.
For the non-compressed case, by contrast, the extent size upper limit is 128M,
which is 32K times larger than the sector size, or 1024 times larger than the compressed extent limit.
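The ratios above can be checked with a few lines of arithmetic. This sketch assumes a 4K sectorsize and the 128K / 128M limits quoted in this message, and counts how many extents a file needs if every extent is exactly a given size; the 1 GiB file size is just an example.

```python
SECTORSIZE = 4 * 1024                # smallest extent size (x86/x86_64)
MAX_COMPRESSED_EXTENT = 128 * 1024   # limit on the uncompressed size of a compressed extent
MAX_EXTENT = 128 * 1024 * 1024       # limit for non-compressed extents


def extents_needed(file_size: int, extent_size: int) -> int:
    """Extents needed if every extent is extent_size bytes (ceiling division)."""
    return -(-file_size // extent_size)


def extent_counts(file_size: int) -> dict:
    return {
        # fully fragmented: every extent is a single sector
        "worst_case_4k": extents_needed(file_size, SECTORSIZE),
        # perfectly defragmented, compressed
        "best_compressed": extents_needed(file_size, MAX_COMPRESSED_EXTENT),
        # perfectly defragmented, not compressed
        "best_uncompressed": extents_needed(file_size, MAX_EXTENT),
    }


print(MAX_COMPRESSED_EXTENT // SECTORSIZE)   # 32
print(MAX_EXTENT // SECTORSIZE)              # 32768, i.e. 32K
print(MAX_EXTENT // MAX_COMPRESSED_EXTENT)   # 1024
print(extent_counts(1024 ** 3))              # example: a 1 GiB file
```

For a 1 GiB compressed file, going from fully fragmented (262144 extents) to fully defragmented (8192 extents) is at most the 32x reduction described above.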

So I'm quite surprised that defrag helps so much.



    >     And after applying my patch, please try to compare the
    >     execution time of btrfs_read_block_groups() to see if there
    >     is any obvious (>5%) change.
    >
    > Here's what I have for one partition:
    >
    > no patch:
    > open_ctree: 16952419
    > btrfs_read_block_groups: 16844453
    > ratio: 0.9936312333950689
    >
    > patch:
    > open_ctree: 16680173
    > btrfs_read_block_groups: 16600532
    > ratio: 0.9952254092328659
    >
    > ratio no patch/patch: 0.9983981761086407
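The ratios above can be reproduced from the raw numbers, which also gives the direct change in btrfs_read_block_groups() time that the >5% threshold refers to. The units are not stated in the thread and don't matter for the ratios; the variable names are only for illustration.

```python
# Timings as reported above (units not stated in the thread).
no_patch = {"open_ctree": 16952419, "btrfs_read_block_groups": 16844453}
patch = {"open_ctree": 16680173, "btrfs_read_block_groups": 16600532}


def share(t: dict) -> float:
    """Fraction of open_ctree() time spent in btrfs_read_block_groups()."""
    return t["btrfs_read_block_groups"] / t["open_ctree"]


ratio_no_patch = share(no_patch)
ratio_patch = share(patch)
ratio_of_ratios = ratio_no_patch / ratio_patch

# Relative reduction of btrfs_read_block_groups() time itself.
reduction = 1 - patch["btrfs_read_block_groups"] / no_patch["btrfs_read_block_groups"]

print(ratio_no_patch, ratio_patch, ratio_of_ratios)
print(f"btrfs_read_block_groups reduced by {reduction:.2%}")  # about 1.45%
```

Either way you look at it, the change is well under the 5% threshold, and nearly all of the open_ctree() time is still spent reading block groups.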

    OK, almost no improvement. So in your case, most BLOCK_GROUP_ITEMs are
    not at the tail of an extent tree leaf.
    In our test environment, by contrast, it seems that quite a few
    BLOCK_GROUP_ITEMs are at the tail of an extent tree leaf, which makes
    the improvement quite obvious.

    But anyway, if we can change the on-disk format to introduce a dedicated
    block group item tree, then I expect the mount time would drop to
    less than 5 seconds.
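To illustrate why a dedicated tree could help so much, here is a deliberately simplified toy model, not how btrfs actually lays out or reads items. It assumes each BLOCK_GROUP_ITEM is followed in key order by all the extent items of its block group, that every leaf holds a fixed number of items, and that mount has to read every leaf containing a block group item. The group count, extents per group, and items per leaf are hypothetical.

```python
import math


def leaves_with_bg_items_interleaved(n_groups: int, extents_per_group: int,
                                     items_per_leaf: int) -> int:
    """Toy model of the current layout: block group items sit in the extent
    tree, each followed by its block group's extent items, so they are
    spread over many leaves."""
    stride = extents_per_group + 1  # one block group item plus its extents
    return len({(i * stride) // items_per_leaf for i in range(n_groups)})


def leaves_with_bg_items_dedicated(n_groups: int, items_per_leaf: int) -> int:
    """Toy model of a hypothetical tree holding only block group items."""
    return math.ceil(n_groups / items_per_leaf)


# Hypothetical numbers, for illustration only.
groups, extents, per_leaf = 1000, 10_000, 200
print(leaves_with_bg_items_interleaved(groups, extents, per_leaf))  # 1000
print(leaves_with_bg_items_dedicated(groups, per_leaf))             # 5
```

In this model the number of leaves to read drops from one per block group to a handful. On a spinning disk, where scattered leaf reads plausibly cost a seek each, that is the kind of difference that could take mount time down to seconds.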

Less than 5 seconds without regular defrag would be nice.
It would be even nicer to be able to convert from one format to the other
rather than having to do it at mkfs time, but I don't know how feasible that
would be.

If it's possible, it may work just like METADATA_ITEM (i.e. the skinny_metadata feature), and in that case the time reduction will depend on how many BLOCK_GROUP_ITEMs are in the new tree.

Thanks,
Qu

Another option would be to use something like bcache to keep the extent
tree on an SSD while the data stays on the HDD. No idea how feasible that
would be though...

Thank you,
John


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html