On 2018年02月16日 22:12, Ellis H. Wilson III wrote:
> On 02/15/2018 08:55 PM, Qu Wenruo wrote:
>> On 2018年02月16日 00:30, Ellis H. Wilson III wrote:
>>> Very helpful information. Thank you Qu and Hans!
>>>
>>> I have about 1.7TB of newly rsync'd homedir data on a single
>>> enterprise 7200rpm HDD and the following output from btrfs-debug:
>>>
>>> extent tree key (EXTENT_TREE ROOT_ITEM 0) 543384862720 level 2
>>> total bytes 6001175126016
>>> bytes used 1832557875200
>>>
>>> Hans' (very cool) tool reports:
>>> ROOT_TREE      624.00KiB 0(    38) 1(  1)
>>> EXTENT_TREE    327.31MiB 0( 20881) 1( 66) 2(   1)
>>
>> The extent tree is not that large, so such a slow mount is a little
>> unexpected.
>>
>> BTW, how many chunks do you have?
>>
>> It can be checked with:
>>
>> # btrfs-debug-tree -t chunk <device> | grep CHUNK_ITEM | wc -l
>
> Since yesterday I've doubled the size by copying the homedir dataset in
> again. Here are the new stats:
>
> extent tree key (EXTENT_TREE ROOT_ITEM 0) 385990656 level 2
> total bytes 6001175126016
> bytes used 3663525969920
>
> $ sudo btrfs-debug-tree -t chunk /dev/sdb | grep CHUNK_ITEM | wc -l
> 3454

OK, this explains everything.

There are too many chunks. It means that at mount time we need to search
for a block group item 3454 times. Even if each search only has to
iterate over 3 tree blocks, multiplied by 3454 it is still a lot of work.

Although some tree blocks, like the root node and the level 1 nodes, can
be cached, we still need to read about 3,500 tree blocks. With a 16K
nodesize that is roughly 3,500 x 16KiB, or about 54MiB of random reads.
No wonder it takes some time.

Normally I would expect a 1G chunk for each data and metadata chunk, so
if there is nothing special going on, your filesystem already has more
than 3T allocated.

If your used space is way smaller than that 3.5T (say, less than 30% of
it), then your chunk usage is pretty low, and in that case a balance to
reduce the number of chunks (block groups) would reduce mount time.

My personal estimate is that mount time scales as O(n log n) in the
number of chunks, so if you can cut the chunk count in half, you could
reduce mount time by roughly 60%.
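As a minimal sketch of what such a balance could look like (the mount
point and the 30% usage cutoff below are only examples, adjust them to
your setup):

$ sudo btrfs filesystem df /mnt/btrfs
$ sudo btrfs balance start -dusage=30 -musage=30 /mnt/btrfs

The first command shows allocated ("total") versus actually used bytes
per block group type, so you can confirm the chunks really are mostly
empty. The second rewrites only those data and metadata block groups
that are less than 30% full, packing their contents into fewer chunks;
block groups that are already well filled are left alone, which keeps
this much cheaper than a full balance.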
> $ sudo ./show_metadata_tree_sizes.py /mnt/btrfs/
> ROOT_TREE            1.14MiB 0(    72) 1(    1)
> EXTENT_TREE        644.27MiB 0( 41101) 1(  131) 2(    1)
> CHUNK_TREE         384.00KiB 0(    23) 1(    1)
> DEV_TREE           272.00KiB 0(    16) 1(    1)
> FS_TREE             11.55GiB 0(754442) 1( 2179) 2(    5) 3(    2)
> CSUM_TREE            3.50GiB 0(228593) 1(  791) 2(    2) 3(    1)
> QUOTA_TREE             0.00B
> UUID_TREE           16.00KiB 0(     1)
> FREE_SPACE_TREE        0.00B
> DATA_RELOC_TREE     16.00KiB 0(     1)
>
> The old mean mount time was 4.319s. It now takes 11.537s for the
> doubled dataset. Again, please realize this is on an old version of
> BTRFS (4.5.5), so perhaps newer ones will perform better, but I'd
> still like to understand this delay more. Should I expect it to keep
> scaling this way all the way up to my proposed 60-80TB filesystem, so
> long as the file size distribution stays roughly similar? That would
> put mount time at multiple minutes by that point.
>
>>> Taking 100 snapshots (no changes between snapshots however) of the
>>> above subvolume doesn't appear to impact mount/umount time.
>>
>> 100 unmodified snapshots won't affect mount time.
>>
>> Mount time is only affected by new extents, which are created by
>> overwriting extents inside the snapshots. So these snapshots won't
>> make much difference as long as they are all unmodified.
>
> Good to know, thanks!
>
>>> Snapshot creation and deletion both operate at between 0.25s and
>>> 0.5s.
>>
>> IIRC snapshot deletion is delayed, so the real work doesn't happen
>> when "btrfs sub del" returns.
>
> I was using btrfs sub del -C for the deletions, so I believe (if that
> command truly waits for the subvolume to be utterly gone) it captures
> the entire cost of the snapshot deletion.

No, snapshot deletion is always delayed and done in the background.
-C only ensures that, even if a power loss happens right after the
command returns, you will never see the snapshot again; the actual
cleanup still happens in the background.

Thanks,
Qu
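P.S. If you want to measure how long that background cleanup really
takes, something like the following should work, assuming your
btrfs-progs is new enough to provide "btrfs subvolume sync", which
blocks until all deleted subvolumes have actually been removed (the
snapshot path is only a placeholder):

$ sudo btrfs sub del -C /mnt/btrfs/<snapshot>
$ sudo btrfs subvolume sync /mnt/btrfs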