On 2019/11/25 3:09 AM, Christian Pernegger wrote:
> On Sun, 24 Nov 2019 at 01:38, Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>> In short, unless you really need to know how many bytes each snapshot
>> really takes, then disable qgroups.
>>
>> And BTW, for "many" subvolumes/snapshots, I guess we mean 20.
>> 200 is already prone to cause problems, not only for qgroups, but also
>> for send.
>>
>> So it's also recommended to reduce the number of snapshots.
>
> I've disabled qgroups for now, we'll see how that goes. These are
> personal desktops, they would have been nice to have, that's all.
> Sadly that means that they probably won't work on any storage setup
> complex enough for them to be really useful, either, yet.
> If btrfs scales so badly with the number of subvolumes that having >20
> at a time should be avoided, doesn't that kill a lot of interesting
> use-cases? My "time machine" desktop setup, certainly, but anything
> with a couple of users or VMs would chew through that 20 pretty
> quickly, even before snapshots. Which leaves the LVM use-case
> (snapshot, backup the snapshot, delete the snapshot).

BTW, that 20 number means 20 snapshots (they all have some shared tree
blocks). If it's 20 subvolumes (no shared tree/data between them), then
it only counts as 1.

The main time-consuming part is the shared tree/data check, as btrfs
records those references indirectly on-disk, forcing us to do a complex
walk-back (a toy sketch of the idea is at the end of this mail).

Thankfully, we have a plan to improve it.

>
>> The slowdown happens in commit transaction, and with commit
>> transaction, a lot of operations are blocked until the current
>> transaction is committed.
>>
>> That's why it blocks everything.
>>
>> We have tried our best to reduce the impact, but deletion is still a
>> big problem, as it can cause tons of extents to change their owner,
>> thus causing the problem.
>
> Sure, but why does it *have to* block? Couldn't the intent to delete
> the subvolume be committed, the metadata changes / actual deletion
> happen at leisure?

Unfortunately, it's not that easy.

We have already delayed a lot of metadata operations, and commit
transaction is the only time we get a consistent metadata view.
That's why it has to happen in that critical section.

> Yes, if qgroups are on, then the qgroup info will
> be behind, but so what? It's already behind.
> At least I think that lax/lazy qgroups would
> be a nice option to have.

Qgroup accounting is bound to the delayed extent tree updates. The
extent tree update is already delayed to transaction commit time; if it
were delayed any further, the consistency of the fs would be corrupted.

The plan to solve this is to introduce a global cache for backref walks,
which would benefit not only qgroups, but also send with reflink.

Although there will be some new challenges, we will see whether the
cache is worth it.

Thanks,
Qu

> Also, I still don't get why disabling qgroups, re-enabling them and
> doing a full rescan is lightning fast (and non-blocking), while just
> leaving them on results in the observed behaviour.
>
> Cheers,
> C.
>
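
To make the "walk-back" and "global cache" points above a little more
concrete, here is a toy userspace model. It is NOT the actual kernel
code and every name in it is invented for the sketch; it only
illustrates why per-extent accounting cost grows with the number of
roots sharing each extent, and why a memoized backref cache would make
repeated resolution much cheaper.

/* Toy model, NOT kernel code: why qgroup accounting cost scales with
 * the number of roots sharing each extent, and how a backref cache
 * could help. All names here are invented for illustration. */
#include <stdio.h>
#include <stdlib.h>

#define MAX_ROOTS 64

/* One data extent plus the roots (subvolumes/snapshots) referencing it.
 * In real btrfs the owners are not stored as a flat list like this;
 * they have to be resolved by walking back references, which is the
 * expensive part being modelled here. */
struct toy_extent {
	unsigned long long bytenr;
	unsigned long long len;
	int nr_roots;
	unsigned long long roots[MAX_ROOTS];
	int resolved;		/* toy "backref cache": already walked? */
};

static long walk_count;		/* how many expensive walk steps we did */

/* Pretend backref walk: resolving the owner set of one extent.
 * Its cost grows with the number of roots sharing the extent. */
static int resolve_roots(struct toy_extent *ex, int use_cache)
{
	if (use_cache && ex->resolved)
		return ex->nr_roots;	/* cache hit: no walk needed */
	walk_count += ex->nr_roots;	/* stand-in for the real walk cost */
	ex->resolved = 1;
	return ex->nr_roots;
}

/* Per-transaction accounting: every dirtied extent needs its owner set
 * resolved so "referenced" vs "exclusive" bytes can be updated. */
static void account_transaction(struct toy_extent *extents, int nr,
				int use_cache)
{
	unsigned long long shared = 0, exclusive = 0;

	for (int i = 0; i < nr; i++) {
		int owners = resolve_roots(&extents[i], use_cache);

		if (owners > 1)
			shared += extents[i].len;
		else
			exclusive += extents[i].len;
	}
	printf("shared=%llu exclusive=%llu walk_steps=%ld\n",
	       shared, exclusive, walk_count);
}

int main(void)
{
	enum { NR_EXTENTS = 1000, NR_SNAPSHOTS = 20 };
	struct toy_extent *extents = calloc(NR_EXTENTS, sizeof(*extents));

	if (!extents)
		return 1;

	/* Every extent is shared by all snapshots, as right after
	 * snapshotting a full subvolume 20 times. */
	for (int i = 0; i < NR_EXTENTS; i++) {
		extents[i].bytenr = i * 16384ULL;
		extents[i].len = 16384;
		extents[i].nr_roots = NR_SNAPSHOTS;
		for (int r = 0; r < NR_SNAPSHOTS; r++)
			extents[i].roots[r] = 256 + r;
	}

	/* Without a cache: every transaction repeats the full walk. */
	walk_count = 0;
	account_transaction(extents, NR_EXTENTS, 0);
	account_transaction(extents, NR_EXTENTS, 0);

	/* With a cache: the second pass over unchanged extents is cheap. */
	for (int i = 0; i < NR_EXTENTS; i++)
		extents[i].resolved = 0;
	walk_count = 0;
	account_transaction(extents, NR_EXTENTS, 1);
	account_transaction(extents, NR_EXTENTS, 1);

	free(extents);
	return 0;
}

In the cached runs only extents whose sharing actually changed would
need a fresh walk; that is, roughly, the saving the planned global
backref cache is aiming for.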
