On Wed, Oct 22, 2014 at 07:41:32AM +0000, Duncan wrote:
> Tomasz Chmielewski posted on Wed, 22 Oct 2014 09:14:14 +0200 as excerpted:
>
> >> Tho that is of course per subvolume. If you have multiple subvolumes
> >> on the same filesystem, that can still end up being a thousand or two
> >> snapshots per filesystem. But those are all groups of something under
> >> 300 (under 100 with hourly) highly connected to each other, with the
> >> interweaving inside each of those groups being the real complexity in
> >> terms of btrfs management.
>
> IOW, if you thin down the snapshots per subvolume to something reasonable
> (under 300 for sure, preferably under 100), then depending on the number
> of subvolumes you're snapshotting, you might have a thousand or two.
> However, of those couple thousand, btrfs will only have to deal with the
> under 300 and preferably well under a hundred in the same group, that are
> snapshots of the same thing and thus related to each other, at any given
> time. The other snapshots will be there but won't be adding to the
> complexity near as much since they're of different subvolumes and aren't
> logically interwoven together with the ones being considered at that
> moment.
>
> But even then, at say 250 snapshots per subvolume, 2000 snapshots is 8
> independent subvolumes. That could happen. But 5000 snapshots? That'd
> be 20 independent subvolumes, which is heading toward the extreme again.
> Yes it could happen, but better if it does to cut down on the per-
> subvolume snapshots further, to say the 25 per subvolume I mentioned, or
> perhaps even further. 25 snapshots per subvolume with those same 20
> subvolumes... 500 snapshots total instead of 5000. =:^)

If you have one subvolume per user and 1000 user directories on a
server, it's only 5 snapshots per user (last hour, last day, last week,
last month, and last year). I hear this is a normal use case in the ZFS
world. It would certainly be attractive if there were working quota
support.

I have datasets where I record 14000+ snapshots of filesystem directory
trees scraped from test machines and aggregated onto a single server
for deduplication...but I store each snapshot as a git commit, not as a
btrfs snapshot or even subvolume. We do sometimes run queries like "in
the last two years, how many times did $CONDITION occur?" which will
scan a handful of files in all of the snapshots.

The use case itself isn't unreasonable, although using the filesystem
instead of a more domain-specific tool to achieve it may be.
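
For concreteness, here's roughly what the hour/day/week/month/year
rotation above could look like. This is just a sketch, not code from
any existing snapshot manager; the bucket sizes (30-day month, 365-day
year) and names are my own approximations:

import datetime as dt

# "last hour, last day, last week, last month, last year" buckets;
# month/year lengths are approximated.
BUCKETS = [
    dt.timedelta(hours=1),   # last hour
    dt.timedelta(days=1),    # last day
    dt.timedelta(weeks=1),   # last week
    dt.timedelta(days=30),   # last month (approx.)
    dt.timedelta(days=365),  # last year (approx.)
]

def snapshots_to_keep(snapshot_times, now=None):
    """Given one subvolume's snapshot creation times, return the (at
    most five) timestamps to retain: for each bucket, the newest
    snapshot that is at least that old."""
    now = now or dt.datetime.now()
    keep = set()
    for age in BUCKETS:
        old_enough = [t for t in snapshot_times if now - t >= age]
        if old_enough:
            # Newest snapshot old enough to cover this bucket.
            keep.add(max(old_enough))
    return keep

Whatever isn't in the returned set is what the actual rotation job
would hand to "btrfs subvolume delete", keeping each user at the five
snapshots mentioned above.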
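And the git-commit-per-snapshot queries look something like the sketch
below; again this is illustrative rather than my actual tooling, and it
assumes one commit per snapshot on a single branch, with a text pattern
standing in for $CONDITION. Only stock git commands (git log, git grep)
are used:

import subprocess

def count_matching_snapshots(repo, pattern, since="2 years ago"):
    """Count snapshots (commits) in the date range whose tree contains
    a line matching the pattern."""
    # One commit per snapshot, restricted to the date range.
    log = subprocess.run(
        ["git", "-C", repo, "log", f"--since={since}", "--format=%H"],
        capture_output=True, text=True, check=True,
    )
    hits = 0
    for commit in log.stdout.split():
        # git grep exits 0 if the pattern matches anywhere in that
        # commit's tree, 1 if it doesn't.
        ret = subprocess.run(
            ["git", "-C", repo, "grep", "-q", pattern, commit],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL,
        )
        if ret.returncode == 0:
            hits += 1
    return hits

git grep searches each commit's tree directly, without checking
anything out, which is what makes scanning all 14000+ snapshots
tolerable.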
