On 2020/6/13 12:48 AM, David Sterba wrote:
> On Fri, Jun 12, 2020 at 02:42:37PM +0800, Qu Wenruo wrote:
>> For anonymous block device, we have at most 1 << 20 devices to allocate,
>> which looks quite a lot, but if we have a workload which created 1
>> snapshots per second, we only need 12 days to exhaust the whole pool.
>
> 1<<20 is 1M and that would mean that there that many snapshots active at
> the same time in order to allocate all the anonymous block devices. Once
> a snapshot is not part of any path the device number is released and can
> be reused. So simply multiplying the numbers does not reflect the
> reality.

Yes, but with that number we can still exhaust the pool if the subvolumes
are not cleaned up.

>
> A plausible explanation is leak of the anon bdev by something else than
> btrfs on the system.
>

I'm not sure it's a leak. As you can see, there is a free_anon_bdev() call
in btrfs_put_root().

Although I understand that we use the bdev as a namespace separator, I'm
still not convinced that it's that important.

For one thing, I haven't seen many users of the bdev member to distinguish
subvolumes, not to mention the "sub" nature of a subvolume. It's not a full
volume: it already shares the chunk, extent, dev, and root trees, so sharing
the same bdev as well shouldn't cause new users any problems.

That's why I'm pushing the RFC patch.

Are there that many users relying on the bdev to distinguish different
subvolumes? Any example would be very helpful.

Another problem is that, for something as lightweight as a btrfs subvolume,
there is only so much we can do to reduce how often this problem is hit.
We can never eliminate it.

If the bdev problem really affects a lot of existing users, I can make the
behavior toggleable, maybe with a mount option?

Thanks,
Qu
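
For reference, the "12 days" figure quoted above can be checked with simple arithmetic. The following is only a sketch, not kernel code: the names ANON_BDEV_POOL and days_to_exhaust() are hypothetical, and it assumes the worst case discussed in the thread, where no anonymous device number is ever released.

```c
/* Pool size cited in the thread: at most 1 << 20 anonymous block devices. */
#define ANON_BDEV_POOL (1UL << 20)

#define SECONDS_PER_DAY 86400.0

/*
 * Days until the pool is exhausted at a steady allocation rate.
 * Assumption: nothing is ever freed, so this is a lower bound on the
 * time to exhaustion; any released and reused number extends it.
 */
double days_to_exhaust(unsigned long pool, double allocs_per_sec)
{
	return (double)pool / allocs_per_sec / SECONDS_PER_DAY;
}
```

Calling days_to_exhaust(ANON_BDEV_POOL, 1.0) gives roughly 12.14, which matches the 12 days in the quoted message for one snapshot per second.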
