On 12/25/19 8:39 AM, Qu Wenruo wrote:
> There are several bug reports of ENOSPC errors in btrfs_run_delalloc_range(). With some extra info from one reporter, it turns out that can_overcommit() is using a wrong way to calculate allocatable metadata space.
>
> The most typical case would look like:
>
>   devid 1 unallocated:  1G
>   devid 2 unallocated:  10G
>   metadata profile:     RAID1
>
> In the above case, we can allocate at most a 1G chunk for metadata, due to the unbalanced free space across the disks. But the current can_overcommit() uses a factor-based calculation, which never considers how the free space is distributed among the devices.
>
> To address this problem, this series introduces a per-profile available space array, which gets updated every time a chunk is allocated or removed, or a device is grown or shrunk. This gives hot paths like can_overcommit() a quick way to grab an estimate of how many bytes they can over-commit.
>
> The per-profile available space calculation tries to mirror the behavior of the chunk allocator, so it handles uneven disks well.
>
> The RFC tag is here because I'm not yet confident enough about the implementation. I'm not sure this is the proper way to go, or just an over-engineered mess.
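The uneven-disk case above can be sketched outside the kernel. This is a hedged simulation, not the series' actual implementation: it greedily mimics the chunk allocator, which stripes each chunk across the devices with the most unallocated space, so the smallest chosen device limits each allocation. The function name and parameters are illustrative.

```python
def available_for_profile(unallocated, num_stripes, num_copies=1):
    """Estimate allocatable bytes for a profile over uneven devices.

    unallocated: per-device free bytes
    num_stripes: devices each chunk must span (2 for RAID1/RAID0-on-2)
    num_copies:  duplicated copies of the data (2 for RAID1)
    Returns usable bytes, i.e. raw bytes divided by the copy count.
    """
    free = list(unallocated)
    raw = 0
    while True:
        # The allocator picks the devices with the most free space.
        free.sort(reverse=True)
        chosen = free[:num_stripes]
        if len(chosen) < num_stripes or chosen[-1] == 0:
            break
        step = chosen[-1]  # limited by the smallest chosen device
        for i in range(num_stripes):
            free[i] -= step
        raw += step * num_stripes
    return raw // num_copies


G = 1024 ** 3
# The cover letter's example: RAID1 over 1G + 10G of unallocated space
# can only hold 1G of metadata, not the factor-based (1G + 10G) / 2.
print(available_for_profile([1 * G, 10 * G], num_stripes=2, num_copies=2) // G)
```

A purely factor-based estimate would report 5.5G here; the greedy simulation returns 1G, matching what the chunk allocator can actually place.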
In general I like the approach, however re-doing the whole calculation every time we add or remove a chunk seems like overkill. Can we get away with just doing the big expensive calculation at mount, and then adjusting available up and down as we add and remove chunks? We know the chunk allocator is not going to allow us to allocate more chunks than our smallest device can hold, so we can be sure we're never going to go negative, and removing chunks is easy again for the same reason.
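The incremental bookkeeping suggested here can be sketched as follows. This is a hedged illustration under the stated assumptions (the full scan runs once at mount, and the allocator never grants more than the cached estimate); the class and method names are invented for the example and are not kernel API.

```python
class PerProfileAvail:
    """Cache one profile's available space, maintained incrementally."""

    def __init__(self, mount_time_avail):
        # Seeded by the expensive full calculation done once at mount.
        self.avail = mount_time_avail

    def chunk_allocated(self, usable_bytes):
        # The allocator cannot hand out more than the cached estimate,
        # so (under the assumption above) this never goes negative.
        assert usable_bytes <= self.avail
        self.avail -= usable_bytes

    def chunk_removed(self, usable_bytes):
        # Removing a chunk returns exactly the space it consumed.
        self.avail += usable_bytes
```

Device grow/shrink would still need a rescan, since resizing can change which device is the limiting one, but chunk add/remove becomes O(1).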
Other than that the approach seems sound to me.

Thanks,

Josef
