Re: [PATCH v3 3/3] btrfs: statfs: Use virtual chunk allocation to calculate available data space

On 1/6/20 1:13 AM, Qu Wenruo wrote:
Although btrfs_calc_avail_data_space() tries to estimate how many data
chunks can still be allocated, the estimation is far from perfect:

- Metadata over-commit is not considered at all
- Chunk allocation doesn't take RAID5/6 into consideration

Although the current per-profile available space facility is not able to
handle metadata over-commit by itself, the virtual chunk infrastructure
can be re-used to address the above problems.

This patch will change btrfs_calc_avail_data_space() to do the following
things:
- Do metadata virtual chunk allocation first
   This is to address the over-commit behavior.
   If current metadata chunks have enough free space, we can completely
   skip this step.

- Allocate as many data virtual chunks as possible
   Just like what we did in the per-profile available space estimation.
   Here we only need to calculate one profile, since the statfs() call
   is a relatively cold path.

Now statfs() should be able to report a near-perfect estimate of the
available data space, and can handle RAID5/6 better.
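
To make the two steps above concrete, here is a stand-alone toy model of
the data step only. It is not the code from this patch; the device sizes,
the 1 GiB stripe cap and the greedy "use every device with free space"
policy are assumptions chosen to match the benchmark layout below.

/*
 * Toy model of the virtual data chunk estimation described above.
 * NOT the kernel code from the patch: it only models the data step,
 * greedily allocating imaginary RAID5 chunks from per-device
 * unallocated space until no further chunk fits.
 */
#include <stdio.h>
#include <stdint.h>

#define NDEVS           5
#define SZ_1G           (1024ULL * 1024 * 1024)
#define MAX_STRIPE      SZ_1G   /* assumed per-device stripe cap */
#define RAID5_MIN_DEVS  2

int main(void)
{
        /* Unallocated space per device: the 1G/2G/3G/4G/5G layout below */
        uint64_t free_bytes[NDEVS] = { 1 * SZ_1G, 2 * SZ_1G, 3 * SZ_1G,
                                       4 * SZ_1G, 5 * SZ_1G };
        uint64_t avail = 0;

        for (;;) {
                uint64_t stripe = MAX_STRIPE;
                int ndevs = 0;
                int i;

                /* Use every device that still has unallocated space */
                for (i = 0; i < NDEVS; i++) {
                        if (!free_bytes[i])
                                continue;
                        ndevs++;
                        if (free_bytes[i] < stripe)
                                stripe = free_bytes[i];
                }
                if (ndevs < RAID5_MIN_DEVS)
                        break;

                /* One virtual RAID5 chunk: ndevs stripes, one holds parity */
                for (i = 0; i < NDEVS; i++)
                        if (free_bytes[i])
                                free_bytes[i] -= stripe;
                avail += stripe * (ndevs - 1);
        }

        /* Prints 10 GiB for this layout, with 1 GiB left unusable */
        printf("estimated available data space: %llu GiB\n",
               (unsigned long long)(avail / SZ_1G));
        return 0;
}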

[BENCHMARK]
For the performance difference, here is the benchmark:
  Disk layout:
  - devid 1:	1G
  - devid 2:	2G
  - devid 3:	3G
  - devid 4:	4G
  - devid 5:	5G
  metadata:	RAID1
  data:		RAID5

  This layout should be the worst case for RAID5: as the smaller devices
  fill up, the usable stripe width shrinks from a 5-disk RAID5 down to a
  2-disk RAID5, leaving unusable space on the largest device.

  Then use ftrace to trace the execution time of btrfs_statfs() after
  allocating a 1G data chunk. Both runs have 12 samples.
				avg		std
  Patched:			17.59 us	7.04
  Without patch (v5.5-rc2):	14.98 us	6.16

When the fs is cold, there is a small performance drop for this
particular case, as we need several more iterations to calculate the
correct RAID5 data space.
But it's still pretty good, and won't block the chunk allocator for any
observable time.

When the fs is hot, the performance bottleneck is the chunk_mutex, where
the most common and longest holder would be __btrfs_chunk_alloc().
In that case, we may sleep much longer as __btrfs_chunk_alloc() can
trigger IO.

Since the new implementation is not observably slower than the old one,
and won't cause meaningful delay for the chunk allocator, it should be
more or less OK.

Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>

Sorry, I should have been more specific. Taking the chunk_mutex here means all the people running statfs at the same time are going to be serialized. Can we just do what overcommit does and take the already calculated free space instead of doing the calculation again? Thanks,

Josef
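
For reference, a minimal sketch of what the suggested approach could look
like, assuming the chunk allocator keeps the per-profile results in a
cache protected by a seqlock. Every structure and field name below is
invented for illustration and is not taken from the series; only the
seqlock read helpers and BTRFS_NR_RAID_TYPES are existing kernel/btrfs
identifiers.

#include <linux/seqlock.h>
#include <linux/types.h>

/* Hypothetical cache, kept up to date by the chunk allocator */
struct profile_avail_cache {
        seqlock_t       lock;
        u64             avail[BTRFS_NR_RAID_TYPES]; /* fs/btrfs/volumes.h */
};

/*
 * statfs() would then only need a lockless seqlock read instead of
 * redoing the virtual chunk walk under chunk_mutex.
 */
static u64 read_cached_avail(struct profile_avail_cache *cache, int index)
{
        unsigned int seq;
        u64 avail;

        do {
                seq = read_seqbegin(&cache->lock);
                avail = cache->avail[index];
        } while (read_seqretry(&cache->lock, seq));

        return avail;
}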


