Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm to reduce ENOSPC caused by unbalanced data/metadata allocation.

On Thu, Oct 30, 2014 at 11:46:18AM +0800, Qu Wenruo wrote:
> 
> -------- Original Message --------
> Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
> to reduce ENOSPC caused by unbalanced data/metadata allocation.
> From: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> To: bo.li.liu@xxxxxxxxxx
> Date: 2014-10-30 08:58
> >
> >-------- Original Message --------
> >Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation
> >algorithm to reduce ENOSPC caused by unbalanced data/metadata
> >allocation.
> >From: Liu Bo <bo.li.liu@xxxxxxxxxx>
> >To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> >Date: 2014-10-29 22:29
> >>On Mon, Oct 27, 2014 at 04:36:26PM +0800, Qu Wenruo wrote:
> >>>-------- Original Message --------
> >>>Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
> >>>to reduce ENOSPC caused by unbalanced data/metadata allocation.
> >>>From: Liu Bo <bo.li.liu@xxxxxxxxxx>
> >>>To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> >>>Date: 2014-10-27 16:14
> >>>>On Mon, Oct 27, 2014 at 08:18:12AM +0800, Qu Wenruo wrote:
> >>>>>-------- Original Message --------
> >>>>>Subject: Re: [PATCH] btrfs: Enhance btrfs chunk allocation algorithm
> >>>>>to reduce ENOSPC caused by unbalanced data/metadata allocation.
> >>>>>From: Liu Bo <bo.li.liu@xxxxxxxxxx>
> >>>>>To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> >>>>>Date: 2014-10-24 19:06
> >>>>>>On Thu, Oct 23, 2014 at 10:37:51AM +0800, Qu Wenruo wrote:
> >>>>>>>When btrfs allocates a chunk, it will try to allocate up to 1G for data and
> >>>>>>>256M for metadata, or 10% of all the writeable space if there is enough
> >>>>>>10G for data,
> >>>>>>         if (type & BTRFS_BLOCK_GROUP_DATA) {
> >>>>>>                 max_stripe_size = 1024 * 1024 * 1024;
> >>>>>>                 max_chunk_size = 10 * max_stripe_size;
> >>>>>Oh, sorry, 10G is right.
> >>>>>
> >>>>>Any other comments?
> >>>>>
> >>>>>Thanks,
> >>>>>Qu
> >>>>>
> >>>>>
> >>>>>>        ...
> >>>>>>
> >>>>>>thanks,
> >>>>>>-liubo
> >>>>>>
> >>>>>>>space for the stripe on device.
> >>>>>>>
> >>>>>>>However, when space is running out, this can lead to unbalanced
> >>>>>>>chunk allocation.
> >>>>>>>For example, if only 1G of unallocated space remains and a DATA chunk
> >>>>>>>allocation is requested, all of that space is allocated as a data
> >>>>>>>chunk. A later metadata chunk allocation request then cannot be
> >>>>>>>satisfied, which causes ENOSPC.
> >>>>>>>This is one of the common complaints from end users: ENOSPC happens
> >>>>>>>even though there is still available space.
> >>>>Okay, but I don't think this is the common case. AFAIK, most ENOSPC
> >>>>errors are caused by our runtime worst-case metadata reservation problem.
> >>>>
> >>>>btrfs tends to create a fairly large metadata chunk (1G) at mkfs time,
> >>>>and a 256M metadata chunk is also quite large.
> >>>>
> >>>>As for your example below, yes, we don't have space for a metadata
> >>>>allocation, but do we really need to allocate a new chunk?
> >>>>
> >>>>Or am I missing something?
> >>>>
> >>>>thanks,
> >>>>-liubo
> >>>Yes, it's true that this is not the common cause, but this patch
> >>>should at least let the used percentage reported by 'df' get as
> >>>close to 100% as possible before hitting ENOSPC under normal
> >>>operation (without running balance).
> >>>
> >>>Some cases, like the one in the following mail, may also be improved
> >>>by the patch:
> >>>https://www.mail-archive.com/linux-btrfs@xxxxxxxxxxxxxxx/msg36097.html
> >>>
> >>>I understand that most cases with a lot of free data space and no
> >>>metadata space are caused by creating and then deleting large files,
> >>>but if the last few gigabytes are allocated more carefully, the
> >>>available bytes reported by 'df' should at least be lower by the
> >>>time ENOSPC is hit.
> >>>
> >>>What do you think about it?
> >>Sorry for the late reply.
> >>
> >>I just noticed that a recent commit has fixed this problem.
> >>
> >>commit 47ab2a6c689913db23ccae38349714edf8365e0a
> >>Author: Josef Bacik <jbacik@xxxxxx>
> >>Date:   Thu Sep 18 11:20:02 2014 -0400
> >>
> >>     Btrfs: remove empty block groups automatically
> >>
> >>thanks,
> >>-liubo
> >Oh, that's much better than my patch.
> >
> >So please ignore my patch.
> >
> >Thanks,
> >Qu
> Wait a second,
> it's true that block group auto-reclaim can deal with some cases,
> but it will not improve the used percentage shown by plain 'df'
> before ENOSPC is hit.
> 
> With the old 10%/10G limit, a 100G disk will still hit ENOSPC below
> 90% used space.
> This patch should improve that to above 95%, or even above 99%.
> 
> The old behavior may leave ordinary users with the bad impression
> that btrfs can't use space effectively.
> 
> So I still think the patch has a positive effect on btrfs.

Okay, I buy this.
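
For a rough sense of the numbers discussed above, here is a minimal userspace sketch in C (not the kernel code). It assumes the data chunk limit is the smaller of 10G and 10% of the writable space, and it models the proposed policy only as "take at most half of the unallocated space"; the 16M~32M special case is left out. All helper names are made up for illustration.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)
#define GiB (1024ULL * MiB)

/* Hypothetical helper: return the smaller of two sizes. */
static uint64_t smaller_of(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Assumed old limit for a data chunk: 10G, further capped at 10% of
 * the writable space on the filesystem.
 */
static uint64_t old_chunk_cap(uint64_t writable_bytes)
{
	return smaller_of(10 * GiB, writable_bytes / 10);
}

/* Old policy (simplified): take as much as the cap allows. */
static uint64_t next_chunk_old(uint64_t unallocated, uint64_t cap)
{
	return smaller_of(unallocated, cap);
}

/*
 * Proposed policy (simplified): never take more than half of the
 * unallocated space. The 16M~32M exception is intentionally omitted.
 */
static uint64_t next_chunk_halved(uint64_t unallocated, uint64_t cap)
{
	return smaller_of(unallocated / 2, cap);
}
```

Under these assumptions a 100G device has a 10G cap, so once 10G is left unallocated the old policy hands all of it to a single data chunk, while the halving policy takes 5G and leaves 5G for a later metadata chunk.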

> 
> Thanks,
> Qu
> >>
> >>>Thanks,
> >>>Qu
> >>>>>>>This patch tries not to allocate a chunk larger than half of the
> >>>>>>>unallocated space, which keeps the last of the space more balanced
> >>>>>>>at the small cost of more fragmented chunks in the last 1G.
> >>>>>>>
> >>>>>>>A simple example:
> >>>>>>>Preallocate 17.5G on a 20G empty btrfs fs:
> >>>>>>>[Before]
> >>>>>>>  # btrfs fi show /mnt/test
> >>>>>>>Label: none  uuid: da8741b1-5d47-4245-9e94-bfccea34e91e
> >>>>>>>    Total devices 1 FS bytes used 17.50GiB
> >>>>>>>    devid    1 size 20.00GiB used 20.00GiB path /dev/sdb
> >>>>>>>All space is allocated. No space is left for later metadata allocation.
> >>>>>>>
> >>>>>>>[After]
> >>>>>>>  # btrfs fi show /mnt/test
> >>>>>>>Label: none  uuid: e6935aeb-a232-4140-84f9-80aab1f23d56
> >>>>>>>    Total devices 1 FS bytes used 17.50GiB
> >>>>>>>    devid    1 size 20.00GiB used 19.77GiB path /dev/sdb
> >>>>>>>About 230M is still available for later metadata allocation.
> >>>>>>>
> >>>>>>>Signed-off-by: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> >>>>>>>---
> >>>>>>>  fs/btrfs/volumes.c | 18 ++++++++++++++++++
> >>>>>>>  1 file changed, 18 insertions(+)
> >>>>>>>
> >>>>>>>diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> >>>>>>>index d47289c..fa8de79 100644
> >>>>>>>--- a/fs/btrfs/volumes.c
> >>>>>>>+++ b/fs/btrfs/volumes.c
> >>>>>>>@@ -4240,6 +4240,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> >>>>>>>      int ret;
> >>>>>>>      u64 max_stripe_size;
> >>>>>>>      u64 max_chunk_size;
> >>>>>>>+    u64 total_avail_space = 0;
> >>>>>>>      u64 stripe_size;
> >>>>>>>      u64 num_bytes;
> >>>>>>>      u64 raid_stripe_len = BTRFS_STRIPE_LEN;
> >>>>>>>@@ -4352,10 +4353,27 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
> >>>>>>>          devices_info[ndevs].max_avail = max_avail;
> >>>>>>>          devices_info[ndevs].total_avail = total_avail;
> >>>>>>>          devices_info[ndevs].dev = device;
> >>>>>>>+        total_avail_space += total_avail;
> >>>>>>>          ++ndevs;
> >>>>>>>      }
> >>>>>>>      /*
> >>>>>>>+     * Try not to occupy more than half of the unallocated space.
> >>>>>>>+     * When running short of space, allocating all the space to
> >>>>>>>+     * data/metadata will cause ENOSPC to be triggered more easily.
> >>>>>>>+     *
> >>>>>>>+     * And since the minimum chunk size is 16M, the half-half will cause
> >>>>>>>+     * 16M allocated from 20M available space and the rest 4M will not be
> >>>>>>>+     * used ever. In that case (16~32M), allocate all directly.
> >>>>>>>+     */
> >>>>>>>+    if (total_avail_space < 32 * 1024 * 1024 &&
> >>>>>>>+        total_avail_space > 16 * 1024 * 1024)
> >>>>>>>+        max_chunk_size = total_avail_space;
> >>>>>>>+    else
> >>>>>>>+        max_chunk_size = min(total_avail_space / 2, max_chunk_size);
> >>>>>>>+    max_chunk_size = min(total_avail_space / 2, max_chunk_size);
              ^^^^^^^^

Why another one?  This won't make it use all space within [16M, 32M].
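
To make the effect of the extra line concrete, here is a small standalone sketch in C. It is a userspace model of the quoted hunk, not the kernel function; `cap_as_posted()`, `cap_without_duplicate()` and `min_u64()` are hypothetical names used only for this illustration.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical stand-in for the kernel's min() on u64 values. */
static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* The logic as posted: if/else followed by the unconditional min(). */
static uint64_t cap_as_posted(uint64_t total_avail_space,
			      uint64_t max_chunk_size)
{
	if (total_avail_space < 32 * MiB && total_avail_space > 16 * MiB)
		max_chunk_size = total_avail_space;
	else
		max_chunk_size = min_u64(total_avail_space / 2, max_chunk_size);
	/* The duplicated line: halves the result even in the 16M~32M case. */
	max_chunk_size = min_u64(total_avail_space / 2, max_chunk_size);
	return max_chunk_size;
}

/* The same logic with the duplicated line dropped. */
static uint64_t cap_without_duplicate(uint64_t total_avail_space,
				      uint64_t max_chunk_size)
{
	if (total_avail_space < 32 * MiB && total_avail_space > 16 * MiB)
		return total_avail_space;
	return min_u64(total_avail_space / 2, max_chunk_size);
}
```

With 20M available and a 10G limit, `cap_as_posted()` yields 10M because the second `min()` overrides the special case, while `cap_without_duplicate()` yields the full 20M that the comment in the patch describes. Outside the 16M~32M window the two versions agree.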

thanks,
-liubo

> >>>>>>>+
> >>>>>>>+    /*
> >>>>>>>       * now sort the devices by hole size / available space
> >>>>>>>       */
> >>>>>>>      sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
> >>>>>>>-- 
> >>>>>>>2.1.2
> >>>>>>>
> >>>>>>>-- 
> >>>>>>>To unsubscribe from this list: send the line
> >>>>>>>"unsubscribe linux-btrfs" in
> >>>>>>>the body of a message to majordomo@xxxxxxxxxxxxxxx
> >>>>>>>More majordomo info at http://vger.kernel.org/majordomo-info.html
> >
> 



