Re: [PATCH 11/11] btrfs: add zstd compression level support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 29, 2019 at 09:25:54AM +0200, Nikolay Borisov wrote:
> 
> 
> On 28.01.19 г. 23:24 ч., Dennis Zhou wrote:
> > Zstd compression requires different amounts of memory for each level of
> > compression. The prior patches implemented indirection to allow for each
> > compression type to manage their workspaces independently. This patch
> > uses this indirection to implement compression level support for zstd.
> > 
> > As mentioned above, a requirement that differs zstd from zlib is that
> > higher levels of compression require more memory. To manage this, each
> > compression level has its own queue of workspaces. A global LRU is used
> > to help with reclaim. To guarantee forward progress, a max level
> > workspace is preallocated and hidden from the LRU.
> > 
> > When getting a workspace, it uses a bitmap to identify the levels that
> > are populated and scans up. If it finds a workspace that is greater than
> > it, it uses it, but does not update the last_used time and the
> > corresponding place in the LRU. This provides a mechanism to decrease
> > memory utilization as we only keep around workspaces that are sized
> > appropriately for the in use compression levels.
> > 
> > By knowing which compression levels have available workspaces, we can
> > recycle rather than always create new workspaces as well as take
> > advantage of the preallocated max level for forward progress. If we hit
> > memory pressure, we sleep on the max level workspace. We continue to
> > rescan in case we can use a smaller workspace, but eventually should be
> > able to obtain the max level workspace or allocate one again should
> > memory pressure subside. The memory requirement for decompression is the
> > same as level 1, and therefore can use any of available workspace.
> > 
> > The number of workspaces is bound by an upper limit of the workqueue's
> > limit which currently is 2 (percpu limit). Second, a reclaim timer is
> > used to free inactive/improperly sized workspaces. The reclaim timer is
> > set to 67s to avoid colliding with transaction commit (every 30s) and
> > attempts to reclaim any unused workspace older than 45s.
> > 
> > Repeating the experiment from v2 [1], the Silesia corpus was copied to a
> > btrfs filesystem 10 times and then read back after dropping the caches.
> > The btrfs filesystem was on an SSD.
> > 
> > Level   Ratio   Compression (MB/s)  Decompression (MB/s)
> > 1       2.658        438.47                910.51
> > 2       2.744        364.86                886.55
> > 3       2.801        336.33                828.41
> > 4       2.858        286.71                886.55
> > 5       2.916        212.77                556.84
> > 6       2.363        119.82                990.85
> > 7       3.000        154.06                849.30
> > 8       3.011        159.54                875.03
> > 9       3.025        100.51                940.15
> > 10      3.033        118.97                616.26
> > 11      3.036         94.19                802.11
> > 12      3.037         73.45                931.49
> > 13      3.041         55.17                835.26
> > 14      3.087         44.70                716.78
> > 15      3.126         37.30                878.84
> > 
> > [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@xxxxxx/
> > 
> > Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx>
> > Cc: Nick Terrell <terrelln@xxxxxx>
> > Cc: Omar Sandoval <osandov@xxxxxx>
> > ---
> >  fs/btrfs/super.c |   6 +-
> >  fs/btrfs/zstd.c  | 229 +++++++++++++++++++++++++++++++++++++++++++++--
> >  2 files changed, 226 insertions(+), 9 deletions(-)
> > 
> > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
> > index b28dff207383..0ecc513cb56c 100644
> > --- a/fs/btrfs/super.c
> > +++ b/fs/btrfs/super.c
> > @@ -544,9 +544,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options,
> >  				btrfs_clear_opt(info->mount_opt, NODATASUM);
> >  				btrfs_set_fs_incompat(info, COMPRESS_LZO);
> >  				no_compress = 0;
> > -			} else if (strcmp(args[0].from, "zstd") == 0) {
> > +			} else if (strncmp(args[0].from, "zstd", 4) == 0) {
> >  				compress_type = "zstd";
> >  				info->compress_type = BTRFS_COMPRESS_ZSTD;
> > +				info->compress_level =
> > +					btrfs_compress_str2level(
> > +							 BTRFS_COMPRESS_ZSTD,
> > +							 args[0].from + 4);
> >  				btrfs_set_opt(info->mount_opt, COMPRESS);
> >  				btrfs_clear_opt(info->mount_opt, NODATACOW);
> >  				btrfs_clear_opt(info->mount_opt, NODATASUM);
> > diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c
> > index a951d4fe77f7..ce9b466c197f 100644
> > --- a/fs/btrfs/zstd.c
> > +++ b/fs/btrfs/zstd.c
> > @@ -6,20 +6,27 @@
> >   */
> >  
> >  #include <linux/bio.h>
> > +#include <linux/bitmap.h>
> >  #include <linux/err.h>
> >  #include <linux/init.h>
> >  #include <linux/kernel.h>
> >  #include <linux/mm.h>
> > +#include <linux/sched/mm.h>
> >  #include <linux/pagemap.h>
> >  #include <linux/refcount.h>
> >  #include <linux/sched.h>
> >  #include <linux/slab.h>
> >  #include <linux/zstd.h>
> >  #include "compression.h"
> > +#include "ctree.h"
> >  
> >  #define ZSTD_BTRFS_MAX_WINDOWLOG 17
> >  #define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG)
> >  #define ZSTD_BTRFS_DEFAULT_LEVEL 3
> > +#define ZSTD_BTRFS_MAX_LEVEL 15
> > +#define ZSTD_BTRFS_RECLAIM_NS (45 * NSEC_PER_SEC)
> > +/* 67s to avoid clashing with transaction commit (every 30s) */
> > +#define ZSTD_BTRFS_RECLAIM_JIFFIES (67 * HZ)
> 
> This is valid provided that transaction commit time is not overriden by
> Opt_commit_interval. If it is such a problem to not clash with trans
> commit maybe this should be calculated upon mount?
> 

Because the workspace managers are initialized once and shared,
different mounts can have different Opt_commit_interval settings. I
don't think it's particularly problematic to clash with trans commit,
it's kind of nice to have the offset to in the majority of cases not run
into this.

Thanks,
Dennis



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux