On 28.01.19 г. 23:24 ч., Dennis Zhou wrote: > Zstd compression requires different amounts of memory for each level of > compression. The prior patches implemented indirection to allow for each > compression type to manage their workspaces independently. This patch > uses this indirection to implement compression level support for zstd. > > As mentioned above, a requirement that differs zstd from zlib is that > higher levels of compression require more memory. To manage this, each > compression level has its own queue of workspaces. A global LRU is used > to help with reclaim. To guarantee forward progress, a max level > workspace is preallocated and hidden from the LRU. > > When getting a workspace, it uses a bitmap to identify the levels that > are populated and scans up. If it finds a workspace that is greater than > it, it uses it, but does not update the last_used time and the > corresponding place in the LRU. This provides a mechanism to decrease > memory utilization as we only keep around workspaces that are sized > appropriately for the in use compression levels. > > By knowing which compression levels have available workspaces, we can > recycle rather than always create new workspaces as well as take > advantage of the preallocated max level for forward progress. If we hit > memory pressure, we sleep on the max level workspace. We continue to > rescan in case we can use a smaller workspace, but eventually should be > able to obtain the max level workspace or allocate one again should > memory pressure subside. The memory requirement for decompression is the > same as level 1, and therefore can use any of available workspace. > > The number of workspaces is bound by an upper limit of the workqueue's > limit which currently is 2 (percpu limit). Second, a reclaim timer is > used to free inactive/improperly sized workspaces. The reclaim timer is > set to 67s to avoid colliding with transaction commit (every 30s) and > attempts to reclaim any unused workspace older than 45s. > > Repeating the experiment from v2 [1], the Silesia corpus was copied to a > btrfs filesystem 10 times and then read back after dropping the caches. > The btrfs filesystem was on an SSD. > > Level Ratio Compression (MB/s) Decompression (MB/s) > 1 2.658 438.47 910.51 > 2 2.744 364.86 886.55 > 3 2.801 336.33 828.41 > 4 2.858 286.71 886.55 > 5 2.916 212.77 556.84 > 6 2.363 119.82 990.85 > 7 3.000 154.06 849.30 > 8 3.011 159.54 875.03 > 9 3.025 100.51 940.15 > 10 3.033 118.97 616.26 > 11 3.036 94.19 802.11 > 12 3.037 73.45 931.49 > 13 3.041 55.17 835.26 > 14 3.087 44.70 716.78 > 15 3.126 37.30 878.84 > > [1] https://lore.kernel.org/linux-btrfs/20181031181108.289340-1-terrelln@xxxxxx/ > > Signed-off-by: Dennis Zhou <dennis@xxxxxxxxxx> > Cc: Nick Terrell <terrelln@xxxxxx> > Cc: Omar Sandoval <osandov@xxxxxx> > --- > fs/btrfs/super.c | 6 +- > fs/btrfs/zstd.c | 229 +++++++++++++++++++++++++++++++++++++++++++++-- > 2 files changed, 226 insertions(+), 9 deletions(-) > > diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c > index b28dff207383..0ecc513cb56c 100644 > --- a/fs/btrfs/super.c > +++ b/fs/btrfs/super.c > @@ -544,9 +544,13 @@ int btrfs_parse_options(struct btrfs_fs_info *info, char *options, > btrfs_clear_opt(info->mount_opt, NODATASUM); > btrfs_set_fs_incompat(info, COMPRESS_LZO); > no_compress = 0; > - } else if (strcmp(args[0].from, "zstd") == 0) { > + } else if (strncmp(args[0].from, "zstd", 4) == 0) { > compress_type = "zstd"; > info->compress_type = BTRFS_COMPRESS_ZSTD; > + info->compress_level = > + btrfs_compress_str2level( > + BTRFS_COMPRESS_ZSTD, > + args[0].from + 4); > btrfs_set_opt(info->mount_opt, COMPRESS); > btrfs_clear_opt(info->mount_opt, NODATACOW); > btrfs_clear_opt(info->mount_opt, NODATASUM); > diff --git a/fs/btrfs/zstd.c b/fs/btrfs/zstd.c > index a951d4fe77f7..ce9b466c197f 100644 > --- a/fs/btrfs/zstd.c > +++ b/fs/btrfs/zstd.c > @@ -6,20 +6,27 @@ > */ > > #include <linux/bio.h> > +#include <linux/bitmap.h> > #include <linux/err.h> > #include <linux/init.h> > #include <linux/kernel.h> > #include <linux/mm.h> > +#include <linux/sched/mm.h> > #include <linux/pagemap.h> > #include <linux/refcount.h> > #include <linux/sched.h> > #include <linux/slab.h> > #include <linux/zstd.h> > #include "compression.h" > +#include "ctree.h" > > #define ZSTD_BTRFS_MAX_WINDOWLOG 17 > #define ZSTD_BTRFS_MAX_INPUT (1 << ZSTD_BTRFS_MAX_WINDOWLOG) > #define ZSTD_BTRFS_DEFAULT_LEVEL 3 > +#define ZSTD_BTRFS_MAX_LEVEL 15 > +#define ZSTD_BTRFS_RECLAIM_NS (45 * NSEC_PER_SEC) > +/* 67s to avoid clashing with transaction commit (every 30s) */ > +#define ZSTD_BTRFS_RECLAIM_JIFFIES (67 * HZ) This is valid provided that transaction commit time is not overriden by Opt_commit_interval. If it is such a problem to not clash with trans commit maybe this should be calculated upon mount? <snip>
