Re: [PATCH v2] btrfs: add zstd compression level support

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Nov 13, 2018 at 01:33:32AM +0100, David Sterba wrote:
> On Wed, Oct 31, 2018 at 11:11:08AM -0700, Nick Terrell wrote:
> > From: Jennifer Liu <jenniferliu620@xxxxxx>
> > 
> > Adds zstd compression level support to btrfs. Zstd requires
> > different amounts of memory for each level, so the design had
> > to be modified to allow set_level() to allocate memory. We
> > preallocate one workspace of the maximum size to guarantee
> > forward progress. This feature is expected to be useful for
> > read-mostly filesystems, or when creating images.
> > 
> > Benchmarks run in qemu on Intel x86 with a single core.
> > The benchmark measures the time to copy the Silesia corpus [0] to
> > a btrfs filesystem 10 times, then read it back.
> > 
> > The two important things to note are:
> > - The decompression speed and memory remains constant.
> >   The memory required to decompress is the same as level 1.
> > - The compression speed and ratio will vary based on the source.
> > 
> > Level	Ratio	Compression	Decompression	Compression Memory
> > 1    	2.59 	153 MB/s   	112 MB/s     	0.8 MB
> > 2    	2.67 	136 MB/s   	113 MB/s     	1.0 MB
> > 3    	2.72 	106 MB/s   	115 MB/s     	1.3 MB
> > 4    	2.78 	86  MB/s   	109 MB/s     	0.9 MB
> > 5    	2.83 	69  MB/s   	109 MB/s     	1.4 MB
> > 6    	2.89 	53  MB/s   	110 MB/s     	1.5 MB
> > 7    	2.91 	40  MB/s   	112 MB/s     	1.4 MB
> > 8    	2.92 	34  MB/s   	110 MB/s     	1.8 MB
> > 9    	2.93 	27  MB/s   	109 MB/s     	1.8 MB
> > 10   	2.94 	22  MB/s   	109 MB/s     	1.8 MB
> > 11   	2.95 	17  MB/s   	114 MB/s     	1.8 MB
> > 12   	2.95 	13  MB/s   	113 MB/s     	1.8 MB
> > 13   	2.95 	10  MB/s   	111 MB/s     	2.3 MB
> > 14   	2.99 	7   MB/s   	110 MB/s     	2.6 MB
> > 15   	3.03 	6   MB/s   	110 MB/s     	2.6 MB
> > 
> > [0] http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
> > 
> > Signed-off-by: Jennifer Liu <jenniferliu620@xxxxxx>
> > Signed-off-by: Nick Terrell <terrelln@xxxxxx>
> > Reviewed-by: Omar Sandoval <osandov@xxxxxx>
> > ---
> > v1 -> v2:
> > - Don't reflow the unchanged line.
> > 

[snip]

> > -static struct list_head *zstd_alloc_workspace(void)
> > +static bool zstd_set_level(struct list_head *ws, unsigned int level)
> > +{
> > +	struct workspace *workspace = list_entry(ws, struct workspace, list);
> > +	ZSTD_parameters params;
> > +	int size;
> > +
> > +	if (level > BTRFS_ZSTD_MAX_LEVEL)
> > +		level = BTRFS_ZSTD_MAX_LEVEL;
> > +
> > +	if (level == 0)
> > +		level = BTRFS_ZSTD_DEFAULT_LEVEL;
> > +
> > +	params = ZSTD_getParams(level, ZSTD_BTRFS_MAX_INPUT, 0);
> > +	size = max_t(size_t,
> > +			ZSTD_CStreamWorkspaceBound(params.cParams),
> > +			ZSTD_DStreamWorkspaceBound(ZSTD_BTRFS_MAX_INPUT));
> > +	if (size > workspace->size) {
> > +		if (!zstd_reallocate_mem(workspace, size))
> 
> This can allocate memory and this can appen on the writeout path, ie.
> one of the reasons for that might be that system needs more memory.
> 
> By the table above, the size can be up to 2.6MiB, which is a lot in
> terms of kernel memory as there must be either contiguous unmapped
> memory, the virtual mappings must be created. Both are scarce resource
> or should be treated as such.
> 
> Given that there's no logic that would try to minimize the usage for
> workspaces, this can allocate many workspaces of that size.
> 
> Currently the workspace allocations have been moved to the early module
> loading phase so that they don't happen later and we don't have to
> allocate memory nor handle the failures. Your patch brings that back.

Even before this patch, we may try to allocate a workspace. See
__find_workspace():

https://github.com/kdave/btrfs-devel/blob/fd0f5617a8a2ee92dd461d01cf9c5c37363ccc8d/fs/btrfs/compression.c#L897

We already limit it to one per CPU, and only allocate when needed.
Anything greater than that has to wait. Maybe we should improve that to
also include a limit on the total amount of memory allocated? That would
be more flexible than your approach below of making the > default case
special, and I like it more than Timofey's idea of falling back to a
lower level.

> The solution I'm currently thinking about can make the levels work but
> would be limited in throughput as a trade-off for the memory
> consumption.
> 
> - preallocate one workspace for level 15 per mounted filesystem, using
>   get_free_pages_exact or kvmalloc
> 
> - preallocate workspaces for the default level, the number same as for
>   lzo/zlib
> 
> - add logic to select the zstd:15 workspace last for other compressors,
>   ie. make it almost exclusive for zstd levels > default
> 
> Further refinement could allocate the workspaces on-demand and free if
> not used. Allocation failures would still be handled gracefully at the
> cost of waiting for the remainig worspaces, at least one would be always
> available.



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux