On Mon, Dec 16, 2019 at 6:28 PM <fdmanana@xxxxxxxxxx> wrote:
>
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> We always round down, to a multiple of the filesystem's block size, the
> length to deduplicate at generic_remap_check_len(). However this is only
> needed if an attempt to deduplicate the last block into the middle of the
> destination file is requested, since that leads into a corruption if the
> length of the source file is not block size aligned. When an attempt to
> deduplicate the last block into the end of the destination file is
> requested, we should allow it because it is safe to do it - there's no
> stale data exposure and we are prepared to compare the data ranges for
> a length not aligned to the block (or page) size - in fact we even do
> the data compare before adjusting the deduplication length.
>
> After btrfs was updated to use the generic helpers from VFS (by commit
> 34a28e3d77535e ("Btrfs: use generic_remap_file_range_prep() for cloning
> and deduplication")) we started to have user reports of deduplication
> not reflinking the last block anymore, and whence users getting lower
> deduplication scores. The main use case is deduplication of entire
> files that have a size not aligned to the block size of the filesystem.
>
> We already allow cloning the last block to the end (and beyond) of the
> destination file, so allow for deduplication as well.
>
> Link: https://lore.kernel.org/linux-btrfs/2019-1576167349.500456@xxxxxxxxxxxxxx/
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
Darrick, Al, any feedback?
Thanks.
> ---
> fs/read_write.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
>
> diff --git a/fs/read_write.c b/fs/read_write.c
> index 5bbf587f5bc1..7458fccc59e1 100644
> --- a/fs/read_write.c
> +++ b/fs/read_write.c
> @@ -1777,10 +1777,9 @@ static int remap_verify_area(struct file *file, loff_t pos, loff_t len,
> * else. Assume that the offsets have already been checked for block
> * alignment.
> *
> - * For deduplication we always scale down to the previous block because we
> - * can't meaningfully compare post-EOF contents.
> - *
> - * For clone we only link a partial EOF block above the destination file's EOF.
> + * For clone we only link a partial EOF block above or at the destination file's
> + * EOF. For deduplication we accept a partial EOF block only if it ends at the
> + * destination file's EOF (can not link it into the middle of a file).
> *
> * Shorten the request if possible.
> */
> @@ -1796,8 +1795,7 @@ static int generic_remap_check_len(struct inode *inode_in,
> if ((*len & blkmask) == 0)
> return 0;
>
> - if ((remap_flags & REMAP_FILE_DEDUP) ||
> - pos_out + *len < i_size_read(inode_out))
> + if (pos_out + *len < i_size_read(inode_out))
> new_len &= ~blkmask;
>
> if (new_len == *len)
> --
> 2.11.0
>