On Thu, Feb 20, 2020 at 3:30 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> On 2/19/20 9:06 AM, fdmanana@xxxxxxxxxx wrote:
> > From: Filipe Manana <fdmanana@xxxxxxxx>
> >
> > There are a few cases where we don't allow cloning an inline extent into
> > the destination inode, returning -EOPNOTSUPP to user space. This was done
> > to prevent several types of file corruption and because it's not very
> > straightforward to deal with these cases, as they can't rely on simply
> > copying the inline extent between leaves. Such cases require copying the
> > inline extent's data into the respective page of the destination inode.
> >
> > Not supporting these cases makes it harder and more cumbersome to write
> > applications/libraries that work on any filesystem with reflink support,
> > since all these cases for which btrfs fails with -EOPNOTSUPP work just
> > fine on xfs for example. These unsupported cases are also not documented
> > anywhere, and explaining exactly which cases fail requires a rather
> > technical understanding of btrfs's internals (inline extents and when and
> > where they can exist in a file), so it's not really user friendly.
> >
> > Also some test cases from fstests that use fsx, such as generic/522 for
> > example, can sporadically fail because they trigger one of these cases,
> > and fsx expects all operations to succeed.
> >
> > This change adds support for cloning all these cases by copying the
> > inline extent's data into the respective page of the destination inode.
> >
> > With this change test case btrfs/112 from fstests fails because it
> > expects some clone operations to fail, so it will be updated. Also a
> > new test case that exercises all these previously unsupported cases
> > will be added to fstests.
> >
> > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > ---
> > fs/btrfs/reflink.c | 212 ++++++++++++++++++++++++++++++++-------------
> > 1 file changed, 152 insertions(+), 60 deletions(-)
> >
> > diff --git a/fs/btrfs/reflink.c b/fs/btrfs/reflink.c
> > index 7e7f46116db3..c19c87de6d4a 100644
> > --- a/fs/btrfs/reflink.c
> > +++ b/fs/btrfs/reflink.c
> > @@ -1,8 +1,12 @@
> > // SPDX-License-Identifier: GPL-2.0
> >
> > #include <linux/iversion.h>
> > +#include <linux/blkdev.h>
> > #include "misc.h"
> > #include "ctree.h"
> > +#include "btrfs_inode.h"
> > +#include "compression.h"
> > +#include "delalloc-space.h"
> > #include "transaction.h"
> >
> > #define BTRFS_MAX_DEDUPE_LEN SZ_16M
> > @@ -43,30 +47,121 @@ static int clone_finish_inode_update(struct btrfs_trans_handle *trans,
> > return ret;
> > }
> >
> > +static int copy_inline_to_page(struct inode *inode,
> > + const u64 file_offset,
> > + char *inline_data,
> > + const u64 size,
> > + const u64 datal,
> > + const u8 comp_type)
> > +{
> > + const u64 block_size = btrfs_inode_sectorsize(inode);
> > + const u64 range_end = file_offset + block_size - 1;
> > + const size_t inline_size = size - btrfs_file_extent_calc_inline_size(0);
> > + char *data_start = inline_data + btrfs_file_extent_calc_inline_size(0);
> > + struct extent_changeset *data_reserved = NULL;
> > + struct page *page = NULL;
> > + bool page_locked = false;
> > + int ret;
> > +
> > + ASSERT(IS_ALIGNED(file_offset, block_size));
> > +
> > + ret = btrfs_delalloc_reserve_space(inode, &data_reserved, file_offset,
> > + block_size);
>
> This could potentially deadlock, as we could need to flush delalloc for this
> inode that we've dirtied pages for and not be able to make progress because we
> have this range locked.

But we have already flushed the range earlier, after locking the inode
and waiting for dio requests, so during the reflink operation no one
should be able to dirty pages in the range. Or did I miss some edge case?

Thanks.
>
> Josef
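
For context on the user-space side mentioned in the changelog (applications having to cope with -EOPNOTSUPP from clone operations), here is a minimal sketch of the kind of fallback a portable application ends up carrying. It is not part of the patch. The helper names clone_or_copy_range() and copy_range_fallback() are hypothetical, and the set of errno values treated as "fall back to a plain copy" is an assumption for illustration only; FICLONERANGE and struct file_clone_range come from <linux/fs.h>.

```c
#include <errno.h>
#include <fcntl.h>
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: plain pread/pwrite copy of len bytes. */
static int copy_range_fallback(int src_fd, off_t src_off,
			       int dst_fd, off_t dst_off, size_t len)
{
	char buf[65536];

	while (len > 0) {
		size_t want = len < sizeof(buf) ? len : sizeof(buf);
		ssize_t n = pread(src_fd, buf, want, src_off);
		ssize_t done = 0;

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -errno;
		}
		if (n == 0)
			return -EINVAL;	/* source is shorter than requested */

		/* pwrite may be partial, so loop until the chunk is written. */
		while (done < n) {
			ssize_t w = pwrite(dst_fd, buf + done, n - done,
					   dst_off + done);
			if (w < 0) {
				if (errno == EINTR)
					continue;
				return -errno;
			}
			done += w;
		}
		src_off += n;
		dst_off += n;
		len -= n;
	}
	return 0;
}

/*
 * Hypothetical helper: try to reflink the range and fall back to a data
 * copy when the filesystem refuses the clone.
 * Returns 1 if the range was cloned, 0 if it was copied, -errno on error.
 */
int clone_or_copy_range(int src_fd, off_t src_off,
			int dst_fd, off_t dst_off, size_t len)
{
	struct file_clone_range args = {
		.src_fd = src_fd,
		.src_offset = src_off,
		.src_length = len,
		.dest_offset = dst_off,
	};

	if (ioctl(dst_fd, FICLONERANGE, &args) == 0)
		return 1;

	switch (errno) {
	case EOPNOTSUPP:	/* the case discussed in the changelog */
	case EXDEV:		/* assumption: also treat these as "copy" */
	case EINVAL:
	case ENOTTY:
		return copy_range_fallback(src_fd, src_off,
					   dst_fd, dst_off, len);
	default:
		return -errno;
	}
}
```

A caller would invoke clone_or_copy_range(src_fd, 0, dst_fd, 0, len) and treat any non-negative return as success. With the patch applied, the expectation is that btrfs takes the clone path in the cases that previously returned -EOPNOTSUPP, so the fallback stays only for filesystems without reflink support.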