On Fri, Dec 07, 2018 at 03:25:38PM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> Since cloning and deduplication are no longer Btrfs-specific operations, we
> now have generic code to handle parameter validation, compare file ranges
> used for deduplication, clear capabilities when cloning, etc. This change
> makes Btrfs use it, eliminating a lot of code in Btrfs and also fixing a
> few bugs, such as:
>
> 1) When cloning, the destination file's capabilities were not dropped
>    (the fstest generic/513 tests this);
>
> 2) We were not checking if the destination file is immutable;
>
> 3) Not checking if either the source or destination files are swap
>    files (swap file support is coming soon for Btrfs);
>
> 4) System limits were not checked (resource limits and O_LARGEFILE).
>
> Note that the generic helper generic_remap_file_range_prep() does start
> and wait for writeback by calling filemap_write_and_wait_range(), but
> that is not enough for Btrfs for two reasons:
>
> 1) With compression, we need to start writeback twice in order to get the
>    pages marked for writeback and ordered extents created;
>
> 2) filemap_write_and_wait_range() (and all its other variants) only waits
>    for the IO to complete, but we need to wait for the ordered extents to
>    finish, so that when we do the actual reflinking operations the file
>    extent items are in the fs tree. This also matters because the generic
>    helper, for the deduplication case, compares the contents of the pages
>    in the requested range, which might require reading extents from disk
>    in the very unlikely case that pages get invalidated after writeback
>    finishes (so the file extent items must be up to date in the fs tree).
>
> Since these reasons are specific to Btrfs, we have to handle them in the
> Btrfs code before calling generic_remap_file_range_prep(). This also
> results in a simpler way of dealing with existing delalloc in the
> source/target ranges, especially for the deduplication case. There we
> used to lock all the pages first and then, if we found any delalloc or
> ordered extent in the range, unlock the pages, trigger writeback, wait
> for the ordered extents to complete, lock all the pages again and check
> whether deduplication can be done. Now we get a simpler approach: lock
> the inodes, trigger writeback and then wait for the ordered extents to
> complete.
>
> So make Btrfs use generic_remap_file_range_prep() (XFS and OCFS2 use it)
> to eliminate duplicated code, fix a few bugs and benefit from future bug
> fixes done there. For example, the recent clone and dedupe bugs involving
> reflinking a partial EOF block got a counterpart fix in the generic
> helper, since they affected all filesystems supporting these operations,
> so we no longer need special checks in Btrfs for them.
>
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> ---
>
> V2: Removed the check that verifies whether either of the inodes is a
> directory, as that is done by generic_remap_file_range_prep(). Oddly, in
> Btrfs it was being done only for cloning but not for dedupe.

A big patch, but I can see how the logic is simplified and redundant code
is thrown out. I am concerned about the size because of potential backports
to older kernels, where a minimal fix separated from the actual code
removal would cause fewer conflicts.

Reviewed-by: David Sterba <dsterba@xxxxxxxx>
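
For readers less familiar with the Btrfs prep path, the flow described in the
changelog roughly amounts to the sketch below. It is only an illustration, not
the actual patch: the helper names btrfs_remap_flush_and_wait() and
btrfs_remap_prep() are made up, inode locking, length adjustment for
whole-file clones and other error handling are omitted, and the function
signatures follow the 4.20-era headers (filemap_fdatawrite_range(),
btrfs_wait_ordered_range(), generic_remap_file_range_prep()). It assumes the
code lives in fs/btrfs/ with the usual private headers included.

/*
 * Illustrative sketch only: flush a range twice so that compressed
 * writeback actually ends up with pages marked for writeback and ordered
 * extents created, then wait for those ordered extents (not just the IO)
 * so the file extent items are in the fs tree.
 */
static int btrfs_remap_flush_and_wait(struct inode *inode, u64 off, u64 len)
{
        int ret;

        /* First pass may only kick off compression for compressed ranges. */
        ret = filemap_fdatawrite_range(inode->i_mapping, off, off + len - 1);
        if (ret)
                return ret;
        /* Second pass marks the pages for writeback and creates the
         * ordered extents. */
        ret = filemap_fdatawrite_range(inode->i_mapping, off, off + len - 1);
        if (ret)
                return ret;
        /* Wait for the ordered extents to finish, not just the writeback. */
        return btrfs_wait_ordered_range(inode, off, len);
}

static int btrfs_remap_prep(struct file *file_in, loff_t pos_in,
                            struct file *file_out, loff_t pos_out,
                            loff_t *len, unsigned int remap_flags)
{
        int ret;

        ret = btrfs_remap_flush_and_wait(file_inode(file_in), pos_in, *len);
        if (ret)
                return ret;
        ret = btrfs_remap_flush_and_wait(file_inode(file_out), pos_out, *len);
        if (ret)
                return ret;

        /* Generic validation and dedupe page compare shared with XFS/OCFS2. */
        return generic_remap_file_range_prep(file_in, pos_in, file_out,
                                             pos_out, len, remap_flags);
}

The key design point the changelog makes is that the Btrfs-specific flushing
and ordered-extent waiting happen before the generic helper runs, so its
dedupe page comparison sees up-to-date file extent items.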
