On Thu, Feb 20, 2020 at 2:30 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> In my stress testing we were still seeing some hole's with my patches to
> fix missing hole extents. Turns out we do not fill in holes during hole
> punch if the punch is past i_size. I incorrectly assumed this was fine,
> because anybody extending would use btrfs_cont_expand, however there is
> a corner that still can give us trouble. Start with an empty file and
>
> fallocate KEEP_SIZE 1m-2m
>
> We now have a 0 length file, and a hole file extent from 0-1m, and a
> prealloc extent from 1m-2m. Now
>
> punch 1m-1.5m
>
> Because this is past i_size we have
>
> [HOLE EXTENT][ NOTHING ][PREALLOC]
> [0 1m][1m 1.5m][1.5m 2m]
>
> with an i_size of 0. Now if we pwrite 0-1.5m we'll increas our i_size
> to 1.5m, but our disk_i_size is still 0 until the ordered extent
> completes.
>
> However if we now immediately truncate 2m on the file we'll just call
> btrfs_cont_expand(inode, 1.5m, 2m), since our old i_size is 1.5m. If we
> commit the transaction here and crash we'll expose the gap.
>
> To fix this we need to clear the file extent mapping for the range that
> we punched but didn't insert a corresponding file extent for. This will
> mean the truncate will only get an disk_i_size set to 1m if we crash
> before the finish ordered io happens.
>
> I've written an xfstest to reproduce the problem and validate this fix.
>
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
I hit another case, besides the one I reported last week, in my
automated tests during one of these evenings.
This is probably it, but I haven't tried to debug it yet - reusing the
same fsstress seed didn't trigger it, as the test used 20 fsstress
processes (with dm log writes).
Looks good,
Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx>
> ---
> - Dave, this needs to be folded into "btrfs: use the file extent tree
> infrastructure" and the changelog needs to be adjusted since I incorrectly
> point out that we don't need to clear for hole punch. We definitely need to
> clear for the case that we're punching past i_size as we aren't inserting hole
> file extents.
>
> fs/btrfs/file.c | 27 +++++++++++++++++++++++++++
> 1 file changed, 27 insertions(+)
>
> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
> index 5cdcdbdd908b..6f6f1805e6fd 100644
> --- a/fs/btrfs/file.c
> +++ b/fs/btrfs/file.c
> @@ -2597,6 +2597,24 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
> btrfs_abort_transaction(trans, ret);
> break;
> }
> + } else if (!clone_info && cur_offset < drop_end) {
> + /*
> + * We are past the i_size here, but since we didn't
> + * insert holes we need to clear the mapped area so we
> + * know to not set disk_i_size in this area until a new
> + * file extent is inserted here.
> + */
> + ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> + cur_offset, drop_end - cur_offset);
> + if (ret) {
> + /*
> + * We couldn't clear our area, so we could
> + * presumably adjust up and corrupt the fs, so
> + * we need to abort.
> + */
> + btrfs_abort_transaction(trans, ret);
> + break;
> + }
> }
>
> if (clone_info && drop_end > clone_info->file_offset) {
> @@ -2687,6 +2705,15 @@ int btrfs_punch_hole_range(struct inode *inode, struct btrfs_path *path,
> btrfs_abort_transaction(trans, ret);
> goto out_trans;
> }
> + } else if (!clone_info && cur_offset < drop_end) {
> + /* See the comment in the loop above for the reasoning here. */
> + ret = btrfs_inode_clear_file_extent_range(BTRFS_I(inode),
> + cur_offset, drop_end - cur_offset);
> + if (ret) {
> + btrfs_abort_transaction(trans, ret);
> + goto out_trans;
> + }
> +
> }
> if (clone_info) {
> ret = btrfs_insert_clone_extent(trans, inode, path, clone_info,
> --
> 2.24.1
>
--
Filipe David Manana,
“Whether you think you can, or you think you can't — you're right.”