On Sat, May 30, 2015 at 9:59 AM, Omar Sandoval <osandov@xxxxxxxxxxx> wrote:
> Since commit bafc9b754f75 ("vfs: More precise tests in d_invalidate"),
> mounted subvolumes can be deleted because d_invalidate() won't fail.
> However, we run into problems when we attempt to delete the default
> subvolume while it is mounted as the root filesystem:
>
> # btrfs subvol list /
> ID 257 gen 306 top level 5 path rootvol
> ID 267 gen 334 top level 5 path snap1
> # btrfs subvol get-default /
> ID 267 gen 334 top level 5 path snap1
> # btrfs inspect-internal rootid /
> 267
> # mount -o subvol=/ /dev/vda1 /mnt
> # btrfs subvol del /mnt/snap1
> Delete subvolume (no-commit): '/mnt/snap1'
> ERROR: cannot delete '/mnt/snap1' - Operation not permitted
> # findmnt /
> findmnt: can't read /proc/mounts: No such file or directory
> # ls /proc
> #
>
> Markus reported that this same scenario simply led to a kernel oops.
>
> This happens because in btrfs_ioctl_snap_destroy(), we call
> d_invalidate() before we check may_destroy_subvol(), which means that we
> detach the submounts and drop the dentry before erroring out. Instead,
> we should only invalidate the dentry once we know that we're going
> through with the deletion.
>
> Cc: <stable@xxxxxxxxxxxxxxx>
> Fixes: bafc9b754f75 ("vfs: More precise tests in d_invalidate")
> Reported-by: Markus Schauler <mschauler@xxxxxxxxx>
> Signed-off-by: Omar Sandoval <osandov@xxxxxxxxxxx>
> ---
> The other fix for preventing all mounted subvolumes from being deleted
> would preclude this, but it sounded like we were leaning towards
> enforcing that in userspace once subvolume info becomes available in
> /proc/mounts, so this should be fixed separately.
>
> fs/btrfs/ioctl.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
> index 1c22c6518504..8edb8544088b 100644
> --- a/fs/btrfs/ioctl.c
> +++ b/fs/btrfs/ioctl.c
> @@ -2413,14 +2413,14 @@ static noinline int btrfs_ioctl_snap_destroy(struct file *file,
> goto out_unlock_inode;
> }
>
> - d_invalidate(dentry);
> -
> down_write(&root->fs_info->subvol_sem);
>
> err = may_destroy_subvol(dest);
> if (err)
> goto out_up_write;
>
> + d_invalidate(dentry);
> +
Is there any reason not to call d_invalidate() only after
btrfs_unlink_subvol() succeeds? I don't see why we should invalidate
the dentry before the deletion has actually succeeded: before that
point the metadata reservation can fail, we can fail to start/join a
transaction, etc.
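
To make the ordering question concrete, here is a rough userspace model
of the three possible placements of the invalidation. It is only a
sketch: every name in it (model_dentry, model_subvol, model_snap_destroy
and so on) is hypothetical and not a kernel API, the single
"reserve_fails" flag stands in for any failure between the permission
check and the unlink, and -ENOSPC is just an assumed error code for that
case.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a dentry: only tracks what invalidation destroys. */
struct model_dentry {
	bool hashed;      /* still reachable by lookup */
	bool has_mounts;  /* submounts still attached */
};

/* Hypothetical stand-in for the subvolume being deleted. */
struct model_subvol {
	bool is_default;     /* models may_destroy_subvol() refusing the delete */
	bool reserve_fails;  /* models a later failure (reservation, transaction) */
	bool deleted;
};

/* Where the invalidation is placed relative to the checks. */
enum invalidate_point {
	BEFORE_CHECKS,           /* ordering before the patch */
	AFTER_PERMISSION_CHECK,  /* ordering in the patch */
	AFTER_UNLINK,            /* ordering suggested in this reply */
};

/* Models the irreversible side effects: unhash and detach submounts. */
static void model_invalidate(struct model_dentry *d)
{
	d->hashed = false;
	d->has_mounts = false;
}

static int model_may_destroy(const struct model_subvol *s)
{
	return s->is_default ? -EPERM : 0;
}

static int model_reserve(const struct model_subvol *s)
{
	return s->reserve_fails ? -ENOSPC : 0; /* assumed error code */
}

static int model_snap_destroy(struct model_dentry *d, struct model_subvol *s,
			      enum invalidate_point when)
{
	int err;

	assert(d && s);

	if (when == BEFORE_CHECKS)
		model_invalidate(d);

	err = model_may_destroy(s);
	if (err)
		return err;

	if (when == AFTER_PERMISSION_CHECK)
		model_invalidate(d);

	err = model_reserve(s);
	if (err)
		return err;

	s->deleted = true; /* models the unlink succeeding */

	if (when == AFTER_UNLINK)
		model_invalidate(d);

	return 0;
}
```

In this model, BEFORE_CHECKS on a default subvolume returns -EPERM with
the dentry already invalidated, which is the reported problem.
AFTER_PERMISSION_CHECK fixes that case but still leaves the dentry
invalidated when the later step fails, while AFTER_UNLINK leaves it
untouched on every error path.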
Also, would you consider making an xfstest for this?
Thanks.
> btrfs_init_block_rsv(&block_rsv, BTRFS_BLOCK_RSV_TEMP);
> /*
> * One for dir inode, two for dir entries, two for root
> --
> 2.4.2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
Filipe David Manana,
"Reasonable men adapt themselves to the world.
Unreasonable men adapt the world to themselves.
That's why all progress depends on unreasonable men."