Re: [PATCH] Btrfs: update commit root on snapshot creation after orphan cleanup

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Filipe,
Thank you for the explanation.
I understand that without your patch we return to user-space after
deleting the orphans, but leaving the transaction open. So user-space
sees the snapshot and can start send. With your patch, we return to
user-space after orphan cleanup has been committed. Unless we crash in
the middle, like you pointed.

I will also look at the new patch.

Thanks!
Alex.




On Thu, Jul 31, 2014 at 3:41 PM, Filipe David Manana <fdmanana@xxxxxxxxx> wrote:
> On Mon, Jul 28, 2014 at 6:31 PM, Filipe David Manana <fdmanana@xxxxxxxxx> wrote:
>> On Sat, Jul 19, 2014 at 8:11 PM, Alex Lyakas
>> <alex.btrfs@xxxxxxxxxxxxxxxxx> wrote:
>>> Hi Filipe,
>>> It's quite possible I don't fully understand the issue. It seems that
>>> we are creating a read-only snapshot, commit a transaction, and then
>>> go and modify the snapshot once again, by deleting all the
>>> ORPHAN_ITEMs we have in its file tree (btrfs_orphan_cleanup).
>>> Shouldn't all this be part of snapshot creation, so that after we
>>> commit, we have a clean file tree with no orphans there? (not sure if
>>> this makes sense though).
>>>
>>> With your patch we do this additional commit after the cleanup. But
>>> nothing prevents "send" from starting before this additional commit,
>>> correct? And it would still see the orphans through the commit root.
>>> You say that it is not a problem, but I am not sure why (probably I am
>>> missing something here). So for me it looks like your patch closes a
>>> race window significantly (at the cost of an additional commit), but
>>> does not close it fully.
>>
>> Hi Alex,
>>
>> That's right, after the transaction commit finishes, the snapshot will
>> be visible and accessible to user space - so someone may start a send
>> before the orphan cleanup starts. It was ok only for the serialized
>> case (create snapshot, wait for ioctl to return, call send ioctl).
>
> Actually no. If after the 1st transaction commit (the one that creates
> the snapshot and makes it visible to user space) and before the orphan
> cleanup is called another task attempts to use the snapshot for a send
> operation, it will block when doing the snapshot dentry lookup -
> because both tasks acquire the parent inode's mutex (implicitly
> through the vfs and explicitly via the snapshot/subvol ioctl entry
> point).
>
> Nevertheless, it's better to move the commit root switch part to the
> dentry lookup function (as the new patch does), since after the first
> transaction commit and before the second one commits, a reboot might
> happen, and after that we would get into the same issue until the
> first transaction commit happens after the reboot. I'll update the new
> patch's comment.
>
> thanks
>
>>
>>>
>>> But most important: perhaps "send" should look for ORPHAN_ITEMs and
>>> treat those inodes as "deleted"?
>>
>> There are other cases were orphans can exist, like for file truncates
>> for example, where ignoring the inode wouldn't be very correct.
>> Tried that approach initially, but it's actually more complex to
>> implement and adds some additional overhead (tree searches - and the
>> orphan items are normally too far from the inode items, due to a very
>> high objectid (-5ULL)).
>>
>> I've reworked this with a different approach and CC'ed you
>> (https://patchwork.kernel.org/patch/4635471/).
>>
>> thanks
>>
>>>
>>> Thanks,
>>> Alex.
>>>
>>>
>>>
>>> On Tue, Jun 3, 2014 at 2:41 PM, Filipe David Borba Manana
>>> <fdmanana@xxxxxxxxx> wrote:
>>>> On snapshot creation (either writable or read-only), we do orphan cleanup
>>>> against the root of the snapshot. If the cleanup did remove any orphans,
>>>> then the current root node will be different from the commit root node
>>>> until the next transaction commit happens.
>>>>
>>>> A send operation always uses the commit root of a snapshot - this means
>>>> it will see the orphans if it starts computing the send stream before the
>>>> next transaction commit happens (triggered by a timer or sync() for .e.g),
>>>> which is when the commit root gets assigned a reference to current root,
>>>> where the orphans are not visible anymore. The consequence of send seeing
>>>> the orphans is explained below.
>>>>
>>>> For example:
>>>>
>>>>     mkfs.btrfs -f /dev/sdd
>>>>     mount -o commit=999 /dev/sdd /mnt
>>>>
>>>>     # open a file with O_TMPFILE and leave it open
>>>>     # write some data to the file
>>>>     btrfs subvolume snapshot -r /mnt /mnt/snap1
>>>>
>>>>     btrfs send /mnt/snap1 -f /tmp/send.data
>>>>
>>>> The send operation will fail with the following error:
>>>>
>>>>     ERROR: send ioctl failed with -116: Stale file handle
>>>>
>>>> What happens here is that our snapshot has an orphan inode still visible
>>>> through the commit root, that corresponds to the tmpfile. However send
>>>> will attempt to call inode.c:btrfs_iget(), with the goal of reading the
>>>> file's data, which will return -ESTALE because it will use the current
>>>> root (and not the commit root) of the snapshot.
>>>>
>>>> Of course, there are other cases where we can get orphans, but this
>>>> example using a tmpfile makes it much easier to reproduce the issue.
>>>>
>>>> Therefore on snapshot creation, after calling btrfs_orphan_cleanup, if
>>>> the commit root is different from the current root, just commit the
>>>> transaction associated with the snapshot's root (if it exists), so that
>>>> a send will not see any orphans that don't exist anymore. This also
>>>> guarantees a send will always see the same content regardless of whether
>>>> a transaction commit happened already before the send was requested and
>>>> after the orphan cleanup (meaning the commit root and current roots are
>>>> the same) or it hasn't happened yet (commit and current roots are
>>>> different).
>>>>
>>>> Signed-off-by: Filipe David Borba Manana <fdmanana@xxxxxxxxx>
>>>> ---
>>>>  fs/btrfs/ioctl.c | 29 +++++++++++++++++++++++++++++
>>>>  1 file changed, 29 insertions(+)
>>>>
>>>> diff --git a/fs/btrfs/ioctl.c b/fs/btrfs/ioctl.c
>>>> index 95194a9..6680ad9 100644
>>>> --- a/fs/btrfs/ioctl.c
>>>> +++ b/fs/btrfs/ioctl.c
>>>> @@ -712,6 +712,35 @@ static int create_snapshot(struct btrfs_root *root, struct inode *dir,
>>>>         if (ret)
>>>>                 goto fail;
>>>>
>>>> +       /*
>>>> +        * If orphan cleanup did remove any orphans, it means the tree was
>>>> +        * modified and therefore the commit root is not the same as the
>>>> +        * current root anymore. This is a problem, because send uses the
>>>> +        * commit root and therefore can see inode items that don't exist
>>>> +        * in the current root anymore, and for example make calls to
>>>> +        * btrfs_iget, which will do tree lookups based on the current root
>>>> +        * and not on the commit root. Those lookups will fail, returning a
>>>> +        * -ESTALE error, and making send fail with that error. So make sure
>>>> +        * a send does not see any orphans we have just removed, and that it
>>>> +        * will see the same inodes regardless of whether a transaction
>>>> +        * commit happened before it started (meaning that the commit root
>>>> +        * will be the same as the current root) or not.
>>>> +        */
>>>> +       if (readonly && pending_snapshot->snap->node !=
>>>> +           pending_snapshot->snap->commit_root) {
>>>> +               trans = btrfs_join_transaction(pending_snapshot->snap);
>>>> +               if (IS_ERR(trans) && PTR_ERR(trans) != -ENOENT) {
>>>> +                       ret = PTR_ERR(trans);
>>>> +                       goto fail;
>>>> +               }
>>>> +               if (!IS_ERR(trans)) {
>>>> +                       ret = btrfs_commit_transaction(trans,
>>>> +                                                      pending_snapshot->snap);
>>>> +                       if (ret)
>>>> +                               goto fail;
>>>> +               }
>>>> +       }
>>>> +
>>>>         inode = btrfs_lookup_dentry(dentry->d_parent->d_inode, dentry);
>>>>         if (IS_ERR(inode)) {
>>>>                 ret = PTR_ERR(inode);
>>>> --
>>>> 1.9.1
>>>>
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>>  Unreasonable men adapt the world to themselves.
>>  That's why all progress depends on unreasonable men."
>
>
>
> --
> Filipe David Manana,
>
> "Reasonable men adapt themselves to the world.
>  Unreasonable men adapt the world to themselves.
>  That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux