Re: [PATCH 5/5] Btrfs: incremental send, fix rmdir not send utimes

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jun 18, 2015 at 4:21 AM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
> Hi Filipe,
>
> I've found that the following case is the main cause of such error
> and it's fs tree is shown via btrfs-debug-tress as below.
>
> file tree key (459 ROOT_ITEM 20487)
> node 132988928 level 1 items 3 free 490 generation 20487 owner 459
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
>         key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
>         key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
> leaf 132710400 items 166 free space 3639 generation 20486 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
>                 inode generation 20425 transid 20442 size 32 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>         item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
>                 inode ref index 0 namelen 2 name: ..
> ...
>         item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
>                 location key (0 UNKNOWN.0 0) type XATTR
>                 namelen 8 datalen 1 name: user.a78
>                 data a
>                 binary 61
> leaf 130695168 items 133 free space 7332 generation 20480 owner 455
> fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
> chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
>         item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
>                 inode generation 20428 transid 20434 size 10 block
> group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
>         item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
>                 inode ref index 11 namelen 1 name: c
> ...
>
> We can see that inode 262 is right at the end of leaf. Then send_utime() will
> use btrfs_search_slot() to find a appropriate place to put 262 where is at the
>  back of 262. However, that place is uninitialized on disk.
> Suppose we read
> atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
> Receiving side will  got EINVAL since tv_nsec:1919251317 is greater
> than   999,999,999.

I see.
So in apply_dir_move, instead of searching the btree of the send
snapshot, we can search the rbtree of orphan dir infos for an entry
with a key == cur->dir. Searching that rbtree makes it clear what the
intention is and more efficient (fully in memory structure, and much
smaller than the btree). Should work, but I haven't tested it.

thanks

>
> Thanks.
> Robbie Ko
>
> 2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@xxxxxxxxxxxx>:
>> Hi Filipi,
>>
>> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>>> Hi Filipe,
>>>>
>>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>>>>> Hi Filipe,
>>>>>
>>>>> Hi Robbie,
>>>>>
>>>>>>
>>>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>>>
>>>>>>  In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>>>> updated after the current dir. has moved.
>>>>>>
>>>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>>>> is deleted but updated.
>>>>>
>>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>>>> when it was processed because there was some inode with a lower number
>>>>> that is a child of the directory in the parent snapshot and had its
>>>>> rename/move operation delayed (it happens after the directory we want
>>>>> to delete is processed).
>>>>>
>>>>
>>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>>>
>>>>>>
>>>>>> However, if we process rename for another entry not tagged with
>>>>>> rmdir_ino first, its old parent dir. which is deleted  will be updated
>>>>>> according to apply_dir_move().
>>>>>>
>>>>>> Therefore, I think we should check the existence of  the dir. before
>>>>>> we're going to update it's utime.
>>>>>>
>>>>>> The patch is pasted in the following link, could you give me some comment?
>>>>>>
>>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>>>
>>>>> Looks better.
>>>>> However I still don't understand your explanation, and just tried the
>>>>> example in your commit message:
>>>>>
>>>>> "Parent snapshot:
>>>>>
>>>>> |---- a/ (ino 259)
>>>>>   |---- c (ino 264)
>>>>> |---- b/ (ino 260)
>>>>>   |---- d (ino 265)
>>>>> |---- del/ (ino 263)
>>>>>   |---- item1/ (ino 261)
>>>>>   |---- item2/ (ino 262)
>>>>>
>>>>> Send snapshot:
>>>>> |---- a/ (ino 259)
>>>>> |---- b/ (ino 260)
>>>>> |---- c/ (ino 2)
>>>>>   |---- item2 (ino 259)
>>>>> |---- d/ (ino 257)
>>>>>   |---- item1/ (ino 258)"
>>>>>
>>>>> So it's confusing after looking at it.
>>>>> First the send snapshot mentions inode number 2, which doesn't exist
>>>>> in the parent snapshot - I assume you meant inode number 264.
>>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>>>> the send snapshot supposed to be inode 262?
>>>>>
>>>>
>>>> Your guess is right. And I correct it as follow.
>>>>
>>>>  # Parent snapshot:
>>>>  #
>>>>  # |---- a/            (ino 259)
>>>>  # |     |---- c       (ino 264)
>>>>  # |
>>>>  # |---- b/            (ino 260)
>>>>  # |     |---- d       (ino 265)
>>>>  # |
>>>>  # |---- del/          (ino 263)
>>>>  #        |---- item1/ (ino 261)
>>>>  #        |---- item2/ (ino 262)
>>>>
>>>>  # Send snapshot:
>>>>  #
>>>>  # |---- a/            (ino 259)
>>>>  # |---- b/            (ino 260)
>>>>  # |---- c/            (ino 264)
>>>>  # |     |---- item2/  (ino 262)
>>>>  # |
>>>>  # |---- d/            (ino 265)
>>>>  #       |---- item1/  (ino 261)
>>>>
>>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>>>> tried the following and it didn't fail without your patches (i.e. no
>>>>> attempts to send utimes to a non-existing directory):
>>>>>
>>>>
>>>> Here my mean is :  btrfs tries to get utime from non-existing directory and
>>>> apply it on the existing directory. And my patch is attempted to avoid
>>>> this case.
>>>> However, this case is not guaranteed to cause error anytime but it may
>>>> fails somehow
>>>> which is depending on the data on the disk.
>>>> The following are the incremental procedures to send the snapshot.
>>>>
>>>> utimes
>>>> utimes a
>>>> utimes b
>>>> rename del -> o263-259-0
>>>> utimes
>>>> rename a/c -> c
>>>> utimes
>>>> utimes a
>>>> rename o263-259-0/item2 -> c/item2
>>>> utimes c/item2
>>>> utimes o263-259-0          <<---------------------- this step may cause error
>>>
>>> Why may it cause an error?
>>> At that moment the name/path o263-259-0 exists at the destination
>>> (i.e. the receiver, as it applies commands from the send stream
>>> serially).
>>
>> Following is the error message when do receive.
>> ERROR: utimes o263-259-0 failed. Invalid argument
>>
>> The argument of utimes for o263-259-0 is got from inode 263 in send
>> root, But inode 263 is not exist in send root
>> In send_utimes(), we didn't check if btrfs_search_slot returns 1,
>> therefore may encounters this problem.
>>
>> I will try to make a stable reproducer.
>>
>> Thanks.
>> Robbie Ko
>>
>>>
>>>> utimes c
>>>> utimes c
>>>> rename b/d -> d
>>>> utimes
>>>> utimes b
>>>> rename o263-259-0/item1 -> d/item1
>>>> rmdir o263-259-0
>>>> utimes d/item1
>>>> utimes d
>>>> utimes d
>>>>
>>>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>>>
>>> Well yes, but that doesn't matter.
>>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
>>> they're just temporary names to allow for rename/move operations when
>>> collisions are detected, and exist only in the receiver's filesystem.
>>>
>>>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>>>> format), it will fail.
>>>> I saw the error occurs at utimensat() and got EINVAL. Here's
>>>> explanation from linux man page.
>>>>
>>>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>>>               range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>>>               an invalid value in one of the tv_sec fields.
>>>>
>>>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>>>
>>> Doubly confused now. So the whole change log (and the code changes)
>>> mentions only attempts to send utimes for a directory/path that
>>> doesn't exist in the receiver's fs - inode 263 exists in both the send
>>> and parent roots, so we can always get its utimes values. Now you're
>>> saying that somewhere in the send code we're getting incorrect values
>>> for a utimes operation and then sending such operation to the
>>> receiver?
>>>
>>> Getting the values for utimes is done looking up the inode item, by
>>> its number, in the send root - for this the current name of the inode
>>> in the receiver fs doesn't matter - it matters only for building the
>>> path for the utimes operation. But what I'm understanding from your
>>> reply is that we're getting a wrong utimes value form the inode item
>>> for inode number 263, which would be a totally different issue from
>>> generating incorrect paths.
>>>
>>>
>>> So this still doesn't explain me why, without any of your patches
>>> applied, the reproducer doesn't fail - which is really an important
>>> aspect.
>>> This is the xfstest I made with your reproducer:
>>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
>>> tests/btrfs/999.out is just contains the single line "QA output
>>> created by 999").
>>>
>>> If you're able to give me a correct reproducer, I can make sense of it
>>> and help getting a better change log.
>>> Or this issue happens only after applying some of the other patches in
>>> the series?
>>>
>>> thanks
>>>
>>>
>>>
>>>
>>>>
>>>> Thanks.
>>>> Robbie Ko
>>>>
>>>>> # Parent snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |     |---- c       (ino 264)
>>>>> # |
>>>>> # |---- b/            (ino 260)
>>>>> # |     |---- d       (ino 265)
>>>>> # |
>>>>> # |---- del/          (ino 263)
>>>>> #        |---- item1/ (ino 261)
>>>>> #        |---- item2/ (ino 262)
>>>>>
>>>>> # Send snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |---- b/            (ino 260)
>>>>> # |---- c/            (ino 264)
>>>>> # |     |---- item2/  (ino 262)
>>>>> # |
>>>>> # |---- d/            (ino 265)
>>>>> #       |---- item1/  (ino 258)
>>>>>
>>>>> mkdir $SCRATCH_MNT/0
>>>>> mkdir $SCRATCH_MNT/1
>>>>>
>>>>> mkdir $SCRATCH_MNT/a # 259
>>>>> mkdir $SCRATCH_MNT/b # 260
>>>>> mkdir $SCRATCH_MNT/item1 # 261
>>>>> mkdir $SCRATCH_MNT/item2 # 262
>>>>> mkdir $SCRATCH_MNT/del # 263
>>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>>>> mkdir $SCRATCH_MNT/a/c # 264
>>>>> mkdir $SCRATCH_MNT/b/d # 265
>>>>>
>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>>>
>>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>>>> rmdir $SCRATCH_MNT/del
>>>>>
>>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>>>
>>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>>>> $SCRATCH_MNT/mysnap2
>>>>>
>>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>>>> -f $tmp/2.snap
>>>>>
>>>>> _check_scratch_fs
>>>>>
>>>>> _scratch_unmount
>>>>> _scratch_mkfs >/dev/null 2>&1
>>>>> _scratch_mount
>>>>>
>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>>
>>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>>>
>>>>>
>>>>>
>>>>> I would suggest making those hiearachy diagrams more readable - pipes
>>>>> right below the name of their parent, continuation pipes like and
>>>>> align all inode numbers in the same column, like the following:
>>>>>
>>>>> # Parent snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |     |---- c       (ino 264)
>>>>> # |
>>>>> # |---- b/            (ino 260)
>>>>> # |     |---- d       (ino 265)
>>>>> # |
>>>>> # |---- del/          (ino 263)
>>>>> #        |---- item1/ (ino 261)
>>>>> #        |---- item2/ (ino 262)
>>>>>
>>>>> # Send snapshot:
>>>>> #
>>>>> # |---- a/            (ino 259)
>>>>> # |---- b/            (ino 260)
>>>>> # |---- c/            (ino 264)
>>>>> # |     |---- item2/  (ino 262)
>>>>> # |
>>>>> # |---- d/            (ino 265)
>>>>> #       |---- item1/  (ino 258)
>>>>>
>>>>> (pasted here in case gmail screws up the indentation/formatting:
>>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>>>
>>>>> thanks
>>>>>
>>>>>>
>>>>>> Thans!
>>>>>>
>>>>>> Robbie Ko
>>>>>>
>>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>>>> processed and update uime between two parent directory.
>>>>>>>> So fix this by not update non exist utimes for this case.
>>>>>>>
>>>>>>> So you mean that we attempt to update utimes for an inode,
>>>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>>>> not in the send snapshot.
>>>>>>>
>>>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>>>> incremental send, fix rmdir not send utimes"
>>>>>>>
>>>>>>>>
>>>>>>>> Example:
>>>>>>>>
>>>>>>>> Parent snapshot:
>>>>>>>> |---- a/ (ino 259)
>>>>>>>>   |---- c (ino 264)
>>>>>>>> |---- b/ (ino 260)
>>>>>>>>   |---- d (ino 265)
>>>>>>>> |---- del/ (ino 263)
>>>>>>>>   |---- item1/ (ino 261)
>>>>>>>>   |---- item2/ (ino 262)
>>>>>>>>
>>>>>>>> Send snapshot:
>>>>>>>> |---- a/ (ino 259)
>>>>>>>> |---- b/ (ino 260)
>>>>>>>> |---- c/ (ino 2)
>>>>>>>>   |---- item2 (ino 259)
>>>>>>>> |---- d/ (ino 257)
>>>>>>>>   |---- item1/ (ino 258)
>>>>>>>>
>>>>>>>> Signed-off-by: Robbie Ko <robbieko@xxxxxxxxxxxx>
>>>>>>>> ---
>>>>>>>>  fs/btrfs/send.c | 2 +-
>>>>>>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>>>> index e8eb3ab..46f954c 100644
>>>>>>>> --- a/fs/btrfs/send.c
>>>>>>>> +++ b/fs/btrfs/send.c
>>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>>>         key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>>>         key.offset = 0;
>>>>>>>>         ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>>>> -       if (ret < 0)
>>>>>>>> +       if (ret != 0)
>>>>>>>>                 goto out;
>>>>>>>
>>>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>>>> in one or more code paths, and none of its callers checks for a return
>>>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>>>> possibility of propagating a return value of 1 to user space.
>>>>>>>
>>>>>>> thanks
>>>>>>>
>>>>>>>>
>>>>>>>>         eb = path->nodes[0];
>>>>>>>> --
>>>>>>>> 1.9.1
>>>>>>>>
>>>>>>>> --
>>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Filipe David Manana,
>>>>>>>
>>>>>>> "Reasonable men adapt themselves to the world.
>>>>>>>  Unreasonable men adapt the world to themselves.
>>>>>>>  That's why all progress depends on unreasonable men."
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Filipe David Manana,
>>>>>
>>>>> "Reasonable men adapt themselves to the world.
>>>>>  Unreasonable men adapt the world to themselves.
>>>>>  That's why all progress depends on unreasonable men."
>>>
>>>
>>>
>>> --
>>> Filipe David Manana,
>>>
>>> "Reasonable men adapt themselves to the world.
>>>  Unreasonable men adapt the world to themselves.
>>>  That's why all progress depends on unreasonable men."



-- 
Filipe David Manana,

"Reasonable men adapt themselves to the world.
 Unreasonable men adapt the world to themselves.
 That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux