Hi Filipe,
I've found that the following case is the main cause of such error
and it's fs tree is shown via btrfs-debug-tress as below.
file tree key (459 ROOT_ITEM 20487)
node 132988928 level 1 items 3 free 490 generation 20487 owner 459
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
key (256 INODE_ITEM 0) block 132710400 (8100) gen 20486
key (264 INODE_ITEM 0) block 130695168 (7977) gen 20480
key (266 XATTR_ITEM 952319794) block 126042112 (7693) gen 20464
leaf 132710400 items 166 free space 3639 generation 20486 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
item 0 key (256 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 20425 transid 20442 size 32 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 1 key (256 INODE_REF 256) itemoff 16111 itemsize 12
inode ref index 0 namelen 2 name: ..
...
item 165 key (262 XATTR_ITEM 1100961104) itemoff 7789 itemsize 39
location key (0 UNKNOWN.0 0) type XATTR
namelen 8 datalen 1 name: user.a78
data a
binary 61
leaf 130695168 items 133 free space 7332 generation 20480 owner 455
fs uuid b451ae42-3b03-4003-b0a4-45dce324557f
chunk uuid d8831db3-2e42-4b32-9a5c-3efdf50d36bc
item 0 key (264 INODE_ITEM 0) itemoff 16123 itemsize 160
inode generation 20428 transid 20434 size 10 block
group 0 mode 40755 links 1 uid 0 gid 0 rdev 0 flags 0x0
item 1 key (264 INODE_REF 256) itemoff 16112 itemsize 11
inode ref index 11 namelen 1 name: c
...
We can see that inode 262 is right at the end of leaf. Then send_utime() will
use btrfs_search_slot() to find a appropriate place to put 262 where is at the
back of 262. However, that place is uninitialized on disk. Suppose we read
atime tv_sec:576469548413222912, tv_nsec:1919251317 and then send it out.
Receiving side will got EINVAL since tv_nsec:1919251317 is greater
than 999,999,999.
Thanks.
Robbie Ko
2015-06-10 18:06 GMT+08:00 Robbie Ko <robbieko@xxxxxxxxxxxx>:
> Hi Filipi,
>
> 2015-06-09 18:36 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>> On Tue, Jun 9, 2015 at 11:04 AM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>> Hi Filipe,
>>>
>>> 2015-06-08 22:00 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>>>> On Mon, Jun 8, 2015 at 4:44 AM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>>>> Hi Filipe,
>>>>
>>>> Hi Robbie,
>>>>
>>>>>
>>>>> I've fixed "don't send utimes for non-existing directory" with another solution.
>>>>>
>>>>> In apply_dir_move(), the old parent dir. and new parent dir. will be
>>>>> updated after the current dir. has moved.
>>>>>
>>>>> And there's only one entry in old parent dir. (e.g. entry with
>>>>> smallest ino) will be tagged with rmdir_ino to prevent its parent dir.
>>>>> is deleted but updated.
>>>>
>>>> Can't parse this phrase. What do you mean by tagging an entry with rmdir_ino?
>>>> rmdir_ino corresponds to the number of a inode that wasn't deleted
>>>> when it was processed because there was some inode with a lower number
>>>> that is a child of the directory in the parent snapshot and had its
>>>> rename/move operation delayed (it happens after the directory we want
>>>> to delete is processed).
>>>>
>>>
>>> Right , my "tagged with rmdir_ino" is same meaning as you explained here.
>>>
>>>>>
>>>>> However, if we process rename for another entry not tagged with
>>>>> rmdir_ino first, its old parent dir. which is deleted will be updated
>>>>> according to apply_dir_move().
>>>>>
>>>>> Therefore, I think we should check the existence of the dir. before
>>>>> we're going to update it's utime.
>>>>>
>>>>> The patch is pasted in the following link, could you give me some comment?
>>>>>
>>>>> https://friendpaste.com/h8tZqOS9iAUpp2DvgGI2k
>>>>
>>>> Looks better.
>>>> However I still don't understand your explanation, and just tried the
>>>> example in your commit message:
>>>>
>>>> "Parent snapshot:
>>>>
>>>> |---- a/ (ino 259)
>>>> |---- c (ino 264)
>>>> |---- b/ (ino 260)
>>>> |---- d (ino 265)
>>>> |---- del/ (ino 263)
>>>> |---- item1/ (ino 261)
>>>> |---- item2/ (ino 262)
>>>>
>>>> Send snapshot:
>>>> |---- a/ (ino 259)
>>>> |---- b/ (ino 260)
>>>> |---- c/ (ino 2)
>>>> |---- item2 (ino 259)
>>>> |---- d/ (ino 257)
>>>> |---- item1/ (ino 258)"
>>>>
>>>> So it's confusing after looking at it.
>>>> First the send snapshot mentions inode number 2, which doesn't exist
>>>> in the parent snapshot - I assume you meant inode number 264.
>>>> Then, the send snapshot has two inodes with number 259. Is "item2" in
>>>> the send snapshot supposed to be inode 262?
>>>>
>>>
>>> Your guess is right. And I correct it as follow.
>>>
>>> # Parent snapshot:
>>> #
>>> # |---- a/ (ino 259)
>>> # | |---- c (ino 264)
>>> # |
>>> # |---- b/ (ino 260)
>>> # | |---- d (ino 265)
>>> # |
>>> # |---- del/ (ino 263)
>>> # |---- item1/ (ino 261)
>>> # |---- item2/ (ino 262)
>>>
>>> # Send snapshot:
>>> #
>>> # |---- a/ (ino 259)
>>> # |---- b/ (ino 260)
>>> # |---- c/ (ino 264)
>>> # | |---- item2/ (ino 262)
>>> # |
>>> # |---- d/ (ino 265)
>>> # |---- item1/ (ino 261)
>>>
>>>> Anyway, assuming those 2 fixes to the example are correct guesses, I
>>>> tried the following and it didn't fail without your patches (i.e. no
>>>> attempts to send utimes to a non-existing directory):
>>>>
>>>
>>> Here my mean is : btrfs tries to get utime from non-existing directory and
>>> apply it on the existing directory. And my patch is attempted to avoid
>>> this case.
>>> However, this case is not guaranteed to cause error anytime but it may
>>> fails somehow
>>> which is depending on the data on the disk.
>>> The following are the incremental procedures to send the snapshot.
>>>
>>> utimes
>>> utimes a
>>> utimes b
>>> rename del -> o263-259-0
>>> utimes
>>> rename a/c -> c
>>> utimes
>>> utimes a
>>> rename o263-259-0/item2 -> c/item2
>>> utimes c/item2
>>> utimes o263-259-0 <<---------------------- this step may cause error
>>
>> Why may it cause an error?
>> At that moment the name/path o263-259-0 exists at the destination
>> (i.e. the receiver, as it applies commands from the send stream
>> serially).
>
> Following is the error message when do receive.
> ERROR: utimes o263-259-0 failed. Invalid argument
>
> The argument of utimes for o263-259-0 is got from inode 263 in send
> root, But inode 263 is not exist in send root
> In send_utimes(), we didn't check if btrfs_search_slot returns 1,
> therefore may encounters this problem.
>
> I will try to make a stable reproducer.
>
> Thanks.
> Robbie Ko
>
>>
>>> utimes c
>>> utimes c
>>> rename b/d -> d
>>> utimes
>>> utimes b
>>> rename o263-259-0/item1 -> d/item1
>>> rmdir o263-259-0
>>> utimes d/item1
>>> utimes d
>>> utimes d
>>>
>>> As the above pointed procedure, o263-259-0 is not appeared in the send root.
>>
>> Well yes, but that doesn't matter.
>> The oXXX-YYY-ZZZ names are never in any of the roots (send or parent),
>> they're just temporary names to allow for rename/move operations when
>> collisions are detected, and exist only in the receiver's filesystem.
>>
>>> When utime got from o263-259-0 is invalid (i.e. out of range of time
>>> format), it will fail.
>>> I saw the error occurs at utimensat() and got EINVAL. Here's
>>> explanation from linux man page.
>>>
>>> EINVAL Invalid value in one of the tv_nsec fields (value outside
>>> range 0 to 999,999,999, and not UTIME_NOW or UTIME_OMIT); or
>>> an invalid value in one of the tv_sec fields.
>>>
>>> So if o263-259-0 is not the send root, invalid format of utime may be got.
>>
>> Doubly confused now. So the whole change log (and the code changes)
>> mentions only attempts to send utimes for a directory/path that
>> doesn't exist in the receiver's fs - inode 263 exists in both the send
>> and parent roots, so we can always get its utimes values. Now you're
>> saying that somewhere in the send code we're getting incorrect values
>> for a utimes operation and then sending such operation to the
>> receiver?
>>
>> Getting the values for utimes is done looking up the inode item, by
>> its number, in the send root - for this the current name of the inode
>> in the receiver fs doesn't matter - it matters only for building the
>> path for the utimes operation. But what I'm understanding from your
>> reply is that we're getting a wrong utimes value form the inode item
>> for inode number 263, which would be a totally different issue from
>> generating incorrect paths.
>>
>>
>> So this still doesn't explain me why, without any of your patches
>> applied, the reproducer doesn't fail - which is really an important
>> aspect.
>> This is the xfstest I made with your reproducer:
>> https://friendpaste.com/2WyDxPe2FtVhOfECBk1VKF (and
>> tests/btrfs/999.out is just contains the single line "QA output
>> created by 999").
>>
>> If you're able to give me a correct reproducer, I can make sense of it
>> and help getting a better change log.
>> Or this issue happens only after applying some of the other patches in
>> the series?
>>
>> thanks
>>
>>
>>
>>
>>>
>>> Thanks.
>>> Robbie Ko
>>>
>>>> # Parent snapshot:
>>>> #
>>>> # |---- a/ (ino 259)
>>>> # | |---- c (ino 264)
>>>> # |
>>>> # |---- b/ (ino 260)
>>>> # | |---- d (ino 265)
>>>> # |
>>>> # |---- del/ (ino 263)
>>>> # |---- item1/ (ino 261)
>>>> # |---- item2/ (ino 262)
>>>>
>>>> # Send snapshot:
>>>> #
>>>> # |---- a/ (ino 259)
>>>> # |---- b/ (ino 260)
>>>> # |---- c/ (ino 264)
>>>> # | |---- item2/ (ino 262)
>>>> # |
>>>> # |---- d/ (ino 265)
>>>> # |---- item1/ (ino 258)
>>>>
>>>> mkdir $SCRATCH_MNT/0
>>>> mkdir $SCRATCH_MNT/1
>>>>
>>>> mkdir $SCRATCH_MNT/a # 259
>>>> mkdir $SCRATCH_MNT/b # 260
>>>> mkdir $SCRATCH_MNT/item1 # 261
>>>> mkdir $SCRATCH_MNT/item2 # 262
>>>> mkdir $SCRATCH_MNT/del # 263
>>>> mv $SCRATCH_MNT/item1 $SCRATCH_MNT/del/item1
>>>> mv $SCRATCH_MNT/item2 $SCRATCH_MNT/del/item2
>>>> mkdir $SCRATCH_MNT/a/c # 264
>>>> mkdir $SCRATCH_MNT/b/d # 265
>>>>
>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap1
>>>>
>>>> mv $SCRATCH_MNT/a/c $SCRATCH_MNT/c
>>>> mv $SCRATCH_MNT/b/d $SCRATCH_MNT/d
>>>> mv $SCRATCH_MNT/del/item2 $SCRATCH_MNT/c
>>>> mv $SCRATCH_MNT/del/item1 $SCRATCH_MNT/d
>>>> rmdir $SCRATCH_MNT/del
>>>>
>>>> _run_btrfs_util_prog subvolume snapshot -r $SCRATCH_MNT $SCRATCH_MNT/mysnap2
>>>>
>>>> run_check $FSSUM_PROG -A -f -w $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>> run_check $FSSUM_PROG -A -f -w $tmp/2.fssum -x $SCRATCH_MNT/mysnap2/mysnap1 \
>>>> $SCRATCH_MNT/mysnap2
>>>>
>>>> _run_btrfs_util_prog send $SCRATCH_MNT/mysnap1 -f $tmp/1.snap
>>>> _run_btrfs_util_prog send -p $SCRATCH_MNT/mysnap1 $SCRATCH_MNT/mysnap2 \
>>>> -f $tmp/2.snap
>>>>
>>>> _check_scratch_fs
>>>>
>>>> _scratch_unmount
>>>> _scratch_mkfs >/dev/null 2>&1
>>>> _scratch_mount
>>>>
>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/1.snap
>>>> run_check $FSSUM_PROG -r $tmp/1.fssum $SCRATCH_MNT/mysnap1
>>>>
>>>> _run_btrfs_util_prog receive $SCRATCH_MNT -f $tmp/2.snap
>>>> run_check $FSSUM_PROG -r $tmp/2.fssum $SCRATCH_MNT/mysnap2
>>>>
>>>>
>>>>
>>>> I would suggest making those hiearachy diagrams more readable - pipes
>>>> right below the name of their parent, continuation pipes like and
>>>> align all inode numbers in the same column, like the following:
>>>>
>>>> # Parent snapshot:
>>>> #
>>>> # |---- a/ (ino 259)
>>>> # | |---- c (ino 264)
>>>> # |
>>>> # |---- b/ (ino 260)
>>>> # | |---- d (ino 265)
>>>> # |
>>>> # |---- del/ (ino 263)
>>>> # |---- item1/ (ino 261)
>>>> # |---- item2/ (ino 262)
>>>>
>>>> # Send snapshot:
>>>> #
>>>> # |---- a/ (ino 259)
>>>> # |---- b/ (ino 260)
>>>> # |---- c/ (ino 264)
>>>> # | |---- item2/ (ino 262)
>>>> # |
>>>> # |---- d/ (ino 265)
>>>> # |---- item1/ (ino 258)
>>>>
>>>> (pasted here in case gmail screws up the indentation/formatting:
>>>> https://friendpaste.com/12wzqdcfFrlDdd1AiKX0bU)
>>>>
>>>> thanks
>>>>
>>>>>
>>>>> Thans!
>>>>>
>>>>> Robbie Ko
>>>>>
>>>>> 2015-06-05 0:14 GMT+08:00 Filipe David Manana <fdmanana@xxxxxxxxx>:
>>>>>> On Thu, Jun 4, 2015 at 12:18 PM, Robbie Ko <robbieko@xxxxxxxxxxxx> wrote:
>>>>>>> There's one case where we can't issue a utimes operation for a directory.
>>>>>>> When 263 will delete, waiting 261 and set 261 rmdir_ino, but 262 earlier
>>>>>>> processed and update uime between two parent directory.
>>>>>>> So fix this by not update non exist utimes for this case.
>>>>>>
>>>>>> So you mean that we attempt to update utimes for an inode,
>>>>>> corresponding to a directory, that exists in the parent snapshot but
>>>>>> not in the send snapshot.
>>>>>>
>>>>>> So the subject should be something like "Btrfs: incremental send,
>>>>>> don't send utimes for non-existing directory" instead of "Btrfs:
>>>>>> incremental send, fix rmdir not send utimes"
>>>>>>
>>>>>>>
>>>>>>> Example:
>>>>>>>
>>>>>>> Parent snapshot:
>>>>>>> |---- a/ (ino 259)
>>>>>>> |---- c (ino 264)
>>>>>>> |---- b/ (ino 260)
>>>>>>> |---- d (ino 265)
>>>>>>> |---- del/ (ino 263)
>>>>>>> |---- item1/ (ino 261)
>>>>>>> |---- item2/ (ino 262)
>>>>>>>
>>>>>>> Send snapshot:
>>>>>>> |---- a/ (ino 259)
>>>>>>> |---- b/ (ino 260)
>>>>>>> |---- c/ (ino 2)
>>>>>>> |---- item2 (ino 259)
>>>>>>> |---- d/ (ino 257)
>>>>>>> |---- item1/ (ino 258)
>>>>>>>
>>>>>>> Signed-off-by: Robbie Ko <robbieko@xxxxxxxxxxxx>
>>>>>>> ---
>>>>>>> fs/btrfs/send.c | 2 +-
>>>>>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>>>
>>>>>>> diff --git a/fs/btrfs/send.c b/fs/btrfs/send.c
>>>>>>> index e8eb3ab..46f954c 100644
>>>>>>> --- a/fs/btrfs/send.c
>>>>>>> +++ b/fs/btrfs/send.c
>>>>>>> @@ -2468,7 +2468,7 @@ verbose_printk("btrfs: send_utimes %llu\n", ino);
>>>>>>> key.type = BTRFS_INODE_ITEM_KEY;
>>>>>>> key.offset = 0;
>>>>>>> ret = btrfs_search_slot(NULL, sctx->send_root, &key, path, 0, 0);
>>>>>>> - if (ret < 0)
>>>>>>> + if (ret != 0)
>>>>>>> goto out;
>>>>>>
>>>>>> So I don't think this is a good fix. The problem is in some code that
>>>>>> calls this function (send_utimes) against the directory that doesn't
>>>>>> exist - it just shouldn't do that, its logic should be fixed.
>>>>>> Following this approach, while it works, it's just hiding logic errors
>>>>>> in one or more code paths, and none of its callers checks for a return
>>>>>> value of 1 - they only react to values < 0 and introduces the
>>>>>> possibility of propagating a return value of 1 to user space.
>>>>>>
>>>>>> thanks
>>>>>>
>>>>>>>
>>>>>>> eb = path->nodes[0];
>>>>>>> --
>>>>>>> 1.9.1
>>>>>>>
>>>>>>> --
>>>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Filipe David Manana,
>>>>>>
>>>>>> "Reasonable men adapt themselves to the world.
>>>>>> Unreasonable men adapt the world to themselves.
>>>>>> That's why all progress depends on unreasonable men."
>>>>
>>>>
>>>>
>>>> --
>>>> Filipe David Manana,
>>>>
>>>> "Reasonable men adapt themselves to the world.
>>>> Unreasonable men adapt the world to themselves.
>>>> That's why all progress depends on unreasonable men."
>>
>>
>>
>> --
>> Filipe David Manana,
>>
>> "Reasonable men adapt themselves to the world.
>> Unreasonable men adapt the world to themselves.
>> That's why all progress depends on unreasonable men."
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html