On Fri, Aug 5, 2016 at 6:12 PM, Chris Mason <clm@xxxxxx> wrote:
>
> On 08/05/2016 07:08 AM, Nikolay Borisov wrote:
>> Hello,
>>
>> Any ideas how btrfs_path can end up all zeroes? The one in
>> the first slot comes from the increment in btrfs_next_old_item.
>
> Thanks for all the extra details. It really must be this:
>
> if (ret > 0) {
>     btrfs_release_path(path);
>     ret = btrfs_uuid_iter_rem(root, uuid, key.type,
>                               subid_cpu);
>     if (ret == 0) {
>         /*
>          * this might look inefficient, but the
>          * justification is that it is an
>          * exception that check_func returns 1,
>          * and that in the regular case only one
>          * entry per UUID exists.
>          */
>         goto again_search_slot;
>     }
>     if (ret < 0 && ret != -ENOENT)
>         goto out;
> }
> item_size -= sizeof(subid_le);
> offset += sizeof(subid_le);
>
>
> We've released the path, which would explain why it's full of NULLs. ret
> was -ENOENT, so it kept on going, and we fell through to
> btrfs_next_item().
>
> Once the path is released, we should either be searching again or
> exiting. A goto again_search_slot would probably fix it, but I'd want
> to also bump the key so we don't just process the same item over and
> over again.
>
> Can you reproduce this reliably? I'd hate to patch it now and make more
> problems later just because we didn't fully understand the items we were
> tripping over.

Well, there are two things I can do:
a) Dig further into the crash dump to see whether ret has been saved to
the stack and extract the return value. If your theory is correct, I
should see -ENOENT there.
b) Patch the code to print a warning when btrfs_uuid_iter_rem returns
-ENOENT, so that we at least know this is happening.
In either case this will take me until at least next week, at which
point I should be able to give more information.
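
For (b), the decision I have in mind looks roughly like the user-space
sketch below. It is only a model of the control flow, not a patch: the
names after_iter_rem, next_step and enoent_hits are made up for
illustration, the fprintf stands in for whatever pr_warn()/WARN_ON() the
real debug patch would use, and bumping the key before searching again is
not modelled at all.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical: what the iterate loop should do once the path is released. */
enum next_step {
	STEP_SEARCH_AGAIN,	/* redo the tree search (goto again_search_slot) */
	STEP_BAIL_OUT,		/* hard error, leave the loop (goto out) */
};

/* Hypothetical counter: how often the suspicious -ENOENT case was seen. */
static int enoent_hits;

/*
 * Decide what to do with the value returned by btrfs_uuid_iter_rem().
 * The path has already been released at this point, so carrying on
 * with the old path is never one of the options.
 */
static enum next_step after_iter_rem(int ret)
{
	if (ret < 0 && ret != -ENOENT)
		return STEP_BAIL_OUT;

	if (ret == -ENOENT) {
		/* Stand-in for the kernel warning in the debug patch. */
		enoent_hits++;
		fprintf(stderr, "uuid iterate: rem returned -ENOENT after path release\n");
	}

	/* Item removed, or nothing there to remove: search again either way. */
	return STEP_SEARCH_AGAIN;
}
```

If the warning fires on the affected machines, that would confirm the
-ENOENT fall-through theory without changing behaviour elsewhere.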
>
> -chris