On Thu, Dec 12, 2019 at 08:39:43AM +0800, Qu Wenruo wrote:
>
>
> On 2019/12/11 11:34 PM, David Sterba wrote:
> > On Wed, Dec 11, 2019 at 01:00:01PM +0800, Qu Wenruo wrote:
> >> Due to commit d2311e698578 ("btrfs: relocation: Delay reloc tree
> >> deletion after merge_reloc_roots"), the reloc tree's lifespan is extended.
> >>
> >> Although we always set root->reloc_root to NULL before we drop the reloc
> >> tree, that's not multi-core safe, since there is no proper memory
> >> barrier to ensure other cores see the same root->reloc_root.
> >>
> >> The proper root-cause fix should be a proper root refcount, making
> >> btrfs_drop_snapshot() wait for all other root owners to release the
> >> root before dropping it.
> >
> > This would block cleaning deleted subvolumes, no? We can skip the dead
> > tree (and add it back to the list) in that case and not wait. The
> > cleaner thread is able to process the list repeatedly.
>
> What I mean is:
> - For consumer (reading root->reloc_root)
>     struct btrfs_root *reloc;
>
>     spin_lock(&root->reloc_lock);
>     reloc = root->reloc_root;
>     if (!reloc) {
>             spin_unlock(&root->reloc_lock);
>             return NULL;
>     }
>     refcount_inc(&reloc->refcount);
>     spin_unlock(&root->reloc_lock);
>     return reloc;
>
>   And of course, release the reference once done with the grabbed reloc_root.
>
> - For cleaner
>     grab reloc_root just like consumer.
> retry:
>     wait_event(wq, refcount_read(&root->reloc_root->refcount) == 1); /* wq: hypothetical wait queue */
>     spin_lock(&root->reloc_lock);
>     if (refcount_read(&root->reloc_root->refcount) != 1) {
>             spin_unlock(&root->reloc_lock);
>             goto retry;
>     }
>     root->reloc_root = NULL;
>     spin_unlock(&root->reloc_lock);
>     /* Now we're the only owner, delete the root */

The spinlock should be safe as well. Do you mean to take it to verify
that reloc_root is valid everywhere?

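For illustration, a userspace sketch of the refcount scheme from the pseudocode, using C11 atomics and a pthread mutex in place of root->reloc_lock. This is not btrfs code: struct fs_root, struct reloc_tree, grab_reloc_root(), put_reloc_root() and try_detach_reloc_root() are hypothetical names. Two assumptions differ from the pseudocode: the owning root's own reference counts as 1 (so the cleaner does not take an extra one), and instead of wait_event() the cleaner simply gives up when other users remain, so it can requeue the dead tree as suggested earlier in the thread.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins for the btrfs structures. */
struct reloc_tree {
	atomic_int refs;		/* 1 == only the owning root holds it */
};

struct fs_root {
	pthread_mutex_t reloc_lock;	/* plays the role of root->reloc_lock */
	struct reloc_tree *reloc_root;
};

/* Consumer: take a reference under the lock, or return NULL. */
static struct reloc_tree *grab_reloc_root(struct fs_root *root)
{
	struct reloc_tree *reloc;

	pthread_mutex_lock(&root->reloc_lock);
	reloc = root->reloc_root;
	if (reloc)
		atomic_fetch_add(&reloc->refs, 1);
	pthread_mutex_unlock(&root->reloc_lock);	/* unlock before returning */
	return reloc;
}

/* Consumer: drop a reference taken by grab_reloc_root(). */
static void put_reloc_root(struct reloc_tree *reloc)
{
	int old = atomic_fetch_sub(&reloc->refs, 1);

	/* The owning root's reference is never dropped through this path. */
	assert(old > 1);
	(void)old;
}

/*
 * Cleaner: detach only if the root's own reference is the last one.
 * Returns NULL when other users remain, so the caller can requeue and
 * retry later instead of blocking. On success the caller owns the last
 * reference and may free the tree.
 */
static struct reloc_tree *try_detach_reloc_root(struct fs_root *root)
{
	struct reloc_tree *reloc = NULL;

	pthread_mutex_lock(&root->reloc_lock);
	if (root->reloc_root && atomic_load(&root->reloc_root->refs) == 1) {
		reloc = root->reloc_root;
		root->reloc_root = NULL;
	}
	pthread_mutex_unlock(&root->reloc_lock);
	return reloc;
}
```

Because new references are only taken while holding the lock, seeing refs == 1 under that same lock means no one else can still be using the tree, and clearing the pointer before unlocking stops new users from finding it.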
> > Clearing of the bit is done when there are no potential other users, so
> > that part does not need the barrier (I think).
> >
> > The checking part could use a helper so we don't have barriers scattered
> > around the code.
> >
> I'm still not confident enough about the "reloc_root = NULL" assignment
> and the "reloc_root == NULL" test.
>
> But since set_bit()/test_bit() is safe, and it happens before we modify
> reloc_root, it's safer, and it's what we used in this quick fix.
>
> Still, I'm really looking forward to Josef's root refcount work, which
> should be the real fix for all the problems.

That's a huge series and unsuitable for backporting to stable; we need
something like your patches first.
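A minimal sketch of the flag-before-pointer ordering discussed in this thread, with the barriers kept inside a few helpers as suggested. It is a userspace C11 analogue, not the kernel code: RELOC_TREE_DEAD, retire_reloc_root(), reloc_root_is_dead() and peek_reloc_root() are hypothetical names, and release/acquire atomics stand in for set_bit()/test_bit() plus explicit barriers. It only orders the flag against the pointer; it cannot keep the tree alive while a reader is using it, which is why the refcount work is still needed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bit and structures; userspace stand-ins only. */
#define RELOC_TREE_DEAD 0x1UL

struct reloc_tree {
	int level;			/* placeholder payload */
};

struct fs_root {
	atomic_ulong state;
	struct reloc_tree *_Atomic reloc_root;
};

/*
 * Writer: set the dead bit first, then clear the pointer with a release
 * store. A reader that observes reloc_root == NULL through an acquire
 * load is therefore guaranteed to also observe the dead bit.
 */
static void retire_reloc_root(struct fs_root *root)
{
	atomic_fetch_or_explicit(&root->state, RELOC_TREE_DEAD,
				 memory_order_release);
	atomic_store_explicit(&root->reloc_root, NULL, memory_order_release);
}

/* Reader helper: the only place that needs to know about ordering. */
static bool reloc_root_is_dead(struct fs_root *root)
{
	return atomic_load_explicit(&root->state, memory_order_acquire) &
	       RELOC_TREE_DEAD;
}

/* Returns the reloc tree, or NULL if it is dead or already detached. */
static struct reloc_tree *peek_reloc_root(struct fs_root *root)
{
	if (reloc_root_is_dead(root))
		return NULL;
	/*
	 * Not a lifetime guarantee: the tree may be retired right after
	 * this load. Only a refcount closes that window.
	 */
	return atomic_load_explicit(&root->reloc_root, memory_order_acquire);
}
```

Callers then test the helper instead of open-coding barriers around every access, which keeps the ordering rules in one place until a proper refcount replaces them.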