Hi Filipe,
Thank you for your response. I realize it was a long time ago, but we
are only now in the process of moving to the 4.14.x stable kernel.
Regarding the fix, I now see the relevant code in btrfs_remove_block_group():
	mutex_lock(&fs_info->chunk_mutex);
	if (!list_empty(&em->list)) {
		/* We're in the transaction->pending_chunks list. */
		free_extent_map(em);
	}
	...
However, this raises another question. Let's say we did indeed call
free_extent_map() in the above code. But later we may do:
	/*
	 * Our em might be in trans->transaction->pending_chunks which
	 * is protected by fs_info->chunk_mutex ([lock|unlock]_chunks),
	 * and so is the fs_info->pinned_chunks list.
	 *
	 * So at this point we must be holding the chunk_mutex to avoid
	 * any races with chunk allocation (more specifically at
	 * volumes.c:contains_pending_extent()), to ensure it always
	 * sees the em, either in the pending_chunks list or in the
	 * pinned_chunks list.
	 */
	list_move_tail(&em->list, &fs_info->pinned_chunks);
So we have dropped the ref that was held by the
"transaction->pending_chunks" list, and have then moved the "em" to
"pinned_chunks" without taking a new ref. But the code assumes that
"pinned_chunks" also holds a ref on the "em". For example, in close_ctree() we do:
	while (!list_empty(&fs_info->pinned_chunks)) {
		struct extent_map *em;

		em = list_first_entry(&fs_info->pinned_chunks,
				      struct extent_map, list);
		list_del_init(&em->list);
		free_extent_map(em);
	}
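
To make the accounting I am worried about concrete, here is a minimal
userspace sketch. It is only a model of my reading of the snippets above,
not kernel code: struct em_model and all model_*() helpers are hypothetical
names, and I am assuming that the mapping tree and each list are each
expected to own exactly one ref. The ref taken by the lookup in
btrfs_remove_block_group is left out for simplicity.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct extent_map (not kernel code). */
struct em_model {
	int refs;        /* models the em reference count */
	bool in_tree;    /* still in the mapping tree */
	bool on_pending; /* on transaction->pending_chunks */
	bool on_pinned;  /* on fs_info->pinned_chunks */
};

/* Refs the containers are assumed to own: one per tree/list membership. */
static int expected_refs(const struct em_model *em)
{
	return (em->in_tree ? 1 : 0) + (em->on_pending ? 1 : 0) +
	       (em->on_pinned ? 1 : 0);
}

/* Models free_extent_map(): drop one ref. */
static void model_put(struct em_model *em)
{
	assert(em->refs > 0);
	em->refs--;
}

/* An em that sits in the tree and on pending_chunks, one ref for each. */
static struct em_model model_new_pending_em(void)
{
	struct em_model em = { .refs = 2, .in_tree = true,
			       .on_pending = true, .on_pinned = false };
	return em;
}

/* Models the quoted steps for the case where the em is moved to pinned_chunks. */
static void model_remove_block_group(struct em_model *em)
{
	/* if (!list_empty(&em->list)) free_extent_map(em); */
	if (em->on_pending || em->on_pinned)
		model_put(em);

	/* list_move_tail(&em->list, &fs_info->pinned_chunks); no ref taken */
	em->on_pending = false;
	em->on_pinned = true;
}

/* Models the quoted close_ctree loop body for this one em. */
static void model_close_ctree(struct em_model *em)
{
	if (em->on_pinned) {
		em->on_pinned = false; /* list_del_init(&em->list) */
		model_put(em);         /* free_extent_map(em) */
	}
}
```

Calling model_new_pending_em(), then model_remove_block_group(), leaves the
model with one ref while expected_refs() reports two (tree plus
pinned_chunks). If model_close_ctree() then runs, the count reaches zero
while the em is still marked as being in the tree. If the real code takes a
ref somewhere that I have missed, the model is simply wrong on that point.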
Can you please comment on that?
Thanks,
Alex.
On Mon, Jan 21, 2019 at 10:06 PM Filipe Manana <fdmanana@xxxxxxxxxx> wrote:
>
> On Mon, Jan 21, 2019 at 7:07 PM Alex Lyakas <alex@xxxxxxxxxx> wrote:
> >
> > Hi Filipe,
> >
> > On Tue, Dec 2, 2014 at 8:08 PM Filipe Manana <fdmanana@xxxxxxxx> wrote:
> > >
> > > On block group remove if the corresponding extent map was on the
> > > transaction->pending_chunks list, we were deleting the extent map
> > > from that list, through remove_extent_mapping(), without any
> > > synchronization with chunk allocation (which iterates that list
> > > > and adds new elements to it). Fix this by ensuring that this is done
> > > while the chunk mutex is held, since that's the mutex that protects
> > > the list in the chunk allocation code path.
> > >
> > > > This applies on top of (depends on) my previous patch titled:
> > > "Btrfs: fix race between fs trimming and block group remove/allocation"
> > >
> > > But the issue in fact was already present before that change, it only
> > > became easier to hit after Josef's 3.18 patch that added automatic
> > > removal of empty block groups.
> > >
> > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > > ---
> > > fs/btrfs/extent-tree.c | 8 +++++++-
> > > 1 file changed, 7 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> > > index 17d429d..a7b81b4 100644
> > > --- a/fs/btrfs/extent-tree.c
> > > +++ b/fs/btrfs/extent-tree.c
> > > @@ -9524,19 +9524,25 @@ int btrfs_remove_block_group(struct btrfs_trans_handle *trans,
> > > list_move_tail(&em->list, &root->fs_info->pinned_chunks);
> > > }
> > > spin_unlock(&block_group->lock);
> > > - unlock_chunks(root);
> > >
> > > if (remove_em) {
> > > struct extent_map_tree *em_tree;
> > >
> > > em_tree = &root->fs_info->mapping_tree.map_tree;
> > > write_lock(&em_tree->lock);
> > > + /*
> > > + * The em might be in the pending_chunks list, so make sure the
> > > + * chunk mutex is locked, since remove_extent_mapping() will
> > > + * delete us from that list.
> > > + */
> > > remove_extent_mapping(em_tree, em);
> > > write_unlock(&em_tree->lock);
> > If the "em" was in pending_chunks, it will be deleted from that list
> > by "remove_extent_mapping". But it looks like in this case we also
> > need to drop the extra ref on "em", which was held by pending_chunks
> > list. I don't see it being done anywhere else. So we should check
> > before the remove_extent_mapping() call whether "em" was in
> > pending_chunks, and, if yes, drop the extra ref?
>
> This was part of a large patch set that fixed multiple issues with
> automatic removal of block groups.
> Dropping the extent map reference was done on another patch of that patch set:
>
> commit 495e64f4fe0363bc79fa0dfb41c271787e01b5c3
> Author: Filipe Manana <fdmanana@xxxxxxxx>
> Date: Tue Dec 2 18:07:30 2014 +0000
>
> Btrfs: fix fs mapping extent map leak
>
> Over 4 years ago....
>
> >
> > Thanks,
> > Alex.
> >
> >
> > > /* once for the tree */
> > > free_extent_map(em);
> > > }
> > >
> > > + unlock_chunks(root);
> > > +
> > > btrfs_put_block_group(block_group);
> > > btrfs_put_block_group(block_group);
> > >
> > > --
> > > 2.1.3
> > >
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at http://vger.kernel.org/majordomo-info.html