|
|
|
Re: [PATCH 1/1] netfilter: fix soft lockup when netlink adds new entries | |
| [Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index] |
|
On Fri, Feb 24, 2012 at 11:45:49AM +0100, Jozsef Kadlecsik wrote:
> On Fri, 24 Feb 2012, Jozsef Kadlecsik wrote:
>
> > OK, and I remove nf_conntrack_hash_insert then, and use a single ct
> > argument because as you noted net and zone can be extracted from the ct.
>
> Here comes the second version of the patch, which assumes that the
> previous one is reverted.
>
> Best regards,
> Jozsef
>
> Marcell Zambo and Janos Farago noticed and reported that when
> new conntrack entries are added via netlink and the conntrack table
> gets full, soft lockup happens. This is because the nf_conntrack_lock
> is held while nf_conntrack_alloc is called, which is in turn wants
> to lock nf_conntrack_lock while evicting entries from the full table.
>
> The patch fixes the soft lockup with limiting the holding of the
> nf_conntrack_lock to the minimum, where it's absolutely required.
> It required to extend (and thus change) nf_conntrack_hash_insert
> so that it makes sure conntrack and ctnetlink do not add the same entry
> twice to the conntrack table.
>
> Signed-off-by: Jozsef Kadlecsik <kadlec@xxxxxxxxxxxxxxxxx>
> ---
> include/net/netfilter/nf_conntrack.h | 2 +-
> net/netfilter/nf_conntrack_core.c | 38 ++++++++++++++++++++--
> net/netfilter/nf_conntrack_netlink.c | 58 +++++++++++++--------------------
> 3 files changed, 58 insertions(+), 40 deletions(-)
>
> diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
> index 8a2b0ae..ab86036 100644
> --- a/include/net/netfilter/nf_conntrack.h
> +++ b/include/net/netfilter/nf_conntrack.h
> @@ -209,7 +209,7 @@ extern struct nf_conntrack_tuple_hash *
> __nf_conntrack_find(struct net *net, u16 zone,
> const struct nf_conntrack_tuple *tuple);
>
> -extern void nf_conntrack_hash_insert(struct nf_conn *ct);
> +extern int nf_conntrack_hash_check_insert(struct nf_conn *ct);
> extern void nf_ct_delete_from_lists(struct nf_conn *ct);
> extern void nf_ct_insert_dying_list(struct nf_conn *ct);
>
> diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
> index 76613f5..ed86a3b 100644
> --- a/net/netfilter/nf_conntrack_core.c
> +++ b/net/netfilter/nf_conntrack_core.c
> @@ -404,19 +404,49 @@ static void __nf_conntrack_hash_insert(struct nf_conn *ct,
> &net->ct.hash[repl_hash]);
> }
>
> -void nf_conntrack_hash_insert(struct nf_conn *ct)
> +int
> +nf_conntrack_hash_check_insert(struct nf_conn *ct)
> {
> struct net *net = nf_ct_net(ct);
> unsigned int hash, repl_hash;
> + struct nf_conntrack_tuple_hash *h;
> + struct hlist_nulls_node *n;
> u16 zone;
>
> zone = nf_ct_zone(ct);
> - hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> - repl_hash = hash_conntrack(net, zone, &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
> + hash = hash_conntrack(net, zone,
> + &ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple);
> + repl_hash = hash_conntrack(net, zone,
> + &ct->tuplehash[IP_CT_DIR_REPLY].tuple);
> +
> + spin_lock_bh(&nf_conntrack_lock);
>
> + /* See if there's one in the list already, including reverse */
> + hlist_nulls_for_each_entry(h, n, &net->ct.hash[hash], hnnode)
> + if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_ORIGINAL].tuple,
> + &h->tuple) &&
> + zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
> + goto out;
> + hlist_nulls_for_each_entry(h, n, &net->ct.hash[repl_hash], hnnode)
> + if (nf_ct_tuple_equal(&ct->tuplehash[IP_CT_DIR_REPLY].tuple,
> + &h->tuple) &&
> + zone == nf_ct_zone(nf_ct_tuplehash_to_ctrack(h)))
> + goto out;
> +
> + add_timer(&ct->timeout);
> + nf_conntrack_get(&ct->ct_general);
> __nf_conntrack_hash_insert(ct, hash, repl_hash);
> + NF_CT_STAT_INC(net, insert);
> + spin_unlock_bh(&nf_conntrack_lock);
> +
> + return 0;
> +
> +out:
> + NF_CT_STAT_INC(net, insert_failed);
> + spin_unlock_bh(&nf_conntrack_lock);
> + return -EEXIST;
> }
> -EXPORT_SYMBOL_GPL(nf_conntrack_hash_insert);
> +EXPORT_SYMBOL_GPL(nf_conntrack_hash_check_insert);
>
> /* Confirm a connection given skb; places it in hash table */
> int
> diff --git a/net/netfilter/nf_conntrack_netlink.c b/net/netfilter/nf_conntrack_netlink.c
> index 9307b03..6d73501 100644
> --- a/net/netfilter/nf_conntrack_netlink.c
> +++ b/net/netfilter/nf_conntrack_netlink.c
> @@ -1345,7 +1345,7 @@ ctnetlink_create_conntrack(struct net *net, u16 zone,
> struct nf_conntrack_helper *helper;
> struct nf_conn_tstamp *tstamp;
>
> - ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_ATOMIC);
> + ct = nf_conntrack_alloc(net, zone, otuple, rtuple, GFP_KERNEL);
> if (IS_ERR(ct))
> return ERR_PTR(-ENOMEM);
>
Sorry, I was wrong. I didn't notice that this is called holding
rcu_read_lock() inside nfnetlink.c and we cannot use GFP_KERNEL
inside read-lock area.
I'm going to take the patch and will fix this myself.
Thanks Jozsef.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[Netfitler Users] [LARTC] [Bugtraq] [Yosemite Forum] [Photo]