On Mon, 2014-03-17 at 23:27 -0700, Eric W. Biederman wrote:
> Add a test skb_irq_freeable to report when it is safe to free a skb
> from irq context.
>
> It is not safe to free an skb from irq context when:
> - The skb has a destructor as some skb destructors call local_bh_disable
>   or spin_lock_bh.
> - There is xfrm state as __xfrm_state_destroy calls spin_lock_bh.
> - There is netfilter conntrack state as destroy_conntrack calls
>   spin_lock_bh.
> - If there is a refcounted dst entry on the skb, as __dst_free
>   calls spin_lock_bh.
> - If there is a frag_list, which could be a list of any skbs.
> Otherwise it appears safe to free a skb from interrupt context.
>
> - Update the warning in skb_releae_head_state to warn about freeing
>   skb's in the wrong context.
>
> - Update __dev_kfree_skb_irq to free all skbs that it can immediately
>
> - Kill zap_completion_queue because there is no point going through
>   a queue of packets that are not safe to free and looking for packets
>   that are safe to free.
>
> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
> ---
>  include/linux/skbuff.h | 13 +++++++++++++
>  net/core/dev.c         | 14 +++++++++-----
>  net/core/netpoll.c     | 32 --------------------------------
>  net/core/skbuff.c      | 13 ++++++++++---
>  4 files changed, 32 insertions(+), 40 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 03db95ab8a8c..53f72b53fd47 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -2833,6 +2833,19 @@ static inline void skb_init_secmark(struct sk_buff *skb)
>  { }
>  #endif
>
> +static inline bool skb_irq_freeable(struct sk_buff *skb)
> +{
> +	return !skb->destructor &&
> +#if IS_ENABLED(CONFIG_XFRM)
> +		!skb->sp &&
> +#endif
> +#if IS_ENABLED(CONFIG_NF_CONNTRACK)
> +		!skb->nfct &&
> +#endif
> +		(!skb->_skb_refdst || (skb->_skb_refdst & SKB_DST_NOREF)) &&
> +		!skb_has_frag_list(skb);
> +}
> +

It would be a serious bug having (skb->_skb_refdst & SKB_DST_NOREF) at this point.
With SKB_DST_NOREF set, the dst would be protected only by RCU, but that cannot be true here, as the packet was queued in the TX ring buffer for a possibly long period. And even before reaching the driver, the skb might have been queued in the qdisc layer and so have escaped the RCU protected section anyway.

That's why we use skb_dst_force() from __dev_xmit_skb().
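
To illustrate the point, here is a rough userspace model, not kernel code. Every model_/MODEL_ name, the struct layout and the refcount field are stand-ins invented for this sketch; the only thing taken from the discussion is that the dst is forced to a real reference on the xmit path, so the NOREF bit should never be seen at TX completion. Under that assumption, the dst test in the helper could reduce to a plain non-zero check:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Userspace model, not kernel code: every model_/MODEL_ name is hypothetical. */
#define MODEL_DST_NOREF ((uintptr_t)1)	/* low bit: dst held under RCU only, no refcount */

struct model_dst {
	int refcnt;
};

struct model_skb {
	void (*destructor)(struct model_skb *skb);
	void *sp;			/* stands in for xfrm state */
	void *nfct;			/* stands in for conntrack state */
	uintptr_t refdst;		/* dst pointer, low bit may carry NOREF */
	struct model_skb *frag_list;
};

/* Models the effect of forcing the dst: swap the RCU-only dst for a counted one. */
static void model_dst_force(struct model_skb *skb)
{
	if (skb->refdst & MODEL_DST_NOREF) {
		skb->refdst &= ~MODEL_DST_NOREF;
		((struct model_dst *)skb->refdst)->refcnt++;
	}
}

/* Once the dst has been forced, any non-zero refdst is a refcounted dst. */
static bool model_skb_irq_freeable(const struct model_skb *skb)
{
	return !skb->destructor && !skb->sp && !skb->nfct &&
	       !skb->refdst && !skb->frag_list;
}

/* Walk one skb through "force on xmit, then test at TX completion". */
static bool model_tx_completion_freeable(struct model_skb *skb)
{
	model_dst_force(skb);				/* done on the xmit path */
	assert(!(skb->refdst & MODEL_DST_NOREF));	/* NOREF here would be the bug */
	return model_skb_irq_freeable(skb);
}
```

In this model an skb that entered the xmit path with a NOREF dst comes out holding a counted reference, so it is reported as not freeable from irq context, while an skb with no dst at all still is.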