Re: net-next: NULL pointer dereference on adding a net namespace and a system freeze


 



On Tue, 11 Mar 2014 13:00:59 +0100, Steffen Klassert wrote:
> On Tue, Mar 11, 2014 at 01:46:49AM +0100, Jakub Kiciński wrote:
> > 
> > I bisected the other issue to be caused/uncovered by:
> > 
> > commit 1a1ccc96abb2ed9b8fbb71018e64b97324caef53
> > Author: Steffen Klassert <steffen.klassert@xxxxxxxxxxx>
> > Date:   Wed Feb 19 10:07:34 2014 +0100
> > 
> >     xfrm: Remove caching of xfrm_policy_sk_bundles
> >     
> >     We currently cache socket policy bundles at xfrm_policy_sk_bundles.
> >     These cached bundles are never used. Instead we create and cache
> >     a new one whenever xfrm_lookup() is called on a socket policy.
> >     
> >     Most protocols cache the used routes to the socket, so let's
> >     remove the unused caching of socket policy bundles in xfrm.
> >     
> >     Signed-off-by: Steffen Klassert <steffen.klassert@xxxxxxxxxxx>
> > 
> 
> This patch should affect only on the usage of IPsec socket policies.
> Do you use socket policies, or do you use IPsec at all?

I'm running a pretty standard Fedora 20 installation here (notably with
NetworkManager removed). The two daemons that trigger flow_cache warnings
are libvirt and rtkit.

I'm not sure how to check for IPsec policies; ip xfrm state and
ip xfrm policy don't show anything.

> > 
> > Machine freezes after FLOW_HASH_RND_PERIOD (default 10 minutes).
> > Now get this warning during boot:
> > 
> > [   31.664820] ------------[ cut here ]------------
> > [   31.664824] WARNING: CPU: 2 PID: 3560 at /home/kuba/Development/Linux/net-next/lib/list_debug.c:33 __list_add+0xac/0xc0()
> > [   31.664826] list_add corruption. prev->next should be next (ffff880224579598), but was           (null). (prev=ffff8802106140e8).
> > [   31.664827] Modules linked in: xt_CHECKSUM tun bridge stp llc ccm xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw ftdi_sio arc4 rt2800pci rt2800mmio rt2800lib crc_ccitt eeprom_93cx6 rt2x00pci kvm_amd rt2x00mmio rt2x00lib mac80211 kvm snd_ca0106 cfg80211 e1000e snd_ac97_codec ac97_bus microcode serio_raw ptp i2c_piix4 k10temp acpi_cpufreq pps_core wmi r8169 mii rfkill nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc usb_storage radeon drm_kms_helper ttm
> > [   31.664855] CPU: 2 PID: 3560 Comm: (t-daemon) Not tainted 3.14.0-rc2-1a1ccc96abb2ed9b8fbb71018e64b97324caef53+ #11
> > [   31.664856] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P, BIOS F9b 08/17/2012
> > [   31.664857]  0000000000000009 ffff8802242e7c70 ffffffff81627878 ffff8802242e7cb8
> > [   31.664859]  ffff8802242e7ca8 ffffffff8104a28d ffff880210610ea8 ffff880224579598
> > [   31.664861]  ffff8802106140e8 ffff880224578000 0000000000000000 ffff8802242e7d08
> > [   31.664863] Call Trace:
> > [   31.664865]  [<ffffffff81627878>] dump_stack+0x4d/0x66
> > [   31.664867]  [<ffffffff8104a28d>] warn_slowpath_common+0x7d/0xa0
> > [   31.664869]  [<ffffffff8104a2fc>] warn_slowpath_fmt+0x4c/0x50
> > [   31.664871]  [<ffffffff812fdd8c>] __list_add+0xac/0xc0
> > [   31.664873]  [<ffffffff81055d33>] __internal_add_timer+0x113/0x130
> > [   31.664875]  [<ffffffff81055f47>] internal_add_timer+0x17/0x40
> > [   31.664876]  [<ffffffff810587b2>] mod_timer+0x102/0x230
> > [   31.664878]  [<ffffffff810588f8>] add_timer+0x18/0x20
> > [   31.664880]  [<ffffffff81572204>] flow_cache_init+0x224/0x2b0
> > [   31.664882]  [<ffffffff815f7247>] xfrm_net_init+0x227/0x360
> > [   31.664884]  [<ffffffff815f7171>] ? xfrm_net_init+0x151/0x360
> > [   31.664886]  [<ffffffff81553131>] ops_init+0x41/0x150
> > [   31.664888]  [<ffffffff815532b3>] setup_net+0x73/0x110
> > [   31.664890]  [<ffffffff815537f2>] copy_net_ns+0x72/0x100
> > [   31.664892]  [<ffffffff81072619>] create_new_namespaces+0xf9/0x190
> > [   31.664894]  [<ffffffff81072891>] unshare_nsproxy_namespaces+0x61/0xa0
> > [   31.664895]  [<ffffffff81049949>] SyS_unshare+0x159/0x270
> > [   31.664897]  [<ffffffff81638092>] system_call_fastpath+0x16/0x1b
> > 
> 
> I was unable to reproduce this here, but it looks like the flowcache
> namespace changes are still not complete. We leak an active timer
> and all the allocated resources when we exit a namespace.

I also failed to reproduce it reliably in a VM: there it happens about 50%
of the time, while on the physical machine it triggers on every boot.

While restarting libvirt and rtkit to see if they produce any xfrm
noise, I got this:

[  292.624771] BUG: soft lockup - CPU#1 stuck for 22s! [(t-daemon):4655]
[  292.624777] Modules linked in: bnep bluetooth 6lowpan_iphc fuse ipt_MASQUERADE xt_CHECKSUM tun bridge stp llc ccm xt_conntrack iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_security iptable_raw arc4 rt2800pci rt2800mmio rt2800lib crc_ccitt eeprom_93cx6 rt2x00pci rt2x00mmio rt2x00lib ftdi_sio kvm_amd mac80211 cfg80211 kvm e1000e snd_ca0106 snd_ac97_codec i2c_piix4 rfkill microcode ac97_bus serio_raw k10temp r8169 mii acpi_cpufreq ptp wmi pps_core nfsd auth_rpcgss nfs_acl lockd binfmt_misc sunrpc usb_storage radeon drm_kms_helper ttm
[  292.624884] CPU: 1 PID: 4655 Comm: (t-daemon) Not tainted 3.14.0-rc2d3623099d3509fa68fa28235366049dd3156c63a+ #10
[  292.624889] Hardware name: Gigabyte Technology Co., Ltd. GA-MA790XT-UD4P/GA-MA790XT-UD4P, BIOS F9b 08/17/2012
[  292.624894] task: ffff8802228753c0 ti: ffff8800b515a000 task.ti: ffff8800b515a000
[  292.624899] RIP: 0010:[<ffffffff81072a63>]  [<ffffffff81072a63>] raw_notifier_chain_register+0x23/0x40
[  292.624910] RSP: 0018:ffff8800b515bd98  EFLAGS: 00000246
[  292.624914] RAX: ffff8802014d0ec0 RBX: ffffffff81c23340 RCX: 0000000000000004
[  292.624919] RDX: 0000000000000000 RSI: ffff8800b50f1fc0 RDI: ffff8802014d0ec8
[  292.624923] RBP: ffff8800b515bd98 R08: 0000000000000000 R09: 0000000000000000
[  292.624928] R10: 0000000000000000 R11: 0000000000000000 R12: ffffffff81c233a8
[  292.624933] R13: 0000000180040004 R14: 0000000000000246 R15: 000060fd00000000
[  292.624939] FS:  00007fa39d6118c0(0000) GS:ffff88022fc80000(0000) knlGS:00000000e26ffb40
[  292.624944] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  292.624948] CR2: 00007fa4b45a7f40 CR3: 00000000bd2e6000 CR4: 00000000000007e0
[  292.624951] Stack:
[  292.624955]  ffff8800b515bdb0 ffffffff8161ff8a ffff8800b50f1100 ffff8800b515bde0
[  292.624965]  ffffffff815721be ffff8800b50f1100 0000000000000000 ffff8800b50f1160
[  292.624974]  ffff8800b50f1290 ffff8800b515be28 ffffffff815f7321 ffffffff815f7231
[  292.624982] Call Trace:
[  292.624992]  [<ffffffff8161ff8a>] register_cpu_notifier+0x2a/0x40
[  292.625001]  [<ffffffff815721be>] flow_cache_init+0x1de/0x2b0
[  292.625009]  [<ffffffff815f7321>] xfrm_net_init+0x241/0x380
[  292.625016]  [<ffffffff815f7231>] ? xfrm_net_init+0x151/0x380
[  292.625025]  [<ffffffff81553131>] ops_init+0x41/0x150
[  292.625033]  [<ffffffff815532b3>] setup_net+0x73/0x110
[  292.625042]  [<ffffffff815537f2>] copy_net_ns+0x72/0x100
[  292.625050]  [<ffffffff81072619>] create_new_namespaces+0xf9/0x190
[  292.625058]  [<ffffffff81072891>] unshare_nsproxy_namespaces+0x61/0xa0
[  292.625065]  [<ffffffff81049949>] SyS_unshare+0x159/0x270
[  292.625073]  [<ffffffff816381d2>] system_call_fastpath+0x16/0x1b
[  292.625077] Code: e9 7b ff ff ff 0f 1f 00 66 66 66 66 90 55 48 8b 07 48 89 e5 48 85 c0 74 21 8b 56 10 3b 50 10 7e 0c eb 17 0f 1f 44 00 00 39 50 10 <7c> 0d 48 8d 78 08 48 8b 40 08 48 85 c0 75 ee 48 89 46 08 31 c0


This is net-next with head at d3623099d3509fa68fa28235366049dd3156c63a

It takes a few restarts of libvirt/rtkit-daemon to trigger, but I've
definitely seen register_cpu_notifier in backtraces before...
maybe this is a lead?

> Could you please try the patch below?

Testing now... Expect results in 15 minutes...

> Also, please send your config if the patch does not fix your problem.

config: http://paste.fedoraproject.org/84281/54146313

	-- kuba



