Re: automount/kernel crashes



On Fri, 2012-04-20 at 21:38 +1000, Nick Piggin wrote:
> On Mon, Apr 16, 2012 at 10:24:54AM +0800, Ian Kent wrote:
> > On Sun, 2012-04-15 at 14:05 -0700, Jan Sanislo wrote:
> > > We are seeing occasional (approx. weekly) automount/kernel crashes using
> > > kernel version 3.1.7 and autofs version 5.0.5-39.  The log files show
> > > the following traceback:
> > 
> > Nick,
> > 
> > Can you have a look at my fs/autofs4/expire.c:get_next_positive_subdir()
> > function please.
> > 
> > It looks like my assignment of "p = q" in the "if (!simple_positive(q))
> > {}" block is incorrect. My thinking is that if q goes away while
> > waiting on the d_lock then it will have been removed from the child list
> > so I should just "goto again" with p as is. q itself will not actually
> > be freed until function exit since the autofs sbi->lookup_lock will
> > block in ->d_release(). Can you see any other problem with it and is
> > there a similar problem with
> > fs/autofs4/expire.c:get_next_positive_dentry()?
> 
> Hi Ian,
> 
> Firstly, what's the lock ordering on your d_lock of the dentries?
> Do you ensure that the vfs never locks two dentries at once, and
> you have your own lock order?

I think I just had the locking wrong.
It looks like I wasn't locking the parent (d_subdirs owner) so I've
changed that and we'll see how that goes.

> 
> Secondly, it seems like d_release won't be called until after the
> dentry has been removed from the d_child list. Couldn't that cause
> a corruption here?

Probably not, since that lock protects a different list but also
prevents dentries from going away while I'm checking if they have already
gone away.

> 
> Thanks,
> Nick
> 
> 
> > 
> > > 
> > > ===========================
> > > 
> > > kernel: general protection fault: 0000 [#1] SMP 
> > > kernel: CPU 1 
> > > kernel: Modules linked in: binfmt_misc xt_tcpudp iptable_filter ip_tables ipt_ULOG x_tables nfsd dm_snapshot dm_mirror dm_region_hash dm_log sg bnx2 rng_core ipv6 ext4 jbd2 crc16 usbhid sd_mod sr_mod cdrom ata_piix libata megaraid_sas ehci_hcd uhci_hcd scsi_mod button radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core cfbcopyarea cfbimgblt cfbfillrect dm_mod [last unloaded: scsi_wait_scan]
> > > kernel: 
> > > kernel: Pid: 12716, comm: automount Not tainted 3.1.7-0cse.1 #6 Dell Inc. PowerEdge 2950/0CU542
> > > kernel: RIP: 0010:[<ffffffff8139c5b9>]  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > > kernel: RSP: 0018:ffff880009ef9d48  EFLAGS: 00010283
> > > kernel: RAX: 0000000000000100 RBX: ffff880424297240 RCX: dead0000001000cc
> > > kernel: RDX: ffff8803d7bdd840 RSI: ffff880421eb3d00 RDI: dead0000001000cc
> > > kernel: RBP: ffff880009ef9d48 R08: 0000000000000001 R09: 00007f516fbfad20
> > > kernel: R10: 0000000000000000 R11: 0000000000000246 R12: dead000000100070
> > > kernel: R13: ffff880414436480 R14: dead000000100100 R15: ffff8804242972a8
> > > kernel: FS:  00007f516fbfb700(0000) GS:ffff88043fc40000(0000) knlGS:0000000000000000
> > > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > kernel: CR2: 00007f516fbfad30 CR3: 000000016ba69000 CR4: 00000000000006e0
> > > kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> > > kernel: Process automount (pid: 12716, threadinfo ffff880009ef8000, task ffff880071d14410)
> > > kernel: Stack:
> > > kernel: ffff880009ef9dd8 ffffffff811b72b3 ffff8802e407ca80 dead0000001000cc
> > > kernel: ffff8803d7bdd840 ffff880009ef9f28 00000000000124f8 0000000000000000
> > > kernel: ffff880421eb3d00 ffff8804144364dc ffff880414436520 ffff880424297240
> > > kernel: Call Trace:
> > > kernel: [<ffffffff811b72b3>] autofs4_expire_indirect+0xd3/0x440
> > > kernel: [<ffffffff811b78a5>] autofs4_do_expire_multi+0xc5/0x110
> > > kernel: [<ffffffff811b7c90>] ? autofs_dev_ioctl_askumount+0x30/0x30
> > > kernel: [<ffffffff811b7caa>] autofs_dev_ioctl_expire+0x1a/0x20
> > > kernel: [<ffffffff811b8253>] _autofs_dev_ioctl+0x273/0x360
> > > kernel: [<ffffffff810ee9f6>] ? __d_free+0x46/0x70
> > > kernel: [<ffffffff811b834e>] autofs_dev_ioctl+0xe/0x20
> > > kernel: [<ffffffff810eb166>] do_vfs_ioctl+0x96/0x550
> > > kernel: [<ffffffff810f6a7a>] ? mntput+0x1a/0x30
> > > kernel: [<ffffffff810dbc4f>] ? fput+0x16f/0x210
> > > kernel: [<ffffffff810eb66a>] sys_ioctl+0x4a/0x80
> > > kernel: [<ffffffff813a277b>] system_call_fastpath+0x16/0x1b
> > > kernel: Code: 00 75 05 f0 66 0f b1 17 0f 94 c2 0f b6 c2 85 c0 0f 95 c0 0f b6 c0 5d c3 66 2e 0f 1f 84 00 00 00 00 00 55 b8 00 01 00 00 48 89 e5 <f0> 66 0f c1 07 38 e0 74 06 f3 90 8a 07 eb f6 5d c3 66 0f 1f 44 
> > > kernel: RIP  [<ffffffff8139c5b9>] _raw_spin_lock+0x9/0x20
> > > kernel: RSP <ffff880009ef9d48>
> > > kernel: ---[ end trace e45ee0e39b72b82b ]---
> > > 
> > > ===========================
> > > 
> > > Note that the register dump contains numerous values like
> > > 	R14: dead000000100100
> > > 
> > > which seems to indicate some sort of list corruption/locking problem. The
> > > actual fault instruction seems to be from a call to _raw_spin_lock contained
> > > in the inline expansion of the fs/autofs4/expire.c[get_next_positive_subdir]
> > > call in the while loop of expire.c[autofs4_expire_indirect].
> > > 
> > > Is this a known problem?  Anybody else seeing these faults?
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe autofs" in
> > > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 



