RE: [ogfs-dev]Panic when deleting a freshly written file
I haven't seen any objections.
If you think that it's the right thing to do, go for it.
Thanks.
-- Ben --
Opinions are mine, not Intel's
> -----Original Message-----
> From: opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx
> [mailto:opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Steve Landherr
> Sent: Monday, April 19, 2004 5:25 PM
> To: opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [ogfs-dev]Panic when deleting a freshly written file
>
> I've run into a kernel NULL pointer dereference tickled by the ogfs_glockd
> thread. Here's the scenario:
>
> Two-node cluster sharing a Fibre Channel-attached LUN.
> Kernel: 2.4.22 + opengfs patches + EVMS patches
> Volume Manager: EVMS 2.2.2
> OpenGFS: CVS from 2003-09-28 (to get block allocation fix) + plock.c fixes
>
> A 4 GB filesystem is built with a 128 MB external journal for each node. The
> filesystem is mounted on both nodes.
>
> On one node, I create a file in the filesystem with the following command:
>
> # dd if=/dev/zero of=/ogfs/file bs=64k
>
> I let it run for a while, then use Control-C to stop it. I then immediately
> delete the file from the same node. The rm runs for a while before the node
> panics:
>
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
> printing eip:
> c0133752
> *pde = 00000000
> Oops: 0000
> memexp pool stomith ogfs divdi3 lock_harness ide-cd cdrom dm-mod autofs nfsd lo
> CPU: 0
> EIP: 0010:[<c0133752>] Not tainted
> EFLAGS: 00010213
> EIP is at do_buffer_fdatasync+0x32/0xb0 [kernel]
> eax: c0363580 ebx: 00000000 ecx: c03635bc edx: c1024020
> esi: 00000000 edi: ce0e2944 ebp: c4567e68 esp: c4567e54
> ds: 0018 es: 0018 ss: 0018
> Process ogfs_glockd (pid: 2887, stackpage=c4567000)
> Stack: 00000001 00000000 00000000 ffffffff 00000000 c4567e8c c0133826 ce0e2944
>        00000000 ffffffff c0149a30 00000000 cf704880 d0aee000 00000002 d098af91
>        ce0e2880 00000000 ffffffff 00000001 c93d6780 d0aee514 c570d780 00000002
> Call Trace:
> [<c0133826>] generic_buffer_fdatasync+0x56/0x100 [kernel]
> [<c0149a30>] writeout_one_page+0x0/0x70 [kernel]
> [<d098af91>] ogfs_sync_pg+0x25/0xe8 [ogfs]
> [<d0986a62>] sync_inode+0x52/0x64 [ogfs]
> [<d0998f4b>] sync_dependencies+0x183/0x194 [ogfs]
> [<d099ae29>] scan_held_glocks+0x149/0x238 [ogfs]
> [<d099acb5>] ogfs_pitch_inodes+0x49/0x74 [ogfs]
> [<d099af5a>] ogfs_glockd_scan+0x42/0x148 [ogfs]
> [<c011ad62>] schedule_timeout+0x62/0xb0 [kernel]
> [<d09820fd>] ogfs_glockd+0x9d/0x190 [ogfs]
> [<d09820ee>] ogfs_glockd+0x8e/0x190 [ogfs]
> [<c01077be>] arch_kernel_thread+0x2e/0x40 [kernel]
> [<d0982060>] ogfs_glockd+0x0/0x190 [ogfs]
>
> Code: 8b 1b 8b 46 28 85 c0 74 0d 8b 46 0c 3b 45 10 73 05 3b 45 0c
>
> Entering kdb (current=0xc4566000, pid 2887) on processor 0 Oops: Oops
> due to oops @ 0xc0133752
> eax = 0xc0363580 ebx = 0x00000000 ecx = 0xc03635bc edx = 0xc1024020
> esi = 0x00000000 edi = 0xce0e2944 esp = 0xc4567e54 eip = 0xc0133752
> ebp = 0xc4567e68 xss = 0xc0270018 xcs = 0x00000010 eflags = 0x00010213
> xds = 0xc1020018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc4567e20
> [0]kdb> bt
> Stack traceback for pid 2887
> 0xc4566000 2887 1 1 0 R 0xc4566370 *ogfs_glockd
> EBP EIP Function (args)
> 0xc4567e68 0xc0133752 do_buffer_fdatasync+0x32 (0xce0e2944,
> 0x0, 0xffffffff,
> 0x)
> kernel .text 0xc0100000
> 0xc0133720 0xc01337d0
> 0xc4567e8c 0xc0133826 generic_buffer_fdatasync+0x56 (0xce0e2880, 0x0,
> 0xfffffff)
> kernel .text 0xc0100000
> 0xc01337d0 0xc01338d0
> 0xd098af91 [ogfs]ogfs_sync_pg+0x25 (0xd0aee000,
> 0xcf704880, 0x0,
> 0xd)
> ogfs .text 0xd0982060
> 0xd098af6c 0xd098b054
> 0xd0986a62 [ogfs]sync_inode+0x52 (0xd0aee000,
> 0xcf704880, 0x2,
> 0xd0a)
> ogfs .text 0xd0982060
> 0xd0986a10 0xd0986a74
> 0xd0998f4b [ogfs]sync_dependencies+0x183
> (0xd0aee000, 0xc93d6480,
> 0x)
> ogfs .text 0xd0982060
> 0xd0998dc8 0xd0998f5c
> 0xd099ae29 [ogfs]scan_held_glocks+0x149
> (0xd0aee000, 0x88, 0x12c,
> 0x)
> ogfs .text 0xd0982060
> 0xd099ace0 0xd099af18
> 0xc4567fa4 0xd099af5a [ogfs]ogfs_glockd_scan+0x42 (0xd0aee000, 0x12c,
> 0xca175d7)
> ogfs .text 0xd0982060
> 0xd099af18 0xd099b060
> 0xd09820fd [ogfs]ogfs_glockd+0x9d
> ogfs .text 0xd0982060
> 0xd0982060 0xd09821f0
> 0xc01077be arch_kernel_thread+0x2e
> kernel .text 0xc0100000
> 0xc0107790 0xc01077d0
>
> Looking at the other running processes, I find that the rm is racing the
> ogfs_glockd thread:
>
> Stack traceback for pid 2891
> 0xc3efa000 2891 1606 0 0 R 0xc3efa370 rm
> EBP EIP Function (args)
> 0xc3efbe68 0xc011b099 schedule+0x2c9 (0xc1023a20, 0xd0afe624,
> 0x1, 0x0, 0x0)
> kernel .text 0xc0100000
> 0xc011add0 0xc011b330
> 0xc3efbea8 0xc01332fa truncate_list_pages+0xfa (0xce0e294c,
> 0x0, 0xce0e2880,
> 0xd09ba3e0, 0x9b)
> kernel .text 0xc0100000
> 0xc0133200 0xc0133430
> 0xc3efbec4 0xc01334fd truncate_inode_pages+0x5d (0xce0e2944, 0x0, 0x0,
> 0xce061380, 0xce0e2880)
> kernel .text 0xc0100000
> 0xc01334a0 0xc0133530
> 0xc3efbee4 0xc015e6ef iput+0x10f (0xce0e2880, 0xc93d62b4, 0xd0aee000,
> 0xd0aee000)
> kernel .text 0xc0100000
> 0xc015e5e0 0xc015e800
> 0xc3efbefc 0xc015c7f0 d_delete+0xa0 (0xce061380, 0x2, 0xc3efbf24,
> 0xcc2bc280, 0xcc2bc280)
> kernel .text 0xc0100000
> 0xc015c750 0xc015c800
> 0xd0987c3c [ogfs]ogfs_unlink+0x110 (0xcc2bc280, 0xce061380,
> 0xc3efa000, 0xca990000)
> ogfs .text 0xd0982060
> 0xd0987b2c 0xd0987c64
> 0xc3efbf7c 0xc0153c37 vfs_unlink+0xe7 (0xcc2bc280,
> 0xce061380, 0xce061380,
> 0xc61d8500, 0xcff1)
> kernel .text 0xc0100000
> 0xc0153b50 0xc0153d90
> 0xc3efbfbc 0xc0153e94 sys_unlink+0x104 (0xbffffb2a, 0x2,
> 0xbffff980, 0x0,
> 0xbffffb2a)
> kernel .text 0xc0100000
> 0xc0153d90 0xc0153ea0
> 0xc010960f system_call+0x33
> kernel .text 0xc0100000
> 0xc01095dc 0xc0109614
>
> The two racing functions are truncate_list_pages() and
> do_buffer_fdatasync(). Code inspection reveals a likely bug in
> do_buffer_fdatasync(): it doesn't check that the page's mapping is unchanged
> after it has waited for the page lock. It just blindly sets
> curr = page->list.next.
>
> I checked the rest of the kernel source, and OpenGFS appears to be the only
> caller of generic_buffer_fdatasync(). Other modules call
> filemap_fdatasync() instead, which doesn't have this bug.
>
> Can anyone see a reason why ogfs_sync_pg() cannot be changed to use
> filemap_fdatasync()? There are comments in the code that indicate others
> were contemplating such a change.
>
> -steve
> --
> Steve Landherr -- steve-sf@xxxxxxxxxxxxx
> San Francisco, California
>
>
>
>
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> Opengfs-devel mailing list
> Opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/opengfs-devel
>
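
For anyone following along, the race Steve describes between
truncate_list_pages() and do_buffer_fdatasync() can be modeled in user space.
The sketch below is only an illustration under stated assumptions: it is not
kernel code, it is single-threaded (a hook function stands in for the rm task
running while the syncer sleeps on the page lock), and every name in it
(struct page, struct mapping, sync_pages, run_demo, and so on) is made up for
the model. Restarting the walk from the list head after a failed recheck is
one possible remedy chosen for simplicity; it is not claimed to be how
filemap_fdatasync() works.

```c
#include <assert.h>
#include <stddef.h>

struct mapping;

/* Hypothetical stand-in for a page-cache page on a per-mapping list. */
struct page {
    struct page *next, *prev;   /* models page->list */
    struct mapping *mapping;    /* owner; NULL once truncated */
    int synced;
};

/* Circular list with a sentinel head, loosely like a kernel list_head. */
struct mapping {
    struct page head;
};

static void mapping_init(struct mapping *m)
{
    m->head.next = m->head.prev = &m->head;
    m->head.mapping = m;
    m->head.synced = 0;
}

static void page_add(struct mapping *m, struct page *p)
{
    p->mapping = m;
    p->synced = 0;
    p->prev = m->head.prev;
    p->next = &m->head;
    m->head.prev->next = p;
    m->head.prev = p;
}

/* Models truncate: unlink the page and clear its pointers. */
static void page_truncate(struct page *p)
{
    p->prev->next = p->next;
    p->next->prev = p->prev;
    p->next = p->prev = NULL;
    p->mapping = NULL;
}

typedef void (*sleep_hook)(struct page *waiting_on);

/*
 * Walk the list and "sync" each page. The hook runs at the point where the
 * walker would have dropped the list lock and slept on the page lock.
 * Returns the number of pages synced, or -1 where an unchecked walker
 * would have followed a NULL next pointer.
 */
static int sync_pages(struct mapping *m, int recheck, sleep_hook hook)
{
    int synced = 0;
    struct page *curr;

    assert(m != NULL);
    curr = m->head.next;
    while (curr != &m->head) {
        struct page *page = curr;

        if (page->synced) {          /* already handled before a restart */
            curr = curr->next;
            continue;
        }
        if (hook)
            hook(page);              /* another task may run here */
        if (recheck && page->mapping != m) {
            curr = m->head.next;     /* page left the list: start over */
            continue;
        }
        if (page->next == NULL)
            return -1;               /* blind curr = page->next would oops */
        page->synced = 1;
        synced++;
        curr = page->next;
    }
    return synced;
}

static struct page *victim;

/* Models rm truncating the very page the syncer is waiting on. */
static void truncate_victim(struct page *p)
{
    if (p == victim) {
        page_truncate(p);
        victim = NULL;
    }
}

/* Three pages; the middle one is truncated while the walker "sleeps". */
int run_demo(int recheck)
{
    struct mapping m;
    struct page a, b, c;

    mapping_init(&m);
    page_add(&m, &a);
    page_add(&m, &b);
    page_add(&m, &c);
    victim = &b;
    return sync_pages(&m, recheck, truncate_victim);
}
```

Calling run_demo(0) exercises the unchecked walk and reports the point where
it would have dereferenced NULL; run_demo(1) rechecks the page's mapping and
finishes with the two surviving pages synced.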