
RE: [ogfs-dev]Panic when deleting a freshly written file



I haven't seen any objections.

If you think that it's the right thing to do, go for it.

Thanks.

-- Ben --

Opinions are mine, not Intel's 

> -----Original Message-----
> From: opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx
> [mailto:opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Steve Landherr
> Sent: Monday, April 19, 2004 5:25 PM
> To: opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [ogfs-dev]Panic when deleting a freshly written file
> 
> I've run into a kernel NULL pointer dereference tickled by the
> ogfs_glockd thread.  Here's the scenario:
> 
> Two-node cluster sharing a Fibre Channel-attached LUN.
> Kernel: 2.4.22 + opengfs patches + EVMS patches
> Volume Manager: EVMS 2.2.2
> OpenGFS: CVS from 2003-09-28 (to get block allocation fix) + plock.c fixes
> 
> A 4 GB filesystem is built with a 128 MB external journal for each
> node.  The filesystem is mounted on both nodes.
> 
> On one node, I create a file in the filesystem with the following
> command:
> 
> # dd if=/dev/zero of=/ogfs/file bs=64k
> 
> I let it run for a while, then use Control-C to stop it.  I then
> immediately delete the file from the same node.  The rm runs for a
> while before the node panics:
> 
> Unable to handle kernel NULL pointer dereference at virtual address 00000000
>  printing eip:
> c0133752
> *pde = 00000000
> Oops: 0000
> memexp pool stomith ogfs divdi3 lock_harness ide-cd cdrom dm-mod autofs nfsd lo
> CPU:    0
> EIP:    0010:[<c0133752>]    Not tainted
> EFLAGS: 00010213
> EIP is at do_buffer_fdatasync+0x32/0xb0 [kernel]
> eax: c0363580   ebx: 00000000   ecx: c03635bc   edx: c1024020
> esi: 00000000   edi: ce0e2944   ebp: c4567e68   esp: c4567e54
> ds: 0018   es: 0018   ss: 0018
> Process ogfs_glockd (pid: 2887, stackpage=c4567000)
> Stack: 00000001 00000000 00000000 ffffffff 00000000 c4567e8c c0133826 ce0e2944
>        00000000 ffffffff c0149a30 00000000 cf704880 d0aee000 00000002 d098af91
>        ce0e2880 00000000 ffffffff 00000001 c93d6780 d0aee514 c570d780 00000002
> Call Trace:
>  [<c0133826>] generic_buffer_fdatasync+0x56/0x100 [kernel]
>  [<c0149a30>] writeout_one_page+0x0/0x70 [kernel]
>  [<d098af91>] ogfs_sync_pg+0x25/0xe8 [ogfs]
>  [<d0986a62>] sync_inode+0x52/0x64 [ogfs]
>  [<d0998f4b>] sync_dependencies+0x183/0x194 [ogfs]
>  [<d099ae29>] scan_held_glocks+0x149/0x238 [ogfs]
>  [<d099acb5>] ogfs_pitch_inodes+0x49/0x74 [ogfs]
>  [<d099af5a>] ogfs_glockd_scan+0x42/0x148 [ogfs]
>  [<c011ad62>] schedule_timeout+0x62/0xb0 [kernel]
>  [<d09820fd>] ogfs_glockd+0x9d/0x190 [ogfs]
>  [<d09820ee>] ogfs_glockd+0x8e/0x190 [ogfs]
>  [<c01077be>] arch_kernel_thread+0x2e/0x40 [kernel]
>  [<d0982060>] ogfs_glockd+0x0/0x190 [ogfs]
> 
> Code: 8b 1b 8b 46 28 85 c0 74 0d 8b 46 0c 3b 45 10 73 05 3b 45 0c 
>  
> Entering kdb (current=0xc4566000, pid 2887) on processor 0 Oops: Oops
> due to oops @ 0xc0133752
> eax = 0xc0363580 ebx = 0x00000000 ecx = 0xc03635bc edx = 0xc1024020 
> esi = 0x00000000 edi = 0xce0e2944 esp = 0xc4567e54 eip = 0xc0133752 
> ebp = 0xc4567e68 xss = 0xc0270018 xcs = 0x00000010 eflags = 0x00010213
> xds = 0xc1020018 xes = 0x00000018 origeax = 0xffffffff &regs = 0xc4567e20
> [0]kdb> bt
> Stack traceback for pid 2887
> 0xc4566000     2887        1  1    0   R  0xc4566370 *ogfs_glockd
> EBP        EIP        Function (args)
> 0xc4567e68 0xc0133752 do_buffer_fdatasync+0x32 (0xce0e2944, 0x0, 0xffffffff, 0x)
>                                kernel .text 0xc0100000 0xc0133720 0xc01337d0
> 0xc4567e8c 0xc0133826 generic_buffer_fdatasync+0x56 (0xce0e2880, 0x0, 0xfffffff)
>                                kernel .text 0xc0100000 0xc01337d0 0xc01338d0
>            0xd098af91 [ogfs]ogfs_sync_pg+0x25 (0xd0aee000, 0xcf704880, 0x0, 0xd)
>                                ogfs .text 0xd0982060 0xd098af6c 0xd098b054
>            0xd0986a62 [ogfs]sync_inode+0x52 (0xd0aee000, 0xcf704880, 0x2, 0xd0a)
>                                ogfs .text 0xd0982060 0xd0986a10 0xd0986a74
>            0xd0998f4b [ogfs]sync_dependencies+0x183 (0xd0aee000, 0xc93d6480, 0x)
>                                ogfs .text 0xd0982060 0xd0998dc8 0xd0998f5c
>            0xd099ae29 [ogfs]scan_held_glocks+0x149 (0xd0aee000, 0x88, 0x12c, 0x)
>                                ogfs .text 0xd0982060 0xd099ace0 0xd099af18
> 0xc4567fa4 0xd099af5a [ogfs]ogfs_glockd_scan+0x42 (0xd0aee000, 0x12c, 0xca175d7)
>                                ogfs .text 0xd0982060 0xd099af18 0xd099b060
>            0xd09820fd [ogfs]ogfs_glockd+0x9d
>                                ogfs .text 0xd0982060 0xd0982060 0xd09821f0
>            0xc01077be arch_kernel_thread+0x2e
>                                kernel .text 0xc0100000 0xc0107790 0xc01077d0
> 
> Looking at the other running processes, I find that the rm is racing
> the ogfs_glockd thread:
> 
> Stack traceback for pid 2891
> 0xc3efa000     2891     1606  0    0   R  0xc3efa370  rm
> EBP        EIP        Function (args)
> 0xc3efbe68 0xc011b099 schedule+0x2c9 (0xc1023a20, 0xd0afe624, 0x1, 0x0, 0x0)
>                                kernel .text 0xc0100000 0xc011add0 0xc011b330
> 0xc3efbea8 0xc01332fa truncate_list_pages+0xfa (0xce0e294c, 0x0, 0xce0e2880, 0xd09ba3e0, 0x9b)
>                                kernel .text 0xc0100000 0xc0133200 0xc0133430
> 0xc3efbec4 0xc01334fd truncate_inode_pages+0x5d (0xce0e2944, 0x0, 0x0, 0xce061380, 0xce0e2880)
>                                kernel .text 0xc0100000 0xc01334a0 0xc0133530
> 0xc3efbee4 0xc015e6ef iput+0x10f (0xce0e2880, 0xc93d62b4, 0xd0aee000, 0xd0aee000)
>                                kernel .text 0xc0100000 0xc015e5e0 0xc015e800
> 0xc3efbefc 0xc015c7f0 d_delete+0xa0 (0xce061380, 0x2, 0xc3efbf24, 0xcc2bc280, 0xcc2bc280)
>                                kernel .text 0xc0100000 0xc015c750 0xc015c800
>            0xd0987c3c [ogfs]ogfs_unlink+0x110 (0xcc2bc280, 0xce061380, 0xc3efa000, 0xca990000)
>                                ogfs .text 0xd0982060 0xd0987b2c 0xd0987c64
> 0xc3efbf7c 0xc0153c37 vfs_unlink+0xe7 (0xcc2bc280, 0xce061380, 0xce061380, 0xc61d8500, 0xcff1)
>                                kernel .text 0xc0100000 0xc0153b50 0xc0153d90
> 0xc3efbfbc 0xc0153e94 sys_unlink+0x104 (0xbffffb2a, 0x2, 0xbffff980, 0x0, 0xbffffb2a)
>                                kernel .text 0xc0100000 0xc0153d90 0xc0153ea0
>            0xc010960f system_call+0x33
>                                kernel .text 0xc0100000 0xc01095dc 0xc0109614
> 
> The two racing functions are truncate_list_pages() and
> do_buffer_fdatasync().  Code inspection reveals a likely bug in
> do_buffer_fdatasync(): it doesn't check that the page's mapping hasn't
> changed while it was waiting for the page lock.  It just blindly sets
> curr = page->list.next.
> 
> I checked the rest of the kernel source, and OpenGFS appears to be the
> only caller of generic_buffer_fdatasync().  Other modules call
> filemap_fdatasync() instead, which doesn't have this bug.
> 
> Can anyone see a reason why ogfs_sync_pg() cannot be changed to use
> filemap_fdatasync()?  There are comments in the code that indicate
> others were contemplating such a change.
> 
> -steve
> --
> Steve Landherr -- steve-sf@xxxxxxxxxxxxx
> San Francisco, California
> 
> 
> 
> 
> -------------------------------------------------------
> This SF.Net email is sponsored by: IBM Linux Tutorials
> Free Linux tutorial presented by Daniel Robbins, President and CEO of
> GenToo technologies. Learn everything from fundamentals to system
> administration.http://ads.osdn.com/?ad_id=1470&alloc_id=3638&op=click
> _______________________________________________
> Opengfs-devel mailing list
> Opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/opengfs-devel
> 




