[RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix


> -----Original Message-----
> From: Eric W. Biederman [mailto:ebiederm at xmission.com] 
> Sent: Monday, June 26, 2006 11:38 AM
> To: Miller, Mike (OS Dev)
> Cc: vgoyal at in.ibm.com; Maneesh Soni; Andrew Morton; 
> Neela.Kolli at engenio.com; linux-scsi at vger.kernel.org; 
> fastboot at lists.osdl.org; linux-kernel at vger.kernel.org
> Subject: Re: [RFC] [PATCH 2/2] kdump: cciss driver 
> initialization issue fix
> 
> "Miller, Mike (OS Dev)" <Mike.Miller at hp.com> writes:
> 
> > All,
> > Sorry to come in late and top post. I've been out of the office and 
> > I'm trying to get to the gist of this issue.
> > Exactly what is the problem? I'm not familiar with kdump so I don't 
> > have a clue about what's going on.
> > There are a couple of reset features supported by _some_ cciss 
> > controllers. I'd have to go back to the open spec to see what's in the
> > public domain. We're trying to get the open spec updated and more 
> > complete but we're waiting on the lawyers. :(
> 
> 
> kdump or taking crash dumps using the kexec on panic 
> mechanism could be called a driver's worst nightmare.  In the 
> latest distros this is becoming the way crash dump style 
> information is captured.
> 
> Because the initial kernel is broken we do a jump into 
> another kernel that is sufficient to record a crash dump.  
> That second kernel initializes the hardware from whatever 
> random state the first kernel left the drivers in.  That 
> first kernel is not permitted to do any device shutdown activities.
> 
> The problem is that a command completes which the running 
> instance of the driver did not initiate.  At least that is 
> how I read Vivek's patch 2/2.
> 
> So we have three options.
> - reset the card during initialization.
> - handle the case of a command we did not initiate completing.
> - mark the driver/card as impossibly hopeless for use in a crash
>   dump scenario.
> 
> 
> Eric
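
The second option in Eric's list (tolerating a completion the running driver never initiated) can be sketched roughly as below. This is a userspace illustration only, not cciss code: the tag table, `MAX_CMDS`, `submit_cmd()` and `complete_cmd()` are all hypothetical names, and it assumes completions can be matched to submissions by a small integer tag.

```c
#include <stdbool.h>
#include <stdint.h>

#define MAX_CMDS 8   /* illustrative table size, not a real controller limit */

/* Hypothetical bookkeeping: which tags this driver instance has submitted. */
struct cmd_table {
    bool in_flight[MAX_CMDS];
    unsigned int stale_dropped;   /* completions we never asked for */
};

/* Record a command as outstanding; refuse bad or duplicate tags. */
static bool submit_cmd(struct cmd_table *t, uint32_t tag)
{
    if (tag >= MAX_CMDS || t->in_flight[tag])
        return false;
    t->in_flight[tag] = true;
    return true;
}

/*
 * Handle a completion reported by the controller.  Returns true when it
 * matches a command we submitted.  Anything else (for example a command
 * left over from the crashed kernel) is counted and ignored rather than
 * treated as a fatal error.
 */
static bool complete_cmd(struct cmd_table *t, uint32_t tag)
{
    if (tag >= MAX_CMDS || !t->in_flight[tag]) {
        t->stale_dropped++;
        return false;
    }
    t->in_flight[tag] = false;
    return true;
}
```

Whether dropping such a completion is actually safe on this hardware is exactly the open question; the sketch only shows the bookkeeping.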

Thanks Eric, that helps me understand. Section 8.2.2 of the open cciss
spec defines a reset message; target 0x00 is the controller. We could
send it from the init routine to make sure the board is sane again, but
that would drastically increase init time under normal circumstances.
I also suspect this is a hard reset, and I'm not sure whether that would
negatively impact kdump. If there were some condition we could test for,
and we performed the reset only when it is met, 99.9% of users would not
be affected.
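
To make the "reset only when some condition is met" idea concrete, here is a minimal userspace sketch. It assumes the condition is a flag passed on the kernel command line of the capture kernel. The flag name `cciss_reset_on_init`, the `fake_ctlr` struct and `ctlr_init()` are made up for illustration and exist nowhere in the driver; the actual reset message from section 8.2.2 is not implemented here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical flag name; nothing in the thread or the driver defines it. */
#define RESET_FLAG "cciss_reset_on_init"

/* Return true only when the flag appears as a whole word in cmdline. */
static bool reset_requested(const char *cmdline)
{
    size_t flag_len = strlen(RESET_FLAG);
    const char *p = cmdline;

    while ((p = strstr(p, RESET_FLAG)) != NULL) {
        bool starts_word = (p == cmdline) || (p[-1] == ' ');
        bool ends_word = (p[flag_len] == '\0') || (p[flag_len] == ' ');
        if (starts_word && ends_word)
            return true;
        p += flag_len;
    }
    return false;
}

/* Stand-in for controller state; a real driver would talk to hardware. */
struct fake_ctlr {
    int resets_issued;
};

/* Hypothetical init path: pay the reset cost only when asked to. */
static void ctlr_init(struct fake_ctlr *c, const char *cmdline)
{
    c->resets_issued = 0;
    if (reset_requested(cmdline))
        c->resets_issued++;   /* this is where the reset message would be sent */
}
```

With something like this, a normal boot never sees the flag and skips the slow reset, while the kdump setup would add it to the capture kernel's command line.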

Thoughts, comments, flames?

mikem


