On Thu, 07 Jul 2011 15:33:21 EDT, Vivek Goyal said:
> On Wed, Jul 06, 2011 at 11:24:47AM +0200, Michael Holzheu wrote:
> > S390 stand-alone dump tools are independent mini operating systems that
> > are installed on disks or tapes. When a dump should be created, these
> > stand-alone dump tools are booted. All that they do is to write the dump
> > (current memory plus the CPU registers) to the disk/tape device.
> >
> > The advantage compared to kdump is that since they are freshly loaded
> > into memory they can't be overwritten in memory.
>
> > Another advantage is
> > that since it is different code, it is much less likely that the dump
> > tool will run into the same problem than the previously crashed kernel.
>
> I think in practice this is not really a problem. If your kernel
> is not stable enough to even boot and copy a file, then most likely
> it has not even been deployed. The very fact that a kernel has been
> up and running verifies that it is a stable kernel for that machine
> and is capable of capturing the dump.

Vivek: I used to do VM/XA on S/390 boxes for a living, and that's *not* where Michael is coming from.

What the standalone dump code does is take a system that may have the moral equivalent of 256 separate PCI buses, several hundred disks all visible in multipath configurations, dozens of other devices, and as long as you can find *one* console and *one* tape/disk drive that works, you can capture a dump.

More than once in my career, I got into a situation where the production system would hang - and booting off another disk that contained an older copy with maybe a few less patches would *also* hang. VM/XA would simply *not run*. Booting the standalone dump utility (which shared zero code with VM/XA, and did *much* less initialization of I/O devices not needed for the actual dump) would work just fine.
This would get me a dump that would show that we had a (usually) hardware issue - either we were tripping over an erratum that *no* released version of VM/XA had a workaround for, or outright defective hardware.

For the same efficiency reasons that Linux doesn't do a lot of checking for "can never happen" cases, VM/XA doesn't check some things. So when busted hardware would present logically impossible combinations of status bits (for instance, "device still connected" but "I/O bus disconnected"), Bad Things would happen. Booting a tiny dump program that never even *tried* to look at the bad bits posted by the miscreant hardware would allow you to get the info you needed to debug it.

*THAT* is the use case - when you have one customer out there in East Podunk who is consistently managing to hang their system so hard you can't get enough info out of it to figure out what's broken.