On 2/14/07, Luke Browning <LukeBrowning at us.ibm.com> wrote:
>
> Hello,
>
> I have spent the last two weeks debugging kexec and kdump on the cell
> platform. I got my kernels to boot on the Cell system and the Cell
> simulator, but I am baffled as the changes that I made are in the common
> powerpc code. If I am right, it doesn't work on powerpc. I must have
> gone astray...

Hi Luke,

The good news is you're right, you did go astray :)

> First, I am loading my kernel using
>
> kexec -l vmlinux or kexec -l vmlinux --append="maxcpus=0"

That's OK, except for the maxcpus=0 part. I'd suggest you don't use
maxcpus at all, it's not well tested, I'm pretty sure it doesn't work on
the IBM cell blade for example.

> This results in a bad start pointer. Specifically,
>
> image->start = image->memory[1].mem
>
> The entire kernel is loaded into image->memory[0].mem and I don't know what
> is supposed to be in second and third memory segments, perhaps glue code,
> but whatever it is it doesn't work. The system hangs executing code in the
> second region.
>
> I changed the kernel, so that
>
> image->start = image->memory[0] + KERNELBASE.

This is no good. You can't jump straight from the first kernel into the
second, you have to go through purgatory.

If you grab a copy of the kexec-tools source,
(git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools-testing.git),
and look in purgatory/arch/ppc64/, you'll find v2wrap.S.

v2wrap.S is the kexec boot wrapper (version 2) aka. purgatory. It starts
with the following comments:

# calling convention:
# r3 = physical number of this cpu (all cpus)
# r4 = address of this chunk (master only)

# Invokes ppc64 kernel with the expected arguments
# of kernel(device-tree, phys-offset, 0)

Which is exactly what you discovered needs to be done :D

Anyway, I'm glad someone's looking at kexec on cell, I haven't had the
time to look closely at it.

You said that your kexec was hanging in the second region, how were you
debugging it?
Can you give us any more info?

Another problem with the existing kexec code is that it doesn't cope
properly with 64k pages on non-hypervisor machines, like the cell blade.
We need to fix the FIXME in native_hpte_clear()
(arch/powerpc/mm/hash_native_64.c).

cheers
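
To make the purgatory point above a bit more concrete, here is a rough sketch in C of the handoff as I understand it. It is only a toy model, not kernel or kexec-tools code: the struct names, the segment kinds, the addresses, and the idea that the third segment holds the device tree are all assumptions made up for illustration. The only parts taken from the discussion are that image->start points at the second segment and that the wrapper calls kernel(device-tree, phys-offset, 0).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-ins; NOT the kernel's struct kimage. */
enum seg_kind { SEG_KERNEL, SEG_PURGATORY, SEG_DEVTREE };

struct seg {
    enum seg_kind kind;
    uint64_t mem;    /* physical destination address (made-up values) */
    uint64_t memsz;  /* size in bytes (made-up values) */
};

struct image {
    uint64_t start;          /* where the first kernel jumps to */
    size_t nr_segments;
    struct seg segment[4];
};

/* Arguments the wrapper hands over: kernel(device-tree, phys-offset, 0). */
struct kernel_args { uint64_t r3, r4, r5; };

/* Return the first segment of the given kind, or NULL if there is none. */
const struct seg *find_seg(const struct image *img, enum seg_kind kind)
{
    for (size_t i = 0; i < img->nr_segments; i++)
        if (img->segment[i].kind == kind)
            return &img->segment[i];
    return NULL;
}

/*
 * Model of what the wrapper passes to the new kernel. Assumption:
 * "phys-offset" is taken to be the kernel segment's physical address.
 */
struct kernel_args purgatory_args(const struct image *img)
{
    const struct seg *kernel = find_seg(img, SEG_KERNEL);
    const struct seg *dt = find_seg(img, SEG_DEVTREE);
    assert(kernel != NULL && dt != NULL);

    struct kernel_args a = { dt->mem, kernel->mem, 0 };
    return a;
}

/* Build a three-segment image whose entry point is the wrapper segment. */
struct image example_image(void)
{
    struct image img = {
        .start = 0,
        .nr_segments = 3,
        .segment = {
            { SEG_KERNEL,    0x2000000, 0x800000 },
            { SEG_PURGATORY, 0x2800000, 0x10000 },
            { SEG_DEVTREE,   0x2810000, 0x10000 },
        },
    };
    /* Entry is the second segment, not the kernel itself. */
    img.start = img.segment[1].mem;
    return img;
}
```

In this model the first kernel only ever jumps to img.start, and it is the wrapper that works out the three arguments for the second kernel. Pointing start at the kernel segment directly would skip that step.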