Re: [PATCH 1/2] boot: ignore early NMIs

On 03/12/2012 02:49 PM, H. Peter Anvin wrote:

> On 03/11/2012 10:43 PM, Fernando Luis Vázquez Cao wrote:
>> To tackle this issue we can either stop the hardlockup detector
>> or disable the LAPIC (the NMIs needed by x86's hardlockup detector
>> are generated using performance counters in the LAPIC), leaving
>> the I/O APICs untouched. The second is simpler and I think it
>> is the approach Don took to fix this issue in RHEL kernels.
>>
>> Unfortunately, this is not enough: we are still exposed to external
>> NMIs not routed through the LAPIC. In other words, we have to make
>> sure that we always have an IDT that is able to handle NMIs without
>> seemingly random reboots and lockups. To achieve this goal we need
>> to fix machine_kexec() and the early IDT handlers. The current patch
>> set takes care of the latter.
>
> The only source of NMIs other than the LAPIC should be the system error
> which can be disabled through the RTC port, so I think your second
> paragraph here is way more mechanism than you need for very little gain.

The thing is that we want to avoid playing with hardware in the kdump
reboot path whenever we can, the premise being that hardware cannot
be accessed without risking a lockup or worse (as the deadlock accessing
the I/O APIC showed). The kernel is crashing, after all. What is more,
I forgot to mention that the long-term goal is to leave the LAPIC
untouched too (we really want to keep the number of things we do in the
context of the crashing kernel to the bare minimum), so we would still
need to fix the early IDT.
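
For reference, the RTC-port masking mentioned above comes down to one
bit: on PC-compatible hardware, bit 7 of the byte written to the CMOS/RTC
index port (0x70) gates NMI delivery, and the low seven bits select the
CMOS register. The helper below is only a sketch of how that byte is
composed; cmos_index_byte() is a name made up for illustration, it is
not a kernel function, and no port I/O is performed here.

```c
#include <stdbool.h>
#include <stdint.h>

#define CMOS_NMI_DISABLE 0x80u /* bit 7 of the index byte masks NMIs */
#define CMOS_INDEX_MASK  0x7fu /* low 7 bits select the CMOS register */

/*
 * Hypothetical helper: build the byte that would be written to port 0x70
 * to select register `index` with NMIs masked or unmasked.
 */
static uint8_t cmos_index_byte(uint8_t index, bool nmi_disabled)
{
	uint8_t byte = index & CMOS_INDEX_MASK;

	if (nmi_disabled)
		byte |= CMOS_NMI_DISABLE;
	return byte;
}
```

Small as it is, actually using this still means a port write from the
context of the crashing kernel.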

My patch set just installs a special handler for the NMI case, so I think
it is pretty simple and self-contained.
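
To make the idea concrete, here is a toy user-space model, not the patch
itself: the early IDT is represented as a table of handlers in which every
vector halts, except that vector 2 (the x86 NMI vector) can be pointed at
a handler that simply returns. All names below (setup_early_idt(),
deliver(), the enum) and the table size of 32 are assumptions made up for
this sketch; the real early handlers are assembly stubs.

```c
#include <stdbool.h>
#include <stddef.h>

#define NUM_EARLY_VECTORS 32u /* assumed: one entry per exception vector */
#define NMI_VECTOR        2u  /* x86 delivers NMIs on vector 2 */

enum early_action { EARLY_HALT, EARLY_IGNORE };

typedef enum early_action (*early_handler_t)(unsigned int vector);

/* Default early behaviour in this model: anything unexpected halts. */
static enum early_action early_default_handler(unsigned int vector)
{
	(void)vector;
	return EARLY_HALT;
}

/* Special case: an early NMI is acknowledged and ignored. */
static enum early_action early_nmi_handler(unsigned int vector)
{
	(void)vector;
	return EARLY_IGNORE;
}

static early_handler_t early_idt[NUM_EARLY_VECTORS];

/* Fill the model IDT; optionally install the NMI-ignoring handler. */
static void setup_early_idt(bool ignore_nmi)
{
	for (size_t i = 0; i < NUM_EARLY_VECTORS; i++)
		early_idt[i] = early_default_handler;
	if (ignore_nmi)
		early_idt[NMI_VECTOR] = early_nmi_handler;
}

/* Dispatch a vector through the model IDT; unknown vectors halt. */
static enum early_action deliver(unsigned int vector)
{
	if (vector >= NUM_EARLY_VECTORS || early_idt[vector] == NULL)
		return EARLY_HALT;
	return early_idt[vector](vector);
}
```

With setup_early_idt(false) an NMI takes the default path and halts;
with setup_early_idt(true) it is ignored while every other vector
behaves as before.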

Another reason to apply these patches is to be consistent with the rest
of the kernel. Spurious NMIs that would be ignored once the final IDT is
installed currently cause the system to halt if they happen to arrive
while the early IDT is still in place.

Fernando

