Re: Bisected KVM hang on x86-32 between v3.12 and v3.13

On Sun, Apr 06, 2014 at 05:19:27PM +0200, Michele Ballabio wrote:
> Toralf Förster reported this in
>   http://article.gmane.org/gmane.linux.kernel/1662567
>   http://article.gmane.org/gmane.linux.kernel/1658422
>   http://article.gmane.org/gmane.linux.kernel/1657962
> 
>   "The issue happens here at a 32 bit stable Gentoo Linux if
>    I try to start a KVM image. Kernels 3.12.X works fine,
>    kernel >= v3.13 will hang shortly after I started the image
>    with the virtual-manager. The last syslog messages are
>    something like:
>    Feb 28 16:22:00 n22 kernel: INFO: rcu_sched detected stalls
>        on CPUs/tasks: {} (detected by 2, t=60002 jiffies,
>        g=14689, c=14688, q=21051)
>    Feb 28 16:22:00 n22 kernel: INFO: Stall ended before state
>        dump start"
> 
> He correctly pointed out that the bisection blamed the merge
> commit 37bf06375c90a42fe07b9bebdb07bc316ae5a0ce
> "Merge tag 'v3.12-rc4' into sched/core".
> 
> This bug is obviously caused by at least two patches, one
> on each side of the merge, that only when combined together
> (at that merge point) cause the bug in kvm. By rebasing
> the "sched/core" branch on "master" before the merge and
> going on with the bisection, I found commit
> 3e8e42c69bb7d9fc12ebc23ff308e8523a2a59a0
> "sched: Revert need_resched() to look at TIF_NEED_RESCHED"
> as one of the causes. The other patch that contributes to the
> bug is commit ded797547548a5b8e7b92383a41e4c0e6b0ecb7f
> "irq: Force hardirq exit's softirq processing on its own stack".
> 
> Reverting either one of them solves the problem reported with kvm,
> but revert is probably not the correct answer.
> 
> I wonder if the solution is as simple as this:
> 
> --->8---
> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
> index 0af5250..f3b985d 100644
> --- a/arch/x86/Kconfig
> +++ b/arch/x86/Kconfig
> @@ -126,6 +126,7 @@ config X86
>  	select RTC_LIB
>  	select HAVE_DEBUG_STACKOVERFLOW
>  	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_64
> +	select HAVE_IRQ_EXIT_ON_IRQ_STACK if X86_32
>  	select HAVE_CC_STACKPROTECTOR

Ohh ahh... shiny!

So what I suspect at this point is that, because i386 and x86_64 differ
in how current_thread_info() is found (i386 derives it from the stack
pointer, so the answer depends on which stack we are running on), we end
up setting the TIF_NEED_RESCHED bit in the thread_info of the wrong stack.
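
To make the suspected failure mode concrete, here is a minimal user-space
model, not kernel code. It assumes 8 KiB, THREAD_SIZE-aligned stacks with a
heavily simplified thread_info (only a flags word) at the base of each; the
names union stack, sp_near_top(), ti_from_sp() and set_need_resched() are
hypothetical. It only shows the masking effect: an i386-style lookup called
while running on the IRQ stack sets the bit in the IRQ stack's thread_info
and leaves the task's copy untouched.

```c
#include <stdint.h>

#define THREAD_SIZE 8192UL      /* assumption: 8 KiB stacks, as on i386 */
#define TIF_NEED_RESCHED 3      /* illustrative bit number */

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info {
    unsigned long flags;
};

/* A THREAD_SIZE-sized stack with its thread_info at the lowest address. */
union stack {
    struct thread_info ti;
    unsigned char bytes[THREAD_SIZE];
};

_Alignas(THREAD_SIZE) union stack task_stack;   /* the task's own stack */
_Alignas(THREAD_SIZE) union stack irq_stack;    /* the per-cpu IRQ stack */

/* Stacks grow down, so a live stack pointer sits near the top. */
uintptr_t sp_near_top(union stack *s)
{
    return (uintptr_t)&s->bytes[THREAD_SIZE - 64];
}

/* i386-style lookup: mask the stack pointer down to the stack base. */
struct thread_info *ti_from_sp(uintptr_t sp)
{
    return (struct thread_info *)(sp & ~(THREAD_SIZE - 1));
}

/* Set the flag on whatever thread_info the current stack maps to. */
void set_need_resched(uintptr_t sp)
{
    ti_from_sp(sp)->flags |= 1UL << TIF_NEED_RESCHED;
}

/* What the scheduler would look at on the way back to the task. */
int task_need_resched(void)
{
    return (int)((task_stack.ti.flags >> TIF_NEED_RESCHED) & 1UL);
}
```

Calling set_need_resched(sp_near_top(&irq_stack)) leaves task_need_resched()
at 0; only the same call made from the task stack is visible to the task.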

Now I have some vague memories of propagating the TIF flags on stack
switch, but I cannot remember what arch we did that for. Let me stare at
this a little more.
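
Purely as an illustration of what propagating TIF flags on a stack switch
could look like (this is not the code referred to above, and the helper
propagate_tif_flags() is a hypothetical name): when leaving the IRQ stack,
move any pending bits selected by a mask from the IRQ stack's thread_info to
the task's. A real version would need atomic bitops; this sketch uses plain,
non-atomic read-modify-write.

```c
#define TIF_NEED_RESCHED 3                      /* illustrative bit number */
#define _TIF_NEED_RESCHED (1UL << TIF_NEED_RESCHED)

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info {
    unsigned long flags;
};

/*
 * Hypothetical helper: copy the bits in 'mask' that were set on the IRQ
 * stack's thread_info over to the task's thread_info, then clear them on
 * the IRQ side so they are not seen again on the next interrupt.
 * NOT atomic: a real implementation would use atomic bit operations.
 */
void propagate_tif_flags(struct thread_info *irq_ti,
                         struct thread_info *task_ti,
                         unsigned long mask)
{
    unsigned long pending = irq_ti->flags & mask;

    task_ti->flags |= pending;
    irq_ti->flags &= ~pending;
}
```

Bits outside the mask are left alone on both sides.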

Also, IFF this is the case, then the fingered patch above (and your
suggested 'fix') aren't the real culprit/cure but simply make the hang
more/less likely to happen.

Now, Steve had patches somewhere that would make i386 use per-cpu
variables for current_thread_info(), just like x86_64 already does, I
think. Let me go find them too.
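
A rough model of the per-cpu approach, assuming a plain array indexed by CPU
number stands in for a real per-cpu variable; cpu_current_ti, switch_to_ti()
and current_thread_info_on() are hypothetical names, and this is not Steve's
patch. Because the pointer is updated at context switch and read without
looking at the stack pointer, the lookup gives the same answer whether the
CPU is on the task stack or on the IRQ stack.

```c
#define NR_CPUS 4                               /* illustrative */
#define TIF_NEED_RESCHED 3                      /* illustrative bit number */
#define _TIF_NEED_RESCHED (1UL << TIF_NEED_RESCHED)

/* Simplified stand-in for the kernel's struct thread_info. */
struct thread_info {
    unsigned long flags;
};

/* Stand-in for a per-cpu variable: one slot per CPU. */
struct thread_info *cpu_current_ti[NR_CPUS];

/* Called on context switch: record the incoming task's thread_info. */
void switch_to_ti(int cpu, struct thread_info *next)
{
    cpu_current_ti[cpu] = next;
}

/* Stack-independent lookup: no masking of the stack pointer. */
struct thread_info *current_thread_info_on(int cpu)
{
    return cpu_current_ti[cpu];
}

/* The flag always lands on the task that is current on this CPU. */
void set_need_resched_on(int cpu)
{
    current_thread_info_on(cpu)->flags |= _TIF_NEED_RESCHED;
}
```

The trade-off is an extra store on every context switch in exchange for a
lookup that no longer depends on which stack is active.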
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/



