I have discovered why other architectures do not have the same problem. The back trace indicates a real defect (i.e. deadlock) in the generic code.
Most architectures override sched_clock() with their own version. Kernel/sched_clock.c:38 is a weak alias that will be overridden if an arch directory supplies its own.
Most of the arch directories provide an implementation that directly access the jiffies_64 variable "without" acquiring the xtime_lock.
Some of the implementations provide a "poor" implementation in that the value of the jiffies_64 during a 32 rollover is not taken into account. If sched_clock() is to be called while holding xtime_lock, the core implementation should be modified not to call get_jiffies_64() (which requires the xlock) but to use something like the following:
unsigned long long sched_clock(void)
{
unsigned long long my_jiffies;
unsigned long jiffies_top;
unsigned long jiffies_bottom;
do {
jiffies_top = jiffies_64 >> 32;
jiffies_bottom = jiffies_64 & 0xffffffff;
} while(unlikely(jiffies_top != (unsigned long)(jiffies_64 >> 32)));
my_jiffies = ((unsigned long long)jiffies_top << 32) | (jiffies_bottom);
return (my_jiffies - INITIAL_JIFFIES) * (NSEC_PER_SEC / HZ);
}
-----Original Message-----
From: linux-newbie-owner@xxxxxxxxxxxxxxx [mailto:linux-newbie-owner@xxxxxxxxxxxxxxx] On Behalf Of Sol Kavy
Sent: Monday, April 13, 2009 3:01 PM
To: linux-newbie@xxxxxxxxxxxxxxx
Subject: IRQ Tracing Problem in Linux 2.6.28 Kernel
The following back trace represents a deadlock in Ubicom's SMP port of 2.6.28 kernel. I am sure that we are doing something unexpected. I would appreciate the community's help in understanding what is going wrong.
Thanks in advance for any pointers,
Sol Kavy
Problem:
Ubicom's initial port does not use GENERIC_CLOCKEVENTS. Instead it uses a periodic timer based on HZ. The periodic timer calls do_timer() on each tick.
>From the arch directory perspective, we are required to hold the xtime_lock before calling do_timer(). The lock is indeed help by cpu 3 as evidenced in the output below.
The call to get_jiffies_64() at the top of the backtrace is attempting to read the jiffies in a reliable fashion. The caller is required to wait for the xtime_lock not to be held. Clearly, since we are in a path that is holding the xtime_lock, this will never make forward progress.
What is unclear to me is why other ports are not seeing the same problem?
Perhaps it is because most ports now set GENERIC_CLOCKEVENTS which uses an entirely different mechanism for doing things. I am in the middle of switching the port to use GENERIC_CLOCKEVENTS but would like to understand this failure in more detail.
Any feedback is greatly appreciated,
Sol
Config Flags:
# CONFIG_GENERIC_CLOCKEVENTS is not set
CONFIG_GENERIC_HARDIRQS=y
CONFIG_IRQ_PER_CPU=y
CONFIG_TRACE_IRQFLAGS_SUPPORT=y
# CONFIG_DEBUG_SHIRQ is not set
CONFIG_TRACE_IRQFLAGS=y
CONFIG_IRQSOFF_TRACER=y
State of the lock:
(gdb) p xtime_lock
$5 = {sequence = 47089, lock = {raw_lock = {lock = 1}, magic = 3735899821,
owner_cpu = 3, owner = 0x42b01160}}
This is a backtrace from CPU 3:
(gdb) bt
#0 get_jiffies_64 () at include/linux/seqlock.h:94
#1 0x4044f558 in sched_clock () at kernel/sched_clock.c:40
#2 0x4045f108 in ring_buffer_time_stamp (cpu=3) at kernel/trace/ring_buffer.c:58
#3 0x40464c50 in ftrace_now (cpu=3) at kernel/trace/trace.c:77
#4 0x404656ec in trace_hardirqs_off () at kernel/trace/trace_irqsoff.c:207
#5 0x40413020 in _spin_lock_irqsave (lock=0x3) at kernel/spinlock.c:82
#6 0x40451b2c in clocksource_get_next () at kernel/time/clocksource.c:254
#7 0x3ffd08ac in update_wall_time () at kernel/time/timekeeping.c:182
#8 0x4043bcd8 in do_timer (ticks=0) at kernel/timer.c:1125
#9 0x404169f8 in timer_tick (irq=<value optimized out>,
dummy=<value optimized out>) at arch/ubicom32/kernel/time.c:126
#10 0x3ffcefb0 in handle_IRQ_event (irq=<value optimized out>,
action=0x406fe6f4) at kernel/irq/handle.c:142
#11 0x3ffcee20 in __do_IRQ (irq=<value optimized out>)
at kernel/irq/handle.c:239
#12 0x3ffcfcc8 in do_IRQ (irq=47089, regs=<value optimized out>)
at arch/ubicom32/kernel/irq.c:250
#13 0x4041018c in sys_call_table ()
#14 0x4044f558 in sched_clock () at kernel/sched_clock.c:40
#15 0x00000008 in ?? ()
--
To unsubscribe from this list: send the line "unsubscribe linux-newbie" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.linux-learn.org/faqs
--
To unsubscribe from this list: send the line "unsubscribe linux-newbie" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.linux-learn.org/faqs
[Site Home]
[Audio]
[Hams]
[Kernel Newbies]
[Memory]
[Security]
[Netfilter]
[Bugtraq]
[Photo]
[Yosemite Photos]
[Yosemite News]
[MIPS Linux]
[ARM Linux]
[Linux Security]
[Linux RAID]
[Samba]
[Video 4 Linux]
[Linux Resources]
[Fedora Users]