CTDB daemon crashed on bringing down one node in the cluster

All,

I have a 3-node CTDB cluster which serves 4 public addresses. The
/etc/ctdb/public_addresses file is node-specific and is present at that
path on each participating node. All the nodes run RHEL 6.2.

Other ctdb config files, such as "nodes" and "public_addresses", are placed
on a shared filesystem mounted at a known location (say, /gluster/lock).
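
For reference, the layout looks roughly like this (the IPs, interface name
and lockfile name below are placeholders, not my real values):

# /etc/ctdb/public_addresses  (node-specific; one public IP per line)
10.0.0.101/24 eth0
10.0.0.102/24 eth0

# /gluster/lock/nodes  (shared; identical on all nodes; one internal IP per node)
192.168.1.1
192.168.1.2
192.168.1.3

# /etc/sysconfig/ctdb  (relevant lines only)
CTDB_RECOVERY_LOCK=/gluster/lock/lockfile
CTDB_NODES=/gluster/lock/nodes
CTDB_PUBLIC_ADDRESSES=/etc/ctdb/public_addresses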

On starting the CTDB service on all the nodes, things look fine via
"ctdb status": all nodes are "OK" and connected.
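
(For completeness, this is all I do on each node, using the stock init script:)

service ctdb start
chkconfig ctdb on    # start on boot
ctdb status          # expect all nodes to report OK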

To test the failover behaviour, I brought down one of the nodes.
"ctdb status", when run on one of the remaining (up) nodes, gave the following output:

[root@<nodename>~]# ctdb status
Number of nodes:4
pnn:0 x.y.z.a    DISCONNECTED|BANNED|UNHEALTHY|INACTIVE
pnn:1 x.y.z.b    BANNED|UNHEALTHY|INACTIVE (THIS NODE)
pnn:2 x.y.z.c    DISCONNECTED|UNHEALTHY|INACTIVE
pnn:3 x.y.z.d    OK
Generation:INVALID
Size:3
hash:0 lmaster:0
hash:1 lmaster:1
hash:2 lmaster:3
Recovery mode:RECOVERY (1)
Recovery master:3

In the above (edited) output, pnn 2 is the node that was brought down.
I also observed that ctdb had crashed with signal 6 (SIGABRT) on pnn 0. The
stack trace was not very useful. I am new to ctdb, so I would like to know if
there is any way I can get more useful stack traces on subsequent crashes (if any).
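
In case it is relevant, this is what I was planning to try in order to get a
usable backtrace on the next crash; I am not sure it is the recommended
approach, so corrections are welcome:

# allow ctdbd to dump core, and pick a predictable location for the core file
ulimit -c unlimited
echo '/var/tmp/core.%e.%p' > /proc/sys/kernel/core_pattern
service ctdb restart

# install debug symbols so gdb can resolve the frames (needs yum-utils)
debuginfo-install -y ctdb

# after a crash, load the core and print the full backtrace
gdb /usr/sbin/ctdbd /var/tmp/core.ctdbd.<pid>
(gdb) bt full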

Is there something that I may have missed? Could somebody give me pointers on
how I can debug this issue?
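
So far the only things I have tried are raising the debug level and watching
the ctdb log (assuming the level name and default log location on this build):

ctdb setdebug DEBUG          # raise the runtime debug level on this node
tail -f /var/log/log.ctdb    # default CTDB_LOGFILE on my installation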

cheers,
krish
-- 
To unsubscribe from this list go to the following URL and read the
instructions:  https://lists.samba.org/mailman/options/samba

