
joined and failed list seem to be wrong after a node failure in a large ring



Hi,
 
I have created a 32-node Corosync ring and ran some node failure and recovery tests. I observed the following:
 
When I fail one or more nodes, Corosync seems to report more node failures than the actual number of failed nodes, so extra, incorrect configuration-change callbacks are invoked. For example, when I reset just one node, I got the following logs, which indicate that 31 nodes left the ring:
 
Jun 07 15:37:26 corosync [TOTEM ] A processor failed, forming new configuration.
Jun 07 15:37:30 corosync [TOTEM ] A processor joined or left the membership and a new membership was formed.
Jun 07 15:37:30 corosync [CPG   ] chosen downlist: sender r(0) ip(169.254.0.1) ; members(old:32 left:31)
 
(The last line says that 31 nodes left together.)
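 
To spot this condition across a larger log file, the downlist line can be checked mechanically. The following is only a minimal sketch, assuming the CPG log line format shown above; `parse_downlist` and `over_reported` are hypothetical helper names, not part of Corosync.

```python
import re

# Matches the counts in a CPG "chosen downlist" line,
# e.g. "members(old:32 left:31)". The format is assumed from the log above.
DOWNLIST_RE = re.compile(r"chosen downlist:.*members\(old:(\d+) left:(\d+)\)")


def parse_downlist(line):
    """Return (old, left) member counts from a downlist log line, or None."""
    m = DOWNLIST_RE.search(line)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def over_reported(line, expected_failed):
    """True if the line reports more departures than nodes actually reset."""
    counts = parse_downlist(line)
    return counts is not None and counts[1] > expected_failed


line = ("Jun 07 15:37:30 corosync [CPG   ] chosen downlist: sender r(0) "
        "ip(169.254.0.1) ; members(old:32 left:31)")
print(parse_downlist(line))                     # (32, 31)
print(over_reported(line, expected_failed=1))   # True
```

Here `expected_failed` is the number of nodes deliberately reset in the test (one in the example above).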
 
This happens more often on a busier system, and more often when more nodes are reset at the same time.
 
Note that we are using the default configuration suggested by Corosync throughout.
 
Could this be a bug or a system configuration problem?
 
Thanks very much for the help.
 
Qiuping Li
 
_______________________________________________
discuss mailing list
discuss@xxxxxxxxxxxx
http://lists.corosync.org/mailman/listinfo/discuss
