|-----Original Message-----
|From: linux-cluster-bounces@xxxxxxxxxx [mailto:linux-cluster-bounces@xxxxxxxxxx] On Behalf Of Digimer
|Sent: Sunday, December 11, 2011 0:23 AM
|To: Matthew Painter
|Cc: linux clustering
|Subject: Re: Nodes leaving and re-joining intermittently
|
|On 12/10/2011 05:00 PM, Matthew Painter wrote:
|> The switch was our first thought, but that has been swapped, and while
|> we are not having nodes fenced anymore (we were daily), this anomaly
|> remains.
|>
|> I will ask for those logs and conf on Monday.
|>
|> I think it might be worth reinstalling corosync on this box anyway?
|> Can't be healthy if it is exiting uncleanly. I have had reports of the
|> rgmanager dying on this box. (pid file but not running) Could that be
|> related?
|>
|> Thanks :)
|
|It's impossible to say without knowing your configuration. Please share the
|cluster.conf (only obfuscate passwords, please) along with the log files.
|The more detail, the better. Versions, distros, network config, etc.
|
|Uninstalling corosync is not likely to help. RGManager is something fairly
|high up in the stack, so it's not likely the cause either.
|
|Did you configure the timeouts to be very high, by chance? I'm finding it
|difficult to fathom how the node can withdraw without being fenced, short
|of cleanly stopping the cluster stack. I suspect there is something
|important not being said, which the configuration information, versions and
|logs will hopefully expose.
|
|--
|Digimer
|E-Mail: digimer@xxxxxxxxxxx
|Freenode handle: digimer
|Papers and Projects: http://alteeve.com
|Node Assassin: http://nodeassassin.org
|"omg my singularity battery is dead again.
|stupid hawking radiation." - epitron
|