On 09/01/12 05:24, Digimer wrote:
With both of the bond's NICs down, the bond itself is going to drop.
Odds are, both NICs are plugged into the same switch (assuming the OP isn't running things plugged NIC-to-NIC, which I have found in the past tends to be flaky when N-way negotiation becomes involved).
I'm assuming "heartbeat" is a dedicated corosync (v)lan. To the OP: please look at http://www.cyberciti.biz/howto/question/static/linux-ethernet-bonding-driver-howto.php and the descriptions of bonding there.
The type of bond you want for this purpose is either LACP (mode 4, 802.3ad) if the NICs are plugged into a single switch or switch stack which supports LACP, or active-backup (mode 1) if separate switches are involved.
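As a sketch of the mode 1 case (device names, addresses and the RHEL-style ifcfg layout are assumptions, not taken from the OP's setup), the bond and one of its slaves might look like:

```
# /etc/sysconfig/network-scripts/ifcfg-bond0
DEVICE=bond0
# mode=active-backup is bonding mode 1; miimon=100 polls link state every 100ms
# for the single-switch LACP case you'd use "mode=802.3ad miimon=100" instead
BONDING_OPTS="mode=active-backup miimon=100"
IPADDR=10.0.0.1
NETMASK=255.255.255.0
ONBOOT=yes
BOOTPROTO=none

# /etc/sysconfig/network-scripts/ifcfg-eth0  (repeat for the second NIC)
DEVICE=eth0
MASTER=bond0
SLAVE=yes
ONBOOT=yes
BOOTPROTO=none
```

With mode 1, only one slave carries traffic at a time, which is why it works across two independent switches where LACP can't.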
Any other mode is potentially failure-prone if things go wrong. FWIW, my heartbeat setup is as follows: two switches with a 4-way LACP bond between them, and two NICs on each cluster member in bonding mode 1, one NIC on each switch. This setup is resilient against individual link failures (NIC, cable or fat fingers) OR switch failures.
Switches used for this purpose are best completely isolated from the rest of the network, and multicast traffic control (e.g. IGMP snooping) should be DISABLED.
Corosync can be set to fail over to the public LAN as a last resort, but I've found it's not necessary: if things get bad enough that the private LAN is completely out of action, then the systems should shut themselves down (bad data is worse than zero data).
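If you do want the public LAN as a fallback ring anyway, corosync's redundant ring protocol handles it. A minimal corosync.conf fragment might look like this (the network and multicast addresses here are placeholders, not from any real setup):

```
# corosync.conf totem fragment - two rings, private LAN first
totem {
    version: 2
    # "passive" uses ring 1 only when ring 0 fails; "active" uses both
    rrp_mode: passive
    interface {
        ringnumber: 0
        bindnetaddr: 10.0.0.0        # private heartbeat LAN
        mcastaddr: 239.255.1.1
        mcastport: 5405
    }
    interface {
        ringnumber: 1
        bindnetaddr: 192.168.1.0     # public LAN, last resort
        mcastaddr: 239.255.2.1
        mcastport: 5405
    }
}
```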
Switch ports should be set to "portfast" (or whatever the non-Cisco equivalent is), or else ~30 seconds will be wasted while spanning tree checks that whatever's attached doesn't have a LAN segment behind it. That delay can also lead to fencing.
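On a Cisco switch that's a one-liner per port (the interface name below is just an example; other vendors call the same thing an "edge port"):

```
! Cisco IOS: skip STP listening/learning on a host-facing port
interface GigabitEthernet0/1
 spanning-tree portfast
```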
-- Linux-cluster mailing list Linux-cluster@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/linux-cluster