Re: Is this normal behaviour?



I use the same configuration in production (SLES11HA) and experienced the same behaviour.

I found that setting the dependency score to "0" between the two clone groups changes the behaviour to a more expected result (services are only stopped on the affected server, not its partner).

This appears to be the designed behaviour of Pacemaker, as setting this value to "0" is outlined in one of the examples in the documentation.
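For reference, here is a rough sketch of what that might look like in crm shell syntax (the resource names `cl-base` and `cl-fs1` are hypothetical). An ordering constraint with a score of 0 is advisory rather than mandatory, so Pacemaker honours the start order when it can, but a state change in the base clone does not force a cluster-wide restart of the dependent clone:

```
# Hypothetical example: advisory (score 0) ordering between the
# cloned "base" group and a cloned filesystem resource.
order o-base-then-fs 0: cl-base cl-fs1
```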

I hope that helps.


On Apr 19, 2012 12:11 AM, <discuss-request@xxxxxxxxxxxx> wrote:
Date: Wed, 18 Apr 2012 10:40:51 -0400
From: "Robert Telka" <Robert.Telka@xxxxxxxxxxxxxx>
To: discuss@xxxxxxxxxxxx
Subject: Is this normal behaviour?

Seeing some odd behaviour; is this normal?

The goal is to create an active-active environment (from 2 to many servers) for a webfarm with a clustered filesystem (ocfs2).  User-space ocfs2 (ie, via corosync) is the only supported variant under SLES 11 HAE, hence the need for the corosync middle man.  (As an aside, kernel-based ocfs2 will continue to work with SLES 11 HAE, but is only supported in an Oracle RAC configuration.)

Cluster config:
Based on SLES 11 HAE SP2
Created a cloned "base" group consisting of dlm and o2cb resources (both required for ocfs2 filesystems)
Configured a stonith_sbd resource
Created individual ocfs2 filesystem resources, cloned.  Idea is that individual filesystems can be brought down across the cluster for maintenance.  Each filesystem clone has a startup dependency on the "base" clone group.
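For illustration, the layout described above might look roughly like this in crm shell syntax (all resource names, devices, and mount points are hypothetical). One detail worth noting: clones that depend on other clones usually need `meta interleave="true"`, so that each filesystem instance only waits on the local copy of the base clone rather than on every copy cluster-wide; without it, restarting the base clone on one node can restart dependents everywhere:

```
# Hypothetical sketch only; names and parameters are illustrative.
primitive p-dlm ocf:pacemaker:controld
primitive p-o2cb ocf:ocfs2:o2cb
group g-base p-dlm p-o2cb
clone cl-base g-base meta interleave="true"
primitive p-fs1 ocf:heartbeat:Filesystem \
    params device="/dev/sdb1" directory="/srv/www" fstype="ocfs2"
clone cl-fs1 p-fs1 meta interleave="true"
order o-base-then-fs inf: cl-base cl-fs1
colocation c-fs1-with-base inf: cl-fs1 cl-base
```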

Two nodes in the cluster (ignoring quorum).  Haven't yet tested with three or more with/without quorum.

Imagine this scenario:
Server A and B are running; all cloned resources are running on both nodes (dlm, o2cb, and ocfs2 filesystems mounted)
Server A requires downtime for maintenance (eg, add memory, replace failed component, etc)
Server A is placed into standby mode, and all resources on that node are automatically stopped.  Quorum is ignored as any applications running on Server B should continue to run in the event that Server A is powered off.
When work is complete, Server A is brought back online (from standby)
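The standby/online transition in the scenario above is typically driven with `crm node` commands, for example (node name is a placeholder):

```
crm node standby serverA    # all resources on serverA are stopped
# ... perform hardware maintenance ...
crm node online serverA     # serverA rejoins; resources are restarted
```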

The problem:
During the transition of Server A from standby to online, Corosync/Pacemaker stops ALL cloned resources on Server B, and then starts all resources on Servers A and B.

With filesystem I/O occurring on Server B, the filesystems are abruptly unmounted and all I/O is terminated.  Not good, since any in-flight transactions are lost, with potential filesystem/data corruption.

Is this really the desired behaviour?  Shouldn't the resources be started on Server A WITHOUT impacting the resources running on other servers?

Is this a "group", "clone", or "clone group" behaviour?

Thanks to all for helping shed some light.  I really hope this isn't a feature ;-)

Robert Telka

Royal Canadian Mounted Police
1200 Vanier Parkway
CPIC 2-108
Ottawa, Ontario, K1A 0R2

End of discuss Digest, Vol 8, Issue 21