[ogfs-users] RE: Problems with opengfs + opendlm on RHEL 3
Hi Ben,
> > Solved the problems I was having before by removing all existing
> > ultramonkey heartbeat rpms (including libnet) and compiling
> > from src. I
> > also needed to use LD_ASSUME_KERNEL=2.4 on top of all of that.
> Congratulations! And thanks for sharing. I'm going to cross-post to
> OpenDLM list to let that crew know about your experience.
Thanks. I just want to repeat (mostly for the archives) that the final working setup
uses a plain vanilla 2.4.22 kernel, with the rest of the install stock RHEL.
> > Problem 1: Mounting more than one ogfs filesystem with
> > opendlm does not
> > seem very stable at all, true?
> Yes, true ... there are a bunch of global/static variables for recovery
> in current CVS, and sharing between filesystems is not healthy right now
> ... I'm working on some changes that will "instancize" recovery stuff.
> Was hoping to check in today, but found something else that needs
> attention.
Great, thanks! Your development on this is much appreciated. FWIW,
I was able to get memexp to work with multiple mounts by running multiple memexp
daemons on different ports. I junked that setup because I found it
not very stable: killing the memexp daemons resulted in a corrupt filesystem.
But that's another story entirely :)
> > Not a biggie for me.. I can
> > run just one
> > big filesystem if necessary.
> For the moment, use just the one big one.
Right, no real big deal there.
> > Problem 2: Recovery is not very well documented.. how do you
> > do it!?!
> > I set up my 2 nodes with an opendlm mount. I then unmount one of the
> > nodes and stop the dlm and heartbeat. Any attempt to have that node
> > rejoin results in failure. The dlms seem to resume communications ok
> > after restarting things in order (and I even tried a full reboot too),
> > but the mount command just hangs. During all of this the mount on the
> > second node stays up which is good.. but oddly I can't seem to unmount
> > that node cleanly?
> >
> > What is the correct procedure for recovering from this scenario while
> > maintaining high availability?
> >
> > Am I doing something wrong?
> No, the code just isn't working right yet. Everything *should* be
> automatic (i.e. no docs required!). The recovery support in the opendlm
> lock module (OpenGFS component) is really new, and not very well tested.
Well, consider that a test :)
The upshot is that I could reboot the second node in the situation described above
and recreate a dual mount scenario. This is a little inconvenient but still ok for
my target environment, since we can simply schedule a shutdown of the server(s) after
hours to resume the mounts.
I'm going to be working on a utility this week that will essentially allow the servers to
be booted simultaneously and mount the opengfs/dlm filesystem cleanly on both. My idea is
for the script to wait ~30 seconds after it starts and then use rsh to check that dlm is
running on the other node and ready to accept a mount. If for some reason the filesystem
cannot be mounted on one system or the other, the script will drop to a forced interactive
prompt so the operator can decide what to do (keep waiting/retry, STOMITH the other machine
and mount nolock, or skip the mount). Probably not useful for anyone else, but it will work
nicely for our 2 node setup. If time permits, the nicer approach would be a small C program
that interfaces directly with the DLM API to check node status rather than going through rsh.
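A minimal sketch of the boot-time check described above, written in Python purely for
illustration (the actual utility had not been written at the time of this message). The
function names, the DLM_READY marker, the retry counts, and the remote check command are
all assumptions, not part of OpenGFS or OpenDLM. The remote command is expected to print
the marker itself, because classic rsh generally does not hand the remote exit status
back to the caller. The mount itself is left to the caller.

```python
import subprocess
import time

# Hypothetical token that the remote check command prints when the DLM is up.
MARKER = "DLM_READY"


def peer_dlm_ready(peer, check_cmd, run=subprocess.run):
    """Return True if the peer's output contains MARKER.

    check_cmd is a placeholder shell command supplied by the caller; it must
    print MARKER when the DLM is running. The output is parsed instead of
    trusting the exit status, which rsh may not relay.
    """
    try:
        result = run(["rsh", peer, check_cmd],
                     capture_output=True, text=True, timeout=15)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return MARKER in result.stdout


def wait_for_peer(peer, check_cmd, settle=30, retries=5, interval=10,
                  run=subprocess.run, sleep=time.sleep):
    """Wait `settle` seconds, then poll the peer up to `retries` times."""
    sleep(settle)
    for attempt in range(retries):
        if peer_dlm_ready(peer, check_cmd, run=run):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False


def ask_operator(prompt=input):
    """Force an interactive decision when the peer never became ready."""
    choices = {"r": "retry", "n": "nolock", "s": "skip"}
    while True:
        answer = prompt("Peer DLM not ready: [r]etry, fence peer and "
                        "mount [n]olock, or [s]kip the mount? ")
        answer = answer.strip().lower()
        if answer in choices:
            return choices[answer]
```

A boot script would call wait_for_peer() with the other node's hostname and, if it
returns False, act on the value returned by ask_operator(). The run, sleep, and prompt
parameters exist only so the logic can be exercised without a real second node.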
> And, since RedHat recently released the Sistina GFS, I'm not sure how
> much more we're going to be working on this project (OpenGFS), although
> it would be nice to get it into a clean state.
>
> Have you tried the RedHat GFS? See:
>
> http://sources.redhat.com/cluster
>
> http://sources.redhat.com/cluster/gfs/
>
> On the GFS page, there is a link to download source RPMs for GFS for
> RHEL 3.
Yes, I downloaded the source for GFS (actually we first tried the binary release
before it was open sourced), but we ended up looking at opengfs because GFS does not
yet support 2 node clusters (the minimum is 3). I've read several pieces of documentation
that suggest they are working on a solution for this (seemingly popular!) cluster setup,
so I will stay tuned until it is supported. Our hardware does not easily lend itself to
setting up a 3rd node (there are only 2 available connections to the shared storage).
I'll keep watching this mailing list as well as Red Hat's page for updates in these
areas.
Thanks for your reply!
-Marc Swanson-
_______________________________________________
Opengfs-users mailing list
Opengfs-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/opengfs-users