
[ogfs-users]RE: Problems with opengfs + opendlm on RHEL 3



Hi Ben,

> > Solved the problems I was having before by removing all existing
> > ultramonkey heartbeat rpms (including libnet) and compiling from src.
> > I also needed to use LD_ASSUME_KERNEL=2.4 on top of all of that.
 
> Congratulations!  And thanks for sharing.  I'm going to cross-post to
> the OpenDLM list to let that crew know about your experience.
 
Thanks.  I just want to repeat (mostly for the archives) that I ended up using
a plain vanilla 2.4.22 kernel as the final working setup, with the rest of the install stock RHEL.

 
> > Problem 1:  Mounting more than one ogfs filesystem with opendlm does
> > not seem very stable at all, true?
 
> Yes, true ... there are a bunch of global/static variables for recovery
> in current CVS, and sharing between filesystems is not healthy right now
> ... I'm working on some changes that will "instancize" recovery stuff.
> Was hoping to check in today, but found something else that needs
> attention.

Great, thanks!  Your development work on this is much appreciated.  FWIW,
I was able to get memexp to work with multiple mounts by running multiple memexp
daemons on different ports.  I abandoned that approach because the setup was not
very stable: killing the memexp daemons resulted in a corrupt filesystem.
But that's another story entirely :)

 
> > Not a biggie for me.. I can run just one big filesystem if necessary.
 
> For the moment, use just the one big one.
 
Right, no real big deal there.

 
> > Problem 2:  Recovery is not very well documented... how do you do it?!
> > I set up my 2 nodes with an opendlm mount.  I then unmount one of the
> > nodes and stop the dlm and heartbeat.  Any attempt to have that node
> > rejoin results in failure.  The DLMs seem to resume communications OK
> > after restarting things in order (and I even tried a full reboot too),
> > but the mount command just hangs.  During all of this the mount on the
> > second node stays up, which is good, but oddly I can't seem to unmount
> > that node cleanly.
> > 
> > What is the correct procedure for recovering from this scenario while
> > maintaining high availability?
> > 
> > Am I doing something wrong?
 
> No, the code just isn't working right yet.  Everything *should* be
> automatic (i.e. no docs required!).  The recovery support in the opendlm
> lock module (OpenGFS component) is really new, and not very well tested.

Well, consider that a test :)
The upshot is that I could reboot the second node in the situation described above
and recreate a dual-mount scenario.  This is a little inconvenient but still OK for
my target environment, since we can simply schedule a shutdown of the server(s) after
hours to restore both mounts.

I'm going to be working on a utility this week that will allow the servers to be
booted simultaneously and mount the opengfs/dlm filesystem on both cleanly.  My plan
is to wait ~30 seconds when the script starts and use rsh to check that the DLM is
running on the other node and ready to accept a mount.  If for some reason the
filesystem cannot be mounted on one of the systems, the script will fall back to an
interactive prompt so the operator can decide what to do (keep waiting/retry, STOMITH
the other machine and mount nolock, or skip the mount).  Probably not useful for anyone
else, but it will work nicely for our 2-node setup.  If time permits, the nicer approach
would be a C program that talks directly to the DLM API to check node status rather
than relying on rsh.
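A minimal sketch of the boot-time decision logic described above, written in Python purely
for illustration.  Assumptions: /usr/local/sbin/dlm-ready is a HYPOTHETICAL command on the
peer that exits 0 when its DLM is up and ready for a mount (substitute whatever status check
you actually have), and the script only returns a decision; performing the STOMITH or the
mount itself is left to the caller.

```python
import subprocess
import time
from typing import Callable, Sequence

# HYPOTHETICAL readiness check run on the peer via rsh; exits 0 when the
# peer's DLM is running and ready to accept a mount. Replace with your own.
PEER_CHECK_CMD = ["/usr/local/sbin/dlm-ready"]

# Operator answers mapped to the decisions described in the text.
CHOICES = {"r": "retry", "s": "stomith_and_mount_nolock", "k": "skip_mount"}


def run_command(argv: Sequence[str]) -> int:
    """Run argv locally and return its exit status."""
    return subprocess.call(list(argv))


def peer_dlm_ready(peer: str, runner: Callable[[Sequence[str]], int]) -> bool:
    """True if the readiness check succeeds on `peer` when run through rsh."""
    return runner(["rsh", peer, *PEER_CHECK_CMD]) == 0


def wait_for_peer(peer: str, runner=run_command, initial_delay: float = 30.0,
                  attempts: int = 3, retry_delay: float = 10.0,
                  sleep=time.sleep) -> bool:
    """Wait, then poll the peer a few times; True as soon as it reports ready."""
    sleep(initial_delay)
    for i in range(attempts):
        if peer_dlm_ready(peer, runner):
            return True
        if i < attempts - 1:
            sleep(retry_delay)
    return False


def ask_operator(read=input) -> str:
    """Prompt until the operator picks retry, STOMITH + nolock, or skip."""
    while True:
        answer = read("Peer DLM not ready: [r]etry, [s]tomith + mount nolock, "
                      "or s[k]ip? ").strip().lower()
        if answer in CHOICES:
            return CHOICES[answer]


def decide(peer: str, runner=run_command, read=input, sleep=time.sleep,
           initial_delay: float = 30.0) -> str:
    """Return "mount" if the peer becomes ready, else the operator's choice."""
    delay = initial_delay
    while True:
        if wait_for_peer(peer, runner=runner, initial_delay=delay, sleep=sleep):
            return "mount"
        choice = ask_operator(read)
        if choice != "retry":
            return choice
        delay = 0.0  # the boot-time wait already happened; retry right away


if __name__ == "__main__":
    print(decide("node2"))  # "node2" is a placeholder peer hostname
```

The runner, read and sleep hooks are injectable so the logic can be exercised without a
real cluster; the rsh call is just the default runner.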
 
> And, since RedHat recently released the Sistina GFS, I'm not sure how
> much more we're going to be working on this project (OpenGFS), although
> it would be nice to get it into a clean state.
> 
> Have you tried the RedHat GFS?  See:
> 
> http://sources.redhat.com/cluster
> 
> http://sources.redhat.com/cluster/gfs/
> 
> On the GFS page, there is a link to download source RPMs for GFS for
> RHEL 3.

Yes, I downloaded the source for GFS (actually we first tried the binary release
before it was open sourced), but we ended up looking at opengfs because Red Hat GFS does
not yet support 2-node clusters (the minimum is 3).  I've read several pieces of
documentation suggesting they are working on a solution for this (seemingly popular!)
cluster setup, so I will stay tuned until it is supported.  Our hardware does not easily
lend itself to adding a 3rd node (there are only 2 available connections to the shared storage).

I'll be following this mailing list as well as Red Hat's page for updates in these
areas.

Thanks for your reply!

-Marc Swanson-
 




_______________________________________________
Opengfs-users mailing list
Opengfs-users@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/opengfs-users
