RE: [ogfs-dev]Recovery Race conditions
On Fri, 2003-08-01 at 04:10, Stanley Wang wrote:
> On Fri, 2003-08-01 at 13:11, Cahill, Ben M wrote:
> [snip]
> > >
> > > 1) Single node failure:
> > >
> > > 1.a) Journal Replay by multiple nodes may occur
> > > Status: ogfs must address
> > > Current Solution: ???
> > > Potential Solution: Deadman Locks (See below)
> > > Potential Problems: What would this do in "no-lock" mode?
> > > Do we still want to support "no-lock" mode?
> >
> > For the current solution, see my comments under 1.f. Only one node will perform journal replay at a time. It is most efficient if one node can be selected to do the replay (memexp does this successfully), rather than asking *all* active nodes to attempt the same replay. But I don't think it would hurt anything, except performance, if all nodes actually did a replay (one at a time, see 1.f). The first might take some time, but the rest would be pretty quick, since the journal should then be empty!
> >
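To make the "one at a time" point concrete, here is a minimal sketch in Python rather than OGFS code. It assumes a single lock guards the failed node's journal; the names Journal, replay and recover_from_all_nodes are made up for illustration, and the in-process threading.Lock only stands in for whatever cluster-wide lock the lock module would provide.

```python
import threading


class Journal:
    """Hypothetical stand-in for the journal of one failed node."""

    def __init__(self, entries):
        self.entries = list(entries)
        # Stands in for a cluster-wide lock on this journal (an assumption).
        self.lock = threading.Lock()


def replay(journal, applied):
    """Replay the journal under its lock; return how many entries were applied."""
    with journal.lock:
        count = 0
        while journal.entries:
            applied.append(journal.entries.pop(0))
            count += 1
        return count


def recover_from_all_nodes(journal, node_ids):
    """Let every surviving node attempt the same replay, one at a time."""
    applied = []
    counts = {}

    def worker(node_id):
        counts[node_id] = replay(journal, applied)

    threads = [threading.Thread(target=worker, args=(n,)) for n in node_ids]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return applied, counts
```

Whichever node takes the lock first drains the journal; every later node finds it empty and returns at once, so the extra replays cost time but do not apply anything twice.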
> > BTW, Nolock is a choice of locking protocol (lock module), not a filesystem mode of any sort. Nolock is not safe in a clustered environment, but was written to support OGFS on a single node.
> >
> > As long as we maintain the current interface between ogfs module and lock modules (which I now think, with a bit of surprise, is a very good thing to do, especially for the near term), we can, with no work, continue to support nolock and memexp (and stats) lock modules, as we develop a new lock module for OpenDLM (or another one for any other locking solution that anyone wants to bring to the party).
>
> To maintain the current interface between ogfs module and lock modules,
> we need to implement the deadman lock in the locking module for OpenDLM.
> And we could assign the jid and first_mount flag by using the old
> interface. It seems quite easy.
> Any comments?
>
I like that approach for now.
As part of the cluster logic restructuring I hope we come up with a new
method of assigning jid.
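As a rough illustration of why keeping the interface helps, here is a sketch in Python, not the real ogfs/lock-module interface. It assumes only what is said above: the lock module hands the filesystem a jid and a first_mount flag at mount time. MountInfo, NoLock, ClusterLock and mount_filesystem are hypothetical names, the jid 0 / first_mount values for the single-node case are my assumption, and the shared dict stands in for real cluster state.

```python
from dataclasses import dataclass


@dataclass
class MountInfo:
    jid: int            # which journal this node should use
    first_mount: bool   # True if this node is the first to mount


class NoLock:
    """Single-node case (assumed): always journal 0, always the first mounter."""

    def mount(self, node_name):
        return MountInfo(jid=0, first_mount=True)


class ClusterLock:
    """Hypothetical cluster lock module; 'shared' stands in for cluster state."""

    def __init__(self, shared):
        self.shared = shared

    def mount(self, node_name):
        members = self.shared.setdefault("members", [])
        first = not members
        if node_name not in members:
            members.append(node_name)
        # Illustrative policy only: jid is the node's position in the member list.
        return MountInfo(jid=members.index(node_name), first_mount=first)


def mount_filesystem(lock_module, node_name):
    """The filesystem side only sees mount(); it never cares which module it got."""
    return lock_module.mount(node_name)
```

With this shape, a new module for OpenDLM would only need to supply its own mount(), and a different method of assigning jid could later be swapped in behind the same call.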
> [snip]
> And OGFS doesn't check other nodes' journal devices at mount time.
> It just trusts the system administrator :)
>
This too may need to change when we restructure the clustering, but it
is not relevant to this discussion.
>
> Stan
Greg
--
Greg Freemyer
_______________________________________________
Opengfs-devel mailing list
Opengfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opengfs-devel