[ogfs-dev] RE: [Opendlm-devel] ODLM/OGFS Recovery
> -----Original Message-----
> From: opendlm-devel-admin@xxxxxxxxxxxxxxxxxxxxx
> [mailto:opendlm-devel-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Stanley Wang
> Sent: Friday, April 23, 2004 4:20 AM
> To: opendlm-devel@xxxxxxxxxxxxxxxxxxxxx
> Cc: opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: Re: [Opendlm-devel] ODLM/OGFS Recovery
>
> Hi Ben,
>
> Thanks for your docs, it really helps a lot!
>
> Current solution in my mind:
>
> Once a node dies, all other nodes are notified by deadman locks. After
> being notified, every node's lock module holds all lock requests (puts
> them in a wait queue) except NOEXP lock requests.
That's the basic idea I've been thinking about, too (inspired by the
work you've already done, and CA's comments), but I think there might be
some problems with timing. There's no guarantee of processing order
when doing lock recovery. The deadman lock will be granted during
ODLM's lock recovery process, but might be granted after many other
locks have been granted (and therefore mistakenly passed to OGFS). That
is, there's nothing special about the deadman lock (from OpenDLM's point
of view) that would cause it to be processed first.
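
To make the ordering concern concrete, here is a toy sketch (not OpenDLM or OGFS code). It assumes a hypothetical lock module that starts withholding grants only once it sees the deadman lock granted; the names LockModule, on_grant, replay and the lock names are made up for illustration.

```python
from dataclasses import dataclass, field

DEADMAN = "deadman"  # hypothetical name for the dead node's deadman lock


@dataclass
class LockModule:
    """Toy lock module: withholds grants once the deadman grant is seen."""
    recovering: bool = False
    held: list = field(default_factory=list)       # grants withheld from the filesystem
    delivered: list = field(default_factory=list)  # grants already passed up to OGFS

    def on_grant(self, lock_name):
        if lock_name == DEADMAN:
            # Only from this point on do we know that a node died.
            self.recovering = True
            return
        if self.recovering:
            self.held.append(lock_name)
        else:
            # Granted before the deadman notification: passed up by mistake.
            self.delivered.append(lock_name)


def replay(grant_order):
    """Feed grants to a fresh lock module in the given order."""
    lm = LockModule()
    for name in grant_order:
        lm.on_grant(name)
    return lm
```

With grant order ["deadman", "inode-7", "inode-9"] both inode locks are held back. With ["inode-7", "deadman", "inode-9"], inode-7 is delivered to the filesystem before recovery is known to be under way, which is the case described above.
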
CA uses the DLM_VALNOTVALID invalid LVB status as a possible earlier
indicator. I haven't convinced myself that that is early enough, or
conclusive enough (discussion was in the attachment, "false
positive/negative"), to serve as the trigger for withholding locks from
OGFS (but it might be, and I might not understand it well enough!).
Plus, it forces us to use LVBs with every lock (maybe not a huge
problem, but nice to avoid if possible). And, it prevents us from using
the LKM_INVVALBLK flag! (but I don't think there's a need to, so not a
problem).
The other tricky part is, if we withhold *all* locks from OpenGFS (not
just the ones that were blocked by the dead node), will surviving nodes
be able to finish their write transactions (and release their CR
transaction locks)? They need to do this, or else the node that
attempts the journal recovery will never get the (EX) transaction lock
.... Again (false positive/negative), how do we conclusively tell which
locks were blocked by the dead node vs. not?
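
A toy model of that worry, under loud assumptions: each survivor is reduced to the set of locks it still needs in order to finish its open write transaction, and the withholding policy is a plain predicate. None of these names exist in OpenDLM or OGFS, and the "dead node only" policy assumes we already know which locks the dead node blocked, which is exactly the open question.

```python
def ex_transaction_lock_grantable(survivors, is_withheld):
    """Check whether the recovering node could ever get the EX transaction lock.

    survivors:   dict mapping node name -> set of lock names that node still
                 needs before it can finish its transaction and drop its CR
                 transaction lock (hypothetical data, for illustration only).
    is_withheld: predicate, lock name -> True if the lock module would hold
                 that grant back from the filesystem during recovery.

    Returns (grantable, stuck_nodes). EX is incompatible with CR, so a single
    stuck CR holder is enough to block the EX request forever.
    """
    stuck = sorted(
        node
        for node, needed in survivors.items()
        if any(is_withheld(name) for name in needed)
    )
    return (not stuck), stuck


# Example policies (both hypothetical):
def withhold_all(name):
    return True


def withhold_only_dead_node_locks(name):
    return name in {"inode-9"}  # pretend only inode-9 was blocked by the dead node
```

For survivors = {"node2": {"inode-7"}, "node3": set()}, withhold_all leaves node2 stuck, so the EX lock is never grantable, while withhold_only_dead_node_locks lets every survivor finish.
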
> After the node that replays the journal completes its work, it
> notifies (through a dedicated "recover_complete" lock?) all the others
> that the blocked requests can continue now. And then the OGFS cluster
> resumes.
Yes, "recover_complete" locks might work. The good thing about this is
that each lock would be filesystem-specific (a concern if the user
mounts multiple OGFS filesystems). I think that each lock would need to
be node- or journal-specific as well (multiple nodes might need to
recover multiple journals before continuing operation). I'd rather use
locks than the callbacks, etc., ... I'll keep working on this.
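
A rough sketch of how filesystem- and journal-specific "recover_complete" names might be tracked. The name format, the RecoveryTracker class and its methods are all assumptions made for illustration; nothing here is an existing ODLM or OGFS interface.

```python
def recover_complete_name(fs_id, journal_id):
    """Build a per-filesystem, per-journal lock name (format is an assumption)."""
    return f"recover_complete:{fs_id}:{journal_id}"


class RecoveryTracker:
    """Tracks which journal recoveries are still outstanding per filesystem."""

    def __init__(self):
        self.pending = {}  # fs_id -> set of lock names still awaited

    def journal_needs_recovery(self, fs_id, journal_id):
        name = recover_complete_name(fs_id, journal_id)
        self.pending.setdefault(fs_id, set()).add(name)
        return name

    def journal_recovered(self, fs_id, journal_id):
        name = recover_complete_name(fs_id, journal_id)
        self.pending.get(fs_id, set()).discard(name)

    def may_resume(self, fs_id):
        # A filesystem resumes only when none of its journals are still pending;
        # other mounted filesystems are tracked independently.
        return not self.pending.get(fs_id)
```

With two journals pending on one filesystem, may_resume() stays False until both have been reported recovered, and a second mounted filesystem with nothing pending is unaffected.
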
More comments/suggestions from anyone????
-- Ben --
Opinions are mine, not Intel's
>
> Any comments?
>
> Best Regards,
> Stan
>
> Cahill, Ben M wrote:
>
> >Hi all,
> >
> >Attached please find some discussion I'm trying to write to understand
> >the problems with OpenDLM/OpenGFS recovery after a node dies. Memexp
> >had this all integrated, but OpenDLM works quite differently.
> >
> >Please take a look and comment ... I'm hoping that I'm wrong about some
> >of the things I wrote, especially about DLM_VALNOTVALID. Please give me
> >a sanity check.
> >
> >I'm thinking about using a list within the lock module to retain locks
> >that get granted during lock recovery, but must not be forwarded to
> >OpenGFS yet.
> >
> >There's more that I need to write, about timing and recovery
> >notification ... Running the journal recovery after lock recovery is
> >complete, then notifying all nodes when journal recovery is done.
> >
> >I'm thinking (just an idea) about using a callback from OpenDLM to the
> >lock module to indicate:
> >
> >-- when lock recovery begins (and which node died)
> >-- when lock recovery ends
> >-- when journal recovery ends (in response to a call from the lock
> >module), perhaps using a new BARRIER state in the ODLM recovery state
> >machine, to wait until all nodes' journal recovery ("client" recovery,
> >more generically) has occurred.
> >
> >Also wrestling with multiple filesystems ... Each must recover before
> >moving on. Memexp had separate instances and storage areas for each
> >filesystem, but there's just one ODLM. Arrgh.
> >
> >Stan's been thinking about this also, but I haven't captured his
> >thoughts ... Please add/comment ... And anyone else, too.
> >
> >-- Ben --
> >
> >Opinions are mine, not Intel's
> >