
[ogfs-dev]Re: [Opendlm-devel] ODLM/OGFS Recovery



Hi Ben,

Thanks for your docs; they really help a lot!

Current solution in my mind:

Once a node dies, all other nodes are notified through the deadman locks.
After being notified, each node's lock module holds all lock requests (puts
them in a wait queue) except NOEXP lock requests. When the node that replays
the journal completes its work, it notifies all the others (through a
dedicated "recover_complete" lock?) that the blocked requests can now
continue, and the OGFS cluster resumes.
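
To make the idea concrete, here is a rough sketch of the hold-and-release
logic. It is Python only for brevity (the real lock module is kernel C), and
every name in it (RecoveryGate, forward, node_died, recovery_complete) is
made up for illustration; none of it is existing OGFS or ODLM code.

```python
from collections import deque


class RecoveryGate:
    """Parks lock requests while a dead node's journal is being replayed."""

    def __init__(self, forward):
        self.forward = forward    # hypothetical: hands a request on to the DLM
        self.recovering = False
        self.waiting = deque()    # requests held back during recovery

    def node_died(self):
        # Would be triggered by the deadman lock notification.
        self.recovering = True

    def submit(self, request, noexp=False):
        # NOEXP requests are exempt, as proposed above; everything else
        # is queued until recovery is reported complete.
        if self.recovering and not noexp:
            self.waiting.append(request)
        else:
            self.forward(request)

    def recovery_complete(self):
        # Would be triggered by the "recover_complete" notification.
        self.recovering = False
        while self.waiting:
            self.forward(self.waiting.popleft())
```

Held requests are released in arrival order. Whether that ordering matters,
and how the "recover_complete" notification is actually delivered, are still
open questions.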


Any comments?

Best Regards,
Stan

Cahill, Ben M wrote:

Hi all,

Attached please find some discussion I'm trying to write to understand
the problems with OpenDLM/OpenGFS recovery after a node dies.  Memexp
had this all integrated, but OpenDLM works quite differently.

Please take a look and comment ... I'm hoping that I'm wrong about some
of the things I wrote, especially about DLM_VALNOTVALID.  Please give me
a sanity check.

I'm thinking about using a list within the lock module to retain locks
that get granted during lock recovery, but must not be forwarded to
OpenGFS yet.

There's more that I need to write, about timing and recovery
notification ... Running the journal recovery after lock recovery is
complete, then notifying all nodes when journal recovery is done.

I'm thinking (just an idea) about using a callback from OpenDLM to the
lock module to indicate:

-- when lock recovery begins (and which node died)
-- when lock recovery ends
-- when journal recovery ends (in response to a call from the lock
module), perhaps using a new BARRIER state in the ODLM recovery state
machine, to wait until all nodes' journal recovery ("client" recovery,
more generically) has occurred.
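
A rough sketch of how those three notifications and the barrier might fit
together, in Python only for brevity. The event names, the RecoveryBarrier
class, and the callback signature are all assumptions for illustration; this
is not the existing ODLM recovery state machine or its API.

```python
from enum import Enum, auto


class Event(Enum):
    LOCK_RECOVERY_BEGIN = auto()
    LOCK_RECOVERY_END = auto()
    JOURNAL_RECOVERY_END = auto()


class RecoveryBarrier:
    """Reports recovery phases to a lock-module callback (hypothetical)."""

    def __init__(self, nodes, callback):
        self.nodes = set(nodes)    # current cluster members
        self.callback = callback   # called as callback(event, dead_node_or_None)
        self.pending = set()       # nodes that have not finished client recovery

    def node_died(self, dead):
        self.nodes.discard(dead)
        self.callback(Event.LOCK_RECOVERY_BEGIN, dead)

    def lock_recovery_done(self):
        # Enter the barrier: every surviving node must report in.
        self.pending = set(self.nodes)
        self.callback(Event.LOCK_RECOVERY_END, None)

    def client_recovery_done(self, node):
        # Each node's lock module calls this when its journal
        # ("client") recovery has finished; duplicates are ignored.
        if node not in self.pending:
            return
        self.pending.discard(node)
        if not self.pending:
            self.callback(Event.JOURNAL_RECOVERY_END, None)
```

The last callback fires only once every surviving node has reported in,
which is the behaviour the BARRIER state would need to provide.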

Also wrestling with multiple filesystems ... Each must recover before
moving on.  Memexp had separate instances and storage areas for each
filesystem, but there's just one ODLM.   Arrgh.

Stan's been thinking about this also, but I haven't captured his
thoughts ... Please add/comment ... And anyone else, too.

-- Ben --

Opinions are mine, not Intel's








--
Opinions expressed are those of the author and do not represent Intel
Corporation
"gpg --recv-keys --keyserver wwwkeys.pgp.net E1390A7F"
{E1390A7F:3AD1 1B0C 2019 E183 0CFF  55E8 369A 8B75 E139 0A7F}



