Hi all,
Attached is a discussion I'm writing to understand the problems with
OpenDLM/OpenGFS recovery after a node dies. Memexp had this all
integrated, but OpenDLM works quite differently.
Please take a look and comment ... I'm hoping that I'm wrong about some
of the things I wrote, especially about DLM_VALNOTVALID. Please give me
a sanity check.
I'm thinking about using a list within the lock module to retain locks
that get granted during lock recovery, but must not be forwarded to
OpenGFS yet.
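
To make the list idea concrete, here is a rough user-space C sketch. It
is only an illustration under assumptions: every name in it (struct
held_grant, struct lock_module, lm_grant, lm_release_held,
record_forward) is hypothetical and is not taken from the OpenDLM or
OpenGFS sources. The "forward" pointer stands in for whatever upcall
actually hands a grant to OpenGFS, and real kernel code would need
locking around the list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical: one grant that arrived while recovery was in progress. */
struct held_grant {
    unsigned long lock_id;          /* stand-in for the lock's identity */
    struct held_grant *next;
};

/* Hypothetical lock-module state. */
struct lock_module {
    int in_recovery;                    /* nonzero while grants must be held */
    struct held_grant *head, *tail;     /* FIFO of held grants */
    void (*forward)(unsigned long id);  /* stand-in for the OpenGFS upcall */
};

/* Demo stand-in for the OpenGFS upcall: records what it was given. */
unsigned long forwarded[16];
int n_forwarded;

void record_forward(unsigned long id)
{
    if (n_forwarded < 16)
        forwarded[n_forwarded++] = id;
}

/*
 * Called when the DLM grants a lock. Outside recovery the grant is
 * forwarded at once; during recovery it is queued instead.
 * Returns 0 on success, -1 if the list entry could not be allocated.
 */
int lm_grant(struct lock_module *lm, unsigned long lock_id)
{
    assert(lm != NULL && lm->forward != NULL);

    if (!lm->in_recovery) {
        lm->forward(lock_id);
        return 0;
    }

    struct held_grant *g = malloc(sizeof(*g));
    if (g == NULL)
        return -1;
    g->lock_id = lock_id;
    g->next = NULL;
    if (lm->tail != NULL)
        lm->tail->next = g;
    else
        lm->head = g;
    lm->tail = g;
    return 0;
}

/*
 * Called once it is safe to hand grants to OpenGFS again. Clears the
 * recovery flag and forwards held grants in arrival order.
 * Returns the number of grants forwarded.
 */
int lm_release_held(struct lock_module *lm)
{
    int n = 0;

    lm->in_recovery = 0;
    while (lm->head != NULL) {
        struct held_grant *g = lm->head;
        lm->head = g->next;
        lm->forward(g->lock_id);
        free(g);
        n++;
    }
    lm->tail = NULL;
    return n;
}
```

The FIFO order is a design guess: it keeps grants in the order the DLM
issued them, which seems the least surprising choice for OpenGFS.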
There's more that I need to write about timing and recovery
notification: running the journal recovery after lock recovery is
complete, then notifying all nodes when journal recovery is done.
I'm thinking (just an idea) about using a callback from OpenDLM to the
lock module to indicate:
-- when lock recovery begins (and which node died)
-- when lock recovery ends
-- when journal recovery ends (in response to a call from the lock
module), perhaps using a new BARRIER state in the ODLM recovery state
machine to wait until all nodes' journal recovery ("client" recovery,
more generically) has occurred.
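
The three notifications above could look roughly like the following C
sketch. It is a single-process model under assumptions, not an existing
OpenDLM interface: the event names, struct recov_state, and the recov_*
functions are all hypothetical. Nodes are assumed to be numbered
0..nnodes-1, and the BARRIER is modelled as a simple count of surviving
nodes that have not yet reported their journal ("client") recovery as
done. A real version would need the reports to travel between nodes.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical events delivered from the DLM to the lock module. */
enum recov_event {
    RECOV_LOCK_BEGIN,    /* lock recovery begins (dead node is passed) */
    RECOV_LOCK_END,      /* lock recovery ends */
    RECOV_JOURNAL_END    /* every surviving node reported journal recovery */
};

typedef void (*recov_cb)(enum recov_event ev, int dead_node, void *ctx);

#define MAX_NODES 32

/* Hypothetical recovery state, including the barrier bookkeeping. */
struct recov_state {
    int nnodes;
    int dead_node;
    int done[MAX_NODES];   /* done[n] != 0 once node n has reported */
    int waiting;           /* surviving nodes that have not reported */
    recov_cb cb;
    void *ctx;
};

/* Demo callback: records the events it receives, in order. */
struct event_log {
    enum recov_event ev[8];
    int n;
};

void log_event(enum recov_event ev, int dead_node, void *ctx)
{
    struct event_log *elog = ctx;

    (void)dead_node;
    if (elog->n < 8)
        elog->ev[elog->n++] = ev;
}

/* Start recovery for dead_node and announce it to the lock module. */
void recov_begin(struct recov_state *rs, int nnodes, int dead_node,
                 recov_cb cb, void *ctx)
{
    assert(nnodes >= 2 && nnodes <= MAX_NODES);
    assert(dead_node >= 0 && dead_node < nnodes);
    assert(cb != NULL);

    memset(rs->done, 0, sizeof(rs->done));
    rs->nnodes = nnodes;
    rs->dead_node = dead_node;
    rs->done[dead_node] = 1;      /* the dead node never reports */
    rs->waiting = nnodes - 1;
    rs->cb = cb;
    rs->ctx = ctx;
    cb(RECOV_LOCK_BEGIN, dead_node, ctx);
}

/* Lock recovery finished; journal recovery may start after this. */
void recov_locks_done(struct recov_state *rs)
{
    rs->cb(RECOV_LOCK_END, rs->dead_node, rs->ctx);
}

/*
 * A node's lock module reports that its journal recovery is done.
 * Duplicate or out-of-range reports are ignored. Returns 1 if this
 * report completed the barrier (RECOV_JOURNAL_END was sent), else 0.
 */
int recov_client_done(struct recov_state *rs, int node)
{
    if (node < 0 || node >= rs->nnodes || rs->done[node])
        return 0;
    rs->done[node] = 1;
    if (--rs->waiting > 0)
        return 0;
    rs->cb(RECOV_JOURNAL_END, rs->dead_node, rs->ctx);
    return 1;
}
```

With three nodes and node 1 dead, the callback would see
RECOV_LOCK_BEGIN, then RECOV_LOCK_END, and RECOV_JOURNAL_END only after
both node 0 and node 2 have called recov_client_done().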
Also wrestling with multiple filesystems ... Each must recover before
moving on. Memexp had separate instances and storage areas for each
filesystem, but there's just one ODLM. Arrgh.
Stan's been thinking about this also, but I haven't captured his
thoughts ... Please add/comment ... And anyone else, too.
-- Ben --
Opinions are mine, not Intel's