
Re: [ogfs-dev]RE: [Opendlm-devel] ODLM/OGFS Recovery

Daniel McNeil wrote:


This is the same question I had many years ago when I first heard about
persistent locking.  As I have said before, a particular implementation
of persistent locking was used to implement a cluster file system.

From what I remember (it has been more than 4 years since I worked
on this), as it was explained to me, the whole point of persistent
locks was that the application did not want to keep a bunch of locks
open just to be able to find out that some event had occurred which would
require an application recovery before using the lock.  Previously,
the application would keep the locks open and if it got an invalid
value block back, it would know that a recovery was required.  With
"persistent" locks, the application did not have to keep the lock
open, but would still get the invalid value block if the process holding
the lock died or a node died.

Re-reading the trucluster man pages I pointed to before (like this one):

This implementation seems much more complicated than what I remember.
It looks like this DLM does keep persistent locks around if they
are invalid after the process closes them (and they get marked invalid
if a node dies).  Thus, a new lock request will see the invalid lock
even when all previous users have gone away.  I'm guessing that is
what is meant by persistent.

The simpler implementation approach I remember (as best as I can,
at least for the node death case anyway, and it really isn't that
simple) went like this.  If a node dies, then all new persistent lock
requests return invalid until the dlm_rd_validate() call is made.  Also,
existing dlm locks could get marked invalid during dlm recovery.
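
A toy Python model may make this simpler scheme easier to follow. It is only a sketch of the behaviour described above, not the trucluster or OpenDLM API: `ToyDomain`, `node_died`, `lock_persistent` and `validate` are hypothetical names, and a single per-domain flag stands in for whatever state the real DLM kept.

```python
class ToyDomain:
    """Toy model of one lock domain; not a real DLM API."""

    def __init__(self):
        self.needs_recovery = False   # set when any node in the cluster dies
        self.value_blocks = {}        # resource name -> value block contents

    def node_died(self):
        # After a node death, new persistent lock requests report invalid.
        self.needs_recovery = True

    def lock_persistent(self, resource):
        """Return (value_block, valid) for a new persistent lock request."""
        value = self.value_blocks.get(resource)
        return value, not self.needs_recovery

    def validate(self):
        # Stand-in for the validate call made once recovery has finished.
        self.needs_recovery = False


domain = ToyDomain()
domain.value_blocks["inode-42"] = b"v1"

before = domain.lock_persistent("inode-42")   # cluster healthy
domain.node_died()
during = domain.lock_persistent("inode-42")   # after the death, before validate
domain.validate()
after = domain.lock_persistent("inode-42")    # after validate
```

The point of the sketch is that validity depends only on "has a node died since the last validate", so no per-lock state has to be replicated to every node.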

This implementation was then used to implement CFS. Recovery went
something like:

    running normally
    node dies
    DLM recovery runs - non-persistent locks recover
        persistent locks return invalid
        deadman locks from dead node(s) for each file system are granted
    start file system recovery:
        surviving nodes pick one to run log replay
            id = dlm_attach()
            replay file system log
            if validate(id) == MORE_RECOVERY
                re-start file system recovery
                since there's been another death
            dlm_validate(id) - persistent locks valid now
        other surviving nodes wait for replay
    file system recovery complete

    if another node dies in the middle of file system recovery
        abort current recovery and start over
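
The restart-on-second-death part of this sequence can be sketched in Python as follows. This is a toy model under stated assumptions, not the real interface: `ToyDlm`, `attach`, `validate` and `recover_file_system` are hypothetical names, the attach id is assumed to record how many node deaths had been seen when recovery started, and the check for MORE_RECOVERY and the final validate are collapsed into one call.

```python
MORE_RECOVERY = "MORE_RECOVERY"
OK = "OK"


class ToyDlm:
    """Toy model of the attach/validate handshake; not a real DLM API."""

    def __init__(self):
        self.deaths = 0          # node deaths seen so far
        self.validated_at = 0    # death count covered by the last validate

    def node_died(self):
        self.deaths += 1

    def persistent_locks_valid(self):
        return self.validated_at == self.deaths

    def attach(self):
        # The id records which deaths this recovery attempt covers.
        return self.deaths

    def validate(self, attach_id):
        if attach_id != self.deaths:
            return MORE_RECOVERY     # another node died since attach()
        self.validated_at = attach_id
        return OK


def recover_file_system(dlm, replay_log):
    """Replay the log until a replay completes with no new node death."""
    attempts = 0
    while True:
        attempts += 1
        attach_id = dlm.attach()
        replay_log()
        if dlm.validate(attach_id) == OK:
            return attempts


# Usage: one node dies, then a second one dies during the first replay.
dlm = ToyDlm()
dlm.node_died()
replays = []

def replay_log():
    replays.append(len(replays))
    if len(replays) == 1:
        dlm.node_died()

attempts = recover_file_system(dlm, replay_log)
```

In this run the first replay is discarded because of the second death, and persistent locks become valid only after the second replay completes.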

This is simpler because persistent locks returned invalid based only on
whether a node had died, until the validate call was made.  Every node
in the cluster did not have to replicate each lock's information.

The use of persistent locks prevented surviving nodes from accessing
metadata BEFORE file system log replay had finished.  We used separate
lock domains for each file system, so file system recovery happened in

Sorry for rambling, but I think I answered the question in there.

Thanks a lot for your help!

Persistent locks in trucluster can survive node failures if they are attached to a "recovery domain". Although the persistent locks are open to all the other nodes after a node dies, "DLM_xxxxVALNOTVALID" can be used as an indicator of node failure. As I understand it, OpenDLM differs in several ways:
1. No "recovery domain" support.
2. After a node failure, if the newly granted lock request (on a persistent lock resource) is PW or EX, "DLM_VALNOTVALID" will NOT be returned.
(Is that true? I drew this conclusion from "valueblock()"; if I missed something, please correct me.)
3. There is no dedicated API to re-validate the lock value block; if the requested mode is PW or EX and LKM_VALBLK is specified, the LVB becomes valid again.
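
If my reading in items 2 and 3 is right, the effect can be shown with a small Python toy. This is only a sketch of the behaviour as I described it, not OpenDLM code: `ToyResource`, `node_failed` and `grant` are hypothetical names, and the assumption that the LVB is re-validated at grant time is mine.

```python
WRITE_MODES = {"PW", "EX"}


class ToyResource:
    """Toy persistent lock resource; models items 2 and 3 only."""

    def __init__(self):
        self.lvb_valid = True

    def node_failed(self):
        # A node failure marks the lock value block invalid.
        self.lvb_valid = False

    def grant(self, mode, valblk=True):
        """Return True if the caller is told the value block is not valid."""
        if mode in WRITE_MODES:
            if valblk:
                self.lvb_valid = True   # item 3: implicit re-validation
            return False                # item 2: PW/EX never sees the flag
        return not self.lvb_valid


res = ToyResource()
res.node_failed()
reader_sees_invalid = res.grant("PR")   # a reader is told about the failure
writer_sees_invalid = res.grant("EX")   # a writer is not
later_reader_sees_invalid = res.grant("PR")
```

Under these assumptions, a node whose first request after the failure is PW or EX gets no indication that recovery is needed, and later readers lose the indication as well.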

It seems the second item in the list above is crucial for ODLM/OGFS recovery. Changing the code is easy, but I'm afraid it will break others' work. Any comments? Especially Don :)

Best Regards,

Opinions expressed are those of the author and do not represent Intel
"gpg --recv-keys --keyserver wwwkeys.pgp.net E1390A7F"
{E1390A7F:3AD1 1B0C 2019 E183 0CFF  55E8 369A 8B75 E139 0A7F}

Opengfs-devel mailing list
