
RE: [ogfs-dev]RE: [Opendlm-devel] ODLM/OGFS Recovery



On Fri, 2004-04-30 at 14:32, Zickus II, Don wrote:
[snip]
> From what I understand of the code, when the holder of a lock (a process or group) disappears unexpectedly (or exits normally but forgets to release its locks), OpenDLM will _NOT_ release the locks immediately.  In fact the locks will stick around for up to 3 seconds (or 1 second with our forthcoming patch).  The reason is that as soon as the holder dies, its pid is put on a queue.  Later on, an asynchronous thread (clm_master_loop() inside clm_main.c) will have its timer expire and check for work.  If it finds a pid, it will perform the dlm_purge().  
> To prove this you can write a quick little program that creates 100,000 resources but doesn't unlock them.  After the program finishes (it will probably take over a minute), the system will stay fairly responsive for the next 3 seconds (i.e. ls is quick).  Then all of a sudden the machine will become extremely sluggish as it purges all the resources.  
> Of course this doesn't cover node death, as it is very difficult to make those locks persistent without replicating their info on all the nodes (which kind of defeats the point of being distributed).  
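
(To make the timing of the deferred purge described above concrete, here is
a minimal stand-alone sketch. It is a toy model, not OpenDLM code: the names
purge_queue, client_died() and tick() are hypothetical, and the real work is
done by clm_master_loop() and dlm_purge(). It only illustrates why a dead
holder's locks linger until the next timer expiry.)

```c
#include <stddef.h>

#define MAX_DEAD 16

/* Hypothetical model of the queue of dead pids; not an OpenDLM structure. */
struct purge_queue {
    int    pids[MAX_DEAD]; /* pids of holders that died without unlocking */
    size_t count;          /* number of pids waiting to be purged */
    int    interval;       /* timer period in seconds (3 today, 1 with the patch) */
    int    elapsed;        /* seconds since the timer last expired */
    int    purged;         /* total number of pids purged so far */
};

/* Called when a holder dies: only queue the pid, do not purge yet. */
void client_died(struct purge_queue *q, int pid)
{
    if (q->count < MAX_DEAD)
        q->pids[q->count++] = pid;
}

/*
 * Advance the model by one second. The purge happens only when the
 * timer expires; until then the dead holder's locks stay in place.
 * Returns the number of pids purged during this tick.
 */
int tick(struct purge_queue *q)
{
    int n;

    if (++q->elapsed < q->interval)
        return 0;          /* timer has not expired: nothing is purged */

    q->elapsed = 0;
    n = (int)q->count;     /* stand-in for one dlm_purge() per queued pid */
    q->count = 0;
    q->purged += n;
    return n;
}
```

With interval set to 3, a pid queued right after a timer expiry is purged
only on the third tick, which matches the "up to 3 seconds" above.
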
Thanks for your comments!
You are totally right. For client (process) failure cases, an internal
purge request with PURGE_DEAD will be issued, and only orphan/persistent
locks can survive this purge. For node failure cases, no locks
(including orphan/persistent locks) can survive the recovery process.
But we can work around this issue by combining the LVB with
DLM_VALNOTVALID etc.
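
The survival rules above can be written down as a small truth function.
This is only a sketch of my summary, with hypothetical enum and function
names; it does not use any real OpenDLM types or calls.

```c
#include <stdbool.h>

/* Hypothetical classification, for illustration only. */
enum lock_kind    { LOCK_NORMAL, LOCK_ORPHAN_OR_PERSISTENT };
enum failure_kind { CLIENT_FAILURE, NODE_FAILURE };

/* Returns true if a lock held by the failed party is still there afterwards. */
bool lock_survives(enum lock_kind kind, enum failure_kind failure)
{
    if (failure == NODE_FAILURE)
        return false;  /* recovery keeps no locks of the dead node */

    /* Client failure: the PURGE_DEAD purge spares only orphan/persistent locks. */
    return kind == LOCK_ORPHAN_OR_PERSISTENT;
}
```

So the only combination that returns true is an orphan/persistent lock after
a client failure; every lock is lost after a node failure.
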

I also prefer not to change the current code much if it can fulfill our
requirements. And it seems OpenDLM can now :) Thanks very much for the
good discussion on this topic over the past few days!

BTW, did you notice there is a little pitfall in the mechanism of using
DLM_VALNOTVALID? That is:

After a node failure event, if there is a blocked lock request (on a
persistent lock resource, blocked by a client on the dead node) that
requests a PW or EX mode lock, then when it is granted "DLM_VALNOTVALID"
will NOT be returned. (Is that true? I drew my conclusion from
"valueblock()"; if I missed something, please correct me.)

Is it a problem for you? Thanks!

Best Regards,
Stan
-- 
Opinions expressed are those of the author and do not represent Intel
Corporation
 
"gpg --recv-keys --keyserver wwwkeys.pgp.net E1390A7F"
{E1390A7F:3AD1 1B0C 2019 E183 0CFF  55E8 369A 8B75 E139 0A7F}


