
[ogfs-dev]Recovery Race conditions



The thread about deadman locks keeps bringing up new race conditions
that ogfs needs to worry about during the recovery process.

It very much sounds like we need to create a small recovery document
which at the very least tells us what the potential race conditions are.

Is this already documented somewhere?


First Pass at potential problems:

1) Single node failure:

1.a)  More than one node may attempt to recover the failed node's journal
        ogfs must address

1.b)  Lock Recovery must occur prior to journal replay
        ogfs must address

1.c)  Journal Recovery must occur before any lock held by the failed
node is granted to another node
        ogfs must address

1.d)  Failed node holds lock for which no one is waiting.  After failure
and lock recovery, but prior to journal recovery, a different node may
request and be granted the lock.
         ogfs must address, lock should not be granted until after
journal replay.

1.e)  Mounting of new nodes should not occur during journal replay
        ???  (Mentioned by someone IIRC, I don't understand the problem)

1.f)  Normal FS activity should be blocked during journal replay
        ???  (Mentioned by Jeffrey Orlin, I don't understand the
problem)
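
To make the ordering in 1.b through 1.d concrete, here is a rough sketch
of a per-failed-node state tracker. It is only an illustration of the
constraints listed above, not a description of what ogfs currently does.
All names (RecoveryState, FailedNodeRecovery, may_grant, the lock ids)
are hypothetical.

```python
from enum import Enum, auto


class RecoveryState(Enum):
    NODE_FAILED = auto()
    LOCKS_RECOVERED = auto()
    JOURNAL_REPLAYED = auto()


class FailedNodeRecovery:
    """Tracks recovery progress for one failed node (hypothetical model)."""

    def __init__(self, held_locks):
        self.state = RecoveryState.NODE_FAILED
        # Locks the failed node held at the time of failure.
        self.held_locks = set(held_locks)

    def recover_locks(self):
        if self.state is not RecoveryState.NODE_FAILED:
            raise RuntimeError("lock recovery already done")
        self.state = RecoveryState.LOCKS_RECOVERED

    def replay_journal(self):
        # 1.b: journal replay is only legal after lock recovery.
        if self.state is not RecoveryState.LOCKS_RECOVERED:
            raise RuntimeError("lock recovery must precede journal replay")
        self.state = RecoveryState.JOURNAL_REPLAYED

    def may_grant(self, lock_id):
        # Locks the failed node never held are unaffected.
        if lock_id not in self.held_locks:
            return True
        # 1.c / 1.d: a lock the failed node held stays blocked until the
        # journal has been replayed, even if nobody was waiting on it.
        return self.state is RecoveryState.JOURNAL_REPLAYED
```

With this model, a request for a lock the failed node held is refused
in the window between recover_locks() and replay_journal(), which is
exactly the window described in 1.d.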


2) Multiple node failure:

2.a) Multiple Journal Recoveries should not occur simultaneously.
        ??? (Mentioned by Jeffrey Orlin, but I don't understand the
problem)
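
For 1.a, one way to picture "only one node recovers a given journal" is
a claim table where the first node to claim a journal wins and everyone
else backs off. The sketch below is an in-process stand-in; in a real
cluster the table would have to be a cluster-wide lock, and I am not
claiming this is how ogfs handles it. JournalClaims, try_claim and
release are made-up names.

```python
import threading


class JournalClaims:
    """In-process stand-in for a cluster-wide 'who recovers which journal' table."""

    def __init__(self):
        self._guard = threading.Lock()
        self._owner = {}  # journal id -> id of the node recovering it

    def try_claim(self, journal_id, node_id):
        """Return True if node_id now owns recovery of journal_id."""
        with self._guard:
            if journal_id in self._owner:
                return False  # some node is already recovering this journal
            self._owner[journal_id] = node_id
            return True

    def release(self, journal_id, node_id):
        """Drop the claim once replay is finished; only the owner may do so."""
        with self._guard:
            if self._owner.get(journal_id) == node_id:
                del self._owner[journal_id]
```

A node that gets False from try_claim() simply skips that journal
instead of replaying it a second time.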




Once we agree on what the issues are, we should document how ogfs
currently addresses each of them.

Finally we should decide if deadman locks should be used in the future
to address the above in a more generic way.

Greg
-- 
Greg Freemyer





_______________________________________________
Opengfs-devel mailing list
Opengfs-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/opengfs-devel
