
[ogfs-dev]RE: [Opendlm-devel] Making progress on mounting with ODLM lockmodule

> > Stanley's removal of a call to release_mount_lock() yesterday got me
> > thinking about the internal (to lock module) MOUNT lock.  I'm thinking
> > right now that we don't need it; memexp used an internal MOUNT lock to
> > determine whether a node was first-to-mount (see ogfs-memexp doc).

> As I understand it, the mount lock is used to make sure that only
> one node is doing the mount work at a time. It serializes all mount
> requests. Please check.
> Best Regards,
> Stan

Yes, I've taken another look ... I think I understand things better now.
Here is my current analysis (comments welcome), with apologies for the
length of the discussion:


This topic is confusing because we need to separate the overall "mount
work" into several different aspects/operations.

One of the *most* confusing aspects of this is that memexp's "MOUNT"
lock does not map directly to the "MOUNT" lock in the opendlm lock
module.  Memexp's "MOUNT" lock record was not just a simple lock; it
also contained status about "first-to-mount" and "others-may-mount".
"others-may-mount" status keeps non-first-to-mount nodes from mounting
the filesystem until the first-to-mount node has recovered *all*
journals ....

 ... The opendlm lock module uses the deadman lock mechanism as a
replacement for determining first-to-mount ("YES", if we can grab all
deadman locks immediately).  But the deadman mechanism does not, by
itself, handle "others-may-mount".  This requires a separate lock.  We
need to be told by the filesystem code (via "others_may_mount()") when
to release that lock.  I think that this is the specific reason that we
need the opendlm "MOUNT" lock.
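
To make the two roles concrete, here is a rough userspace sketch (a toy
model, not the actual lock module code).  Everything named toy_* or
MAX_NODES is hypothetical.  It assumes one deadman lock per node, that a
node keeps only its own deadman lock after the probe, and that the MOUNT
lock is taken before the deadman probe begins.  A real lock module would
block on the MOUNT lock; the toy returns MOUNT_BUSY instead.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_NODES 4   /* hypothetical cluster size: one deadman lock per node */

/* Toy stand-in for the DLM: records which node holds each lock (-1 = free). */
struct toy_dlm {
    int deadman_holder[MAX_NODES];
    int mount_holder;
};

enum mount_result { MOUNT_BUSY, MOUNT_FIRST, MOUNT_NONFIRST };

void toy_dlm_init(struct toy_dlm *d)
{
    for (int i = 0; i < MAX_NODES; i++)
        d->deadman_holder[i] = -1;
    d->mount_holder = -1;
}

/* Non-blocking acquire: succeeds only if the lock is free or already ours. */
static bool try_lock(int *holder, int node)
{
    if (*holder != -1 && *holder != node)
        return false;
    *holder = node;
    return true;
}

/* First-to-mount is "YES" only if every deadman lock can be grabbed at once. */
bool toy_is_first_to_mount(struct toy_dlm *d, int node)
{
    bool got_all = true;

    assert(node >= 0 && node < MAX_NODES);
    for (int i = 0; i < MAX_NODES; i++)
        if (!try_lock(&d->deadman_holder[i], node))
            got_all = false;

    /* Toy simplification: after the probe, keep only our own deadman lock. */
    for (int i = 0; i < MAX_NODES; i++)
        if (i != node && d->deadman_holder[i] == node)
            d->deadman_holder[i] = -1;

    return got_all;
}

/* Take the MOUNT lock, probe the deadman locks, decide who may proceed. */
enum mount_result toy_mount(struct toy_dlm *d, int node)
{
    if (!try_lock(&d->mount_holder, node))
        return MOUNT_BUSY;        /* a real lock module would block here */
    if (toy_is_first_to_mount(d, node))
        return MOUNT_FIRST;       /* keep MOUNT lock until others_may_mount */
    d->mount_holder = -1;         /* non-first: release right away */
    return MOUNT_NONFIRST;
}

/* Called once the first-to-mount node has recovered all journals. */
void toy_others_may_mount(struct toy_dlm *d, int node)
{
    if (d->mount_holder == node)
        d->mount_holder = -1;
}
```

In this model, a second node calling toy_mount() gets MOUNT_BUSY until the
first node calls toy_others_may_mount(); after that it gets
MOUNT_NONFIRST.  The deadman probe alone would have let it continue
immediately, which is the gap the separate lock covers.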


There are two separate "mounts" going on, and two separate MOUNT locks,
when using OpenDLM with OpenGFS:

-- (first) for OpenDLM, which grabs lock #0, type LM_TYPE_MOUNT, when
starting setup of the deadman locks (deadman.c,
start_deadman_lock()).  This keeps multiple nodes from simultaneously
attempting the *initial deadman setup* (do we need this protection?  Or,
is this what the lock was really designed to do?  See discussion below).

-- (second) for OpenGFS filesystem, which grabs lock #0
(OGFS_MOUNT_LOCK), type LM_TYPE_NONDISK.  (super_linux.c,
ogfs_read_super(), call to ogfs_glock_num()).  This keeps multiple nodes
from simultaneously *mounting the filesystem*.

Note that these are separate and distinct locks.  And, of course, the
deadman setup must happen before the filesystem can grab any locks at
all; opendlm must be successfully set up before OGFS can use it.
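
A small sketch of why two locks that are both "#0" do not collide,
assuming a lock is identified by its (type, number) pair.  The struct,
the function, and the enum values below are hypothetical stand-ins, not
the real OpenGFS definitions:

```c
#include <stdbool.h>

/* Hypothetical values; the real LM_TYPE_* constants are defined by OpenGFS. */
enum toy_lock_type { TOY_TYPE_MOUNT = 1, TOY_TYPE_NONDISK = 2 };

/* Assumed lock identity: the type and the number together name a lock. */
struct toy_lock_name {
    enum toy_lock_type type;
    unsigned long long number;
};

/* Two names refer to the same lock only if both type and number match. */
bool toy_same_lock(struct toy_lock_name a, struct toy_lock_name b)
{
    return a.type == b.type && a.number == b.number;
}
```

Under that assumption, { TOY_TYPE_MOUNT, 0 } and { TOY_TYPE_NONDISK, 0 }
compare as different locks, so holding one says nothing about the other.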

So far, these locks could be viewed as pretty darn independent, the
OpenDLM lock protecting the setup of the deadman locks (this protection
is what I was thinking was not necessary), and the OpenGFS lock
protecting the filesystem mount.

However, *in addition*, there is a consideration about supporting the
first-to-mount filesystem node.  We need to keep other nodes from
mounting until the first-to-mount has recovered *all* journals.
Otherwise, another node might get its filesystem mounted before *all*
journals have been recovered.  The OpenGFS filesystem "MOUNT" lock is
not sufficient for this ... OGFS grabs it too late, significantly after
the deadman setup has determined whether we're first-to-mount.  This
would allow another node to:

1)  Do OpenDLM deadman setup
2)  Determine that it is not first-to-mount
3)  Recover (only) its own journal
4)  Mount the filesystem before we complete all-journal recovery.

So, I think the OpenDLM mount lock has been doing double-duty, both as
the deadman setup protection, *and* the first-to-mount protection.
Actually, based on the presence of "release_mount_lock()" under
OGFS_DLM_NONFIRST, as well as in "opendlm_others_may_mount()", I think
it has really been doing the first-to-mount protection, not the deadman
setup protection (the second pass of deadman setup happens *without* the
lock being held).

The first-to-mount functionality is a requirement.  So, I'm going to
re-instate the MOUNT lock functionality, but add some comments to make
it clear what it's really doing.

Question:  Do we need to add another MOUNT lock for protecting deadman
setup all the way through the second pass?  Or, in the case of
non-first-to-mount, simply wait to release the MOUNT lock until after
the second pass?

-- Ben --

> > But we're using deadman locks to do the same.  I don't think we need the
> > (redundant/useless?) MOUNT lock, so I commented out the
> > grab_mount_lock() and release_mount_lock() implementations.  We can
> > think about that a little more before totally removing those functions.
> > -- Ben --
