
RE: [ogfs-dev]RE: [Opendlm-devel] Making progress on mounting with ODLM lockmodule



Just adding a few more comments, see below ...

-- Ben -- 


> > 
> > > Stanley's removal of a call to release_mount_lock() yesterday got me
> > > thinking about the internal (to lock module) MOUNT lock.  I'm thinking
> > > right now that we don't need it; memexp used an internal MOUNT lock to
> > > determine whether a node was first-to-mount (see ogfs-memexp doc).
> 
> > As I understand it, the mount lock is used to make sure that only
> > one node is doing the mount work at any one time. It serializes all
> > mount requests. Please check.
> > 
> > Best Regards,
> > Stan
> 
> Yes, I've taken another look ... I think I understand things better now.
> Here is my current analysis (comments welcome).  With apologies for the
> length of the discussion:
> 
> Summary:
> 
> This topic is confusing because we need to separate the overall "mount
> work" into several different aspects/operations.
> 
> One of the *most* confusing aspects of this is that memexp's "MOUNT"
> lock does not map directly to the "MOUNT" lock in the opendlm lock
> module.  Memexp's "MOUNT" lock record was not just a simple lock; it
> also contained status about "first-to-mount" and "others-may-mount".
> "others-may-mount" status keeps non-first-to-mount nodes from mounting
> the filesystem until the first-to-mount node has recovered *all*
> journals ....
> 
>  ... The opendlm lock module uses the deadman lock mechanism as a
> replacement for determining first-to-mount ("YES", if we can grab all
> deadman locks immediately).  But the deadman mechanism does not, by
> itself, handle "others-may-mount".  This requires a separate lock.  We
> need to be told by the filesystem code (via "others_may_mount()") when
> to release that lock.  I think that this is the specific reason that we
> need the opendlm "MOUNT" lock.
> 
> Details:
> 
> There are two separate "mounts" going on, and two separate MOUNT locks,
> when using OpenDLM with OpenGFS:
> 
> -- (first) for OpenDLM, which grabs lock #0, type LM_TYPE_MOUNT, when
> starting setup of the deadman locks.  (deadman.c,
> start_deadman_lock()).  This keeps multiple nodes from simultaneously
> attempting the *initial deadman setup* (do we need this protection?

YES, we do ... If two nodes tried the initial deadman setup at exactly
the same time, they would both conclude that they are *not*
first-to-mount.  Each would be able to grab its own deadman EX, but not
the other's deadman EX, so neither would be identified as
first-to-mount.
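
A toy sketch of that race, in plain user-space C.  Nothing here is the
real OpenDLM API: the deadman[] array, try_ex(), grab_own() and
probe_others() are all made-up names, and it is only an assumption of
this model that locks taken while probing are dropped again.  It just
shows that two interleaved setups yield zero first-to-mount nodes, while
serialized setups yield exactly one:

```c
#include <assert.h>
#include <stdbool.h>

#define NODES 2
#define FREE  (-1)

/* deadman[i] is the id of the node holding node i's deadman lock in
 * EX mode, or FREE.  Hypothetical model, not OpenDLM data structures. */
static int deadman[NODES];

static void reset_locks(void)
{
	for (int i = 0; i < NODES; i++)
		deadman[i] = FREE;
}

/* Non-blocking EX request: succeeds only if the lock is free. */
static bool try_ex(int lock, int node)
{
	assert(lock >= 0 && lock < NODES);
	if (deadman[lock] != FREE)
		return false;
	deadman[lock] = node;
	return true;
}

/* Step 1: a node grabs its own deadman lock. */
static bool grab_own(int node)
{
	return try_ex(node, node);
}

/* Step 2: probe every other node's deadman lock.  The node counts as
 * first-to-mount only if all of them could be grabbed immediately.
 * Locks taken while probing are released again (model assumption). */
static bool probe_others(int node)
{
	bool first = true;

	for (int i = 0; i < NODES; i++) {
		if (i == node)
			continue;
		if (try_ex(i, node))
			deadman[i] = FREE;
		else
			first = false;
	}
	return first;
}

/* No MOUNT lock: both nodes take their own lock before either probes. */
static int count_first_interleaved(void)
{
	reset_locks();
	grab_own(0);
	grab_own(1);
	return (int)probe_others(0) + (int)probe_others(1);
}

/* MOUNT lock held around setup: node 0 finishes before node 1 starts. */
static int count_first_serialized(void)
{
	int n;

	reset_locks();
	grab_own(0);
	n = (int)probe_others(0);
	grab_own(1);
	n += (int)probe_others(1);
	return n;
}
```

count_first_interleaved() models the unprotected case and
count_first_serialized() the case where the MOUNT lock wraps the initial
setup.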

>  Or, is this what the lock was really designed to do?  See
> discussion below).
> 
> -- (second) for OpenGFS filesystem, which grabs lock #0
> (OGFS_MOUNT_LOCK), type LM_TYPE_NONDISK.  (super_linux.c,
> ogfs_read_super(), call to ogfs_glock_num()).  This keeps multiple nodes
> from simultaneously *mounting the filesystem*.
> 
> Note that these are separate and distinct locks.  And, of course, the
> deadman setup must happen before the filesystem can grab any locks at
> all; opendlm must be successfully set up before OGFS can use it.
> 
> So far, these locks could be viewed as pretty darn independent, the
> OpenDLM lock protecting the setup of the deadman locks (this protection
> is what I was thinking was not necessary), and the OpenGFS lock
> protecting the filesystem mount.
> 
> However, *in addition*, there is a consideration about supporting the
> first-to-mount filesystem node.  We need to keep other nodes from
> mounting until the first-to-mount has recovered *all* journals.
> Otherwise, another node might get its filesystem mounted before *all*
> journals have been recovered.  The OpenGFS filesystem "MOUNT" lock is
> not sufficient for this ... OGFS grabs it too late, significantly after
> the deadman setup has determined whether we're first-to-mount.  This
> would allow another node to:
> 
> 1)  Do OpenDLM deadman setup
> 2)  Determine that it is not first-to-mount
> 3)  Recover (only) its own journal
> 4)  Mount the filesystem before we complete all-journal recovery.
> 
> So, I think the OpenDLM mount lock has been doing double-duty, both as
> the deadman setup protection, *and* the first-to-mount protection.
> Actually, based on the presence of "release_mount_lock()" under
> OGFS_DLM_NONFIRST, as well as in "opendlm_others_may_mount()", I think
> it has really been doing the first-to-mount protection, not the deadman
> setup protection (the second pass of deadman setup happens *without* the
> lock being held).
> 
> The first-to-mount functionality is a requirement.  So, I'm going to
> re-instate the MOUNT lock functionality, but add some comments to make
> it clear what it's really doing.

Including the first-to-mount detection.
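
To make the "others-may-mount" gating concrete, here is a rough
user-space model.  All names (struct cluster, first_node_begin(),
sketch_others_may_mount(), and so on) are hypothetical stand-ins, not
the lock module's or OGFS's real code; the only thing taken from the
discussion is the idea that the first-to-mount node keeps the lock
module's MOUNT lock until the filesystem says all journals are
recovered:

```c
#include <stdbool.h>

/* Toy cluster state; every name here is hypothetical. */
struct cluster {
	bool mount_lock_held;    /* lock-module MOUNT lock */
	int  journals_total;
	int  journals_recovered;
};

/* First-to-mount node takes the MOUNT lock before any recovery starts. */
static void first_node_begin(struct cluster *c)
{
	c->mount_lock_held = true;
}

/* Filesystem side: replay one journal, if any remain. */
static void recover_one_journal(struct cluster *c)
{
	if (c->journals_recovered < c->journals_total)
		c->journals_recovered++;
}

/* Stand-in for the others_may_mount() callback: the filesystem calls
 * it once every journal is recovered, and the lock module drops the
 * MOUNT lock in response. */
static void sketch_others_may_mount(struct cluster *c)
{
	c->mount_lock_held = false;
}

/* Filesystem side: finish all-journal recovery, then open the gate. */
static void first_node_finish(struct cluster *c)
{
	while (c->journals_recovered < c->journals_total)
		recover_one_journal(c);
	sketch_others_may_mount(c);
}

/* A later node may continue its mount only once the lock is free. */
static bool later_node_may_mount(const struct cluster *c)
{
	return !c->mount_lock_held;
}
```

In this model a later node stays blocked through partial recovery and
can proceed only after first_node_finish() has run.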

> 
> Question:  Do we need to add another MOUNT lock for protecting deadman
> setup all the way through the second pass?  Or, in the case of
> non-first-to-mount, simply wait to release the MOUNT lock until after
> the second pass?

Still not sure if we need the mount lock all the way through the second
pass, but it would make code a little cleaner if we grabbed and released
the lock (in cases of error or *not* first-to-mount) in opendlm_mount(),
just before/after the call to start_deadman_lock().  Doing so would
protect the second pass, and put all grab/release calls in the same
dlm.c file, close to one another:

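	error = grab_mount_lock();
	if (error) { ... }

	/* deadman setup, including the second pass, runs with the lock held */
	error = start_deadman_lock(dlm, first);
	if (error < 0) {
		release_mount_lock();
		...
	}

	/* not first-to-mount: nothing more to protect, let others in */
	if (!(*first)) {
		release_mount_lock();
	}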


Any danger in wrapping the second pass with the mount lock??
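
For what it's worth, the grab/release pattern above can be exercised
with stubs to check that the lock is balanced on each path.  This is
only a sketch: the stub_* functions, sketch_mount() and run_case() are
invented for the test, the real grab_mount_lock() and
start_deadman_lock() signatures may differ, and I'm assuming the error
paths return right away:

```c
#include <stdbool.h>

static int  mount_lock_depth;    /* 1 while the MOUNT lock is held */
static int  stub_deadman_rc;     /* return code of the stubbed setup */
static bool stub_deadman_first;  /* first-to-mount answer of the stub */

static int stub_grab_mount_lock(void)
{
	mount_lock_depth++;
	return 0;
}

static void stub_release_mount_lock(void)
{
	mount_lock_depth--;
}

/* Stub for deadman setup: reports whatever the test configured. */
static int stub_start_deadman_lock(bool *first)
{
	*first = stub_deadman_first;
	return stub_deadman_rc;
}

/* Same shape as the snippet above, with early returns on error. */
static int sketch_mount(bool *first)
{
	int error = stub_grab_mount_lock();

	if (error)
		return error;

	error = stub_start_deadman_lock(first);
	if (error < 0) {
		stub_release_mount_lock();
		return error;
	}

	/* Only the first-to-mount node keeps the lock past this point. */
	if (!*first)
		stub_release_mount_lock();
	return 0;
}

/* Run one scenario and report whether the lock is still held. */
static int run_case(int rc, bool is_first)
{
	bool first = false;

	mount_lock_depth = 0;
	stub_deadman_rc = rc;
	stub_deadman_first = is_first;
	sketch_mount(&first);
	return mount_lock_depth;
}
```

run_case() returns the lock depth after the mount attempt: it should be
0 on the error and not-first paths, and 1 for the first-to-mount node,
which holds the lock until others-may-mount.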



> 
> -- Ben --
> 
> 
> 
> > 
> > >   But
> > > we're using deadman locks to do the same.  I don't think we need the
> > > (redundant/useless?) MOUNT lock, so I commented out the
> > > grab_mount_lock() and release_mount_lock() implementations.  We can
> > > think about that a little more before totally removing those functions.
> > 
> > > -- Ben --
> > > 
> > > 
> > > 
> 
> 
> -------------------------------------------------------
> SF.Net is sponsored by: Speed Start Your Linux Apps Now.
> Build and deploy apps & Web services for Linux with
> a free DVD software kit from IBM. Click Now!
> http://ads.osdn.com/?ad_id56&alloc_id438&op=ick
> _______________________________________________
> Opengfs-devel mailing list
> Opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> https://lists.sourceforge.net/lists/listinfo/opengfs-devel
> 



