RE: [ogfs-dev]Stabilizing some OpenGFS corner cases
Hi Steve,
I'd vote for going ahead and applying the patch for problem #1.
Regarding problem #2, I know that the block allocation algorithm does some
inefficient things with metadata blocks that result in the
filesystem slowly losing capacity. For example, I have a filesystem
with just enough capacity to accommodate a tarball and an untar of the
kernel tree, plus a little slop. I'll eventually run out of room if I
repeatedly:
-- copy the tarball into the fs
-- untar the Linux tree
-- rm the tarball and tree ("emptying" the filesystem)
Unfortunately, I can't remember exactly what mechanism created the
problem, but Stan added the ogfs_reclaim_one() function a while back to
reclaim metadata blocks, and also added some stuff to reclaim dentries.
These are invoked by the ogfs_tool user space utility via the following
ioctls:
OGFS_SHRINK_DENTRY
OGFS_RECLAIM_ALL
We never got around to trying to reclaim any of this capacity in
real time during normal fs operation, without the use of ogfs_tool,
but you might want to think about that. Or maybe try a smaller clump
when space gets tight?? Or your simple fix?? Or take a look at RH GFS
and see what they do (I haven't done that yet). Or ?????
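
To make the "simple fix" option concrete, here is a rough userspace
model of a reservation check that refuses to count a partial clump.
It is only a sketch: struct rgrp_model, model_rgrp_fit() and META_CLUMP
are made-up stand-ins, not the real try_rgrp_fit() or OGFS_META_CLUMP
code, and it assumes metadata can only be carved out of free data
blocks in whole clumps of 64.

```c
#include <assert.h>
#include <stddef.h>

#define META_CLUMP 64u  /* stand-in for OGFS_META_CLUMP */

/* Hypothetical, simplified view of one resource group's free counts. */
struct rgrp_model {
    unsigned free_data;  /* free data blocks */
    unsigned free_meta;  /* free metadata blocks */
};

/*
 * Return 1 if a reservation of `data` data blocks and `meta` metadata
 * blocks fits in the group, 0 otherwise.  Data blocks are reserved
 * first; a metadata shortfall may only be covered by whole clumps taken
 * from the data blocks that are left over.
 */
static int model_rgrp_fit(const struct rgrp_model *rg,
                          unsigned data, unsigned meta)
{
    unsigned spare, shortfall, clumps_needed;

    assert(rg != NULL);

    if (data > rg->free_data)
        return 0;               /* not even the data blocks fit */
    if (meta <= rg->free_meta)
        return 1;               /* no conversion needed */

    spare = rg->free_data - data;
    shortfall = meta - rg->free_meta;
    clumps_needed = (shortfall + META_CLUMP - 1) / META_CLUMP;

    /* Integer division counts whole clumps only; a partial clump is ignored. */
    return clumps_needed <= spare / META_CLUMP;
}
```

With 100 free data blocks and no free metadata blocks, a request for 50
data plus 10 metadata blocks is refused in this model, because the 50
spare blocks do not make up a whole clump. With 120 free data blocks
the same request fits. That is the trade-off Steve mentions: some
reservations fail that strictly could have succeeded.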
-- Ben --
Opinions are mine, not Intel's
> -----Original Message-----
> From: opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx
> [mailto:opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf
> Of Steve Landherr
> Sent: Tuesday, July 27, 2004 3:18 PM
> To: opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [ogfs-dev]Stabilizing some OpenGFS corner cases
>
> As I have been working with OpenGFS, I have come across
> several system
> crashes. I checked in a few of the simpler fixes this
> morning, but I
> have a couple additional fixes on which I would like feedback.
>
> 1) OGFS_ASSERT(list_empty(&sdp->sd_log_ail),); in ogfs_shutdown_log()
>
> An easy way to reproduce is to start "iozone -a" on an
> OpenGFS filesystem in
> the background. Chdir out of the OpenGFS filesystem and wait
> 10-20 seconds.
> Kill the iozone, and unmount the filesystem immediately. My
> node takes the
> assert every time.
>
> The problem is that there are dirty buffers associated with
> transactions on
> the AIL at the time ogfs_pull_tail() is called from
> ogfs_put_super(). This
> causes the transactions to remain on the AIL, and then
> ogfs_shutdown_log()
> takes the assert.
>
> My fix involves creating a new function called
> ogfs_ail_flush(), modeled
> after ogfs_trans_check_empty(), and clear_from_ail(). This
> function gets
> called in a loop along with ogfs_pull_tail() until the AIL is
> empty. Only
> then is ogfs_shutdown_log() called by ogfs_put_super().
>
> I have attached a patch that I have been using for about a
> month without
> problems.
>
> 2) OGFS_ASSERT(*block != BLKALLOC_INTERNAL_NOENT,); in ogfs_blkalloc()
>
> This assert has since been replaced with a return of -EIO,
> but the problem
> still remains.
>
> This happens when the filesystem is near capacity and a
> reservation is made
> requiring both metadata and data blocks. try_rgrp_fit()
> reserves the data
> blocks first, then the metadata blocks. If there are not enough free
> metadata blocks, it pulls blocks from the free data block
> pool in groups of
> OGFS_META_CLUMP (64) until it has taken all of the free data
> blocks. It is
> that last partial clump that causes the problem. Code often
> allocates the
> metadata blocks (via ogfs_metaalloc()) before it allocates the
> data blocks
> (via ogfs_blkalloc()). ogfs_metaalloc() will then call
> clump_alloc(), which
> will deplete the entire free data block pool, converting
> blocks that were
> intended by the reservation to be used as data blocks.
>
> A simple fix is to disallow try_rgrp_fit() from reserving a
> partial clump of
> metadata blocks (possibly causing reservations to fail when
> they strictly
> should succeed).
>
> (Credit to Shobhit Dayal for finding this problem and
> suggesting the fix.)
>
> I'd appreciate any feedback y'all can offer!
>
> -steve
> --
> Steve Landherr -- steve-sf <at> chiquapin.com
> San Francisco, California
>
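
The unmount ordering described under item 1 of the quoted message can
be modeled roughly as follows. This is a userspace sketch under stated
assumptions, not the attached patch: every name prefixed with model_ is
hypothetical, the AIL is reduced to a singly linked list, and "flushing"
is assumed to mean simply marking each transaction's buffers clean.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a transaction sitting on the AIL. */
struct tr_model {
    unsigned dirty_bufs;        /* buffers not yet written in place */
    struct tr_model *next;
};

/* Hypothetical model of the log state; `ail` stands in for sdp->sd_log_ail. */
struct log_model {
    struct tr_model *ail;
};

/* Stand-in for ogfs_ail_flush(): clean every buffer of every AIL transaction. */
static void model_ail_flush(struct log_model *log)
{
    struct tr_model *tr;

    for (tr = log->ail; tr != NULL; tr = tr->next)
        tr->dirty_bufs = 0;
}

/* Stand-in for ogfs_pull_tail(): drop leading transactions that are clean. */
static void model_pull_tail(struct log_model *log)
{
    while (log->ail != NULL && log->ail->dirty_bufs == 0)
        log->ail = log->ail->next;
}

/*
 * Stand-in for the unmount path: flush and pull the tail until the AIL
 * is empty, and only then proceed to log shutdown.  Returns the number
 * of loop passes that were needed.
 */
static unsigned model_put_super(struct log_model *log)
{
    unsigned passes = 0;

    while (log->ail != NULL) {
        model_ail_flush(log);
        model_pull_tail(log);
        passes++;
    }

    /* Models the list_empty() check that ogfs_shutdown_log() asserts on. */
    assert(log->ail == NULL);
    return passes;
}
```

In this model, calling model_pull_tail() alone leaves any transaction
with dirty buffers on the list, which is the state that trips the
assert; looping with the flush step first guarantees the list is empty
before shutdown.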