
RE: [ogfs-dev]Stabilizing some OpenGFS corner cases



Hi Steve,

I'd vote for going ahead and applying the patch for problem #1.

Regarding problem #2, I know that the block allocation algorithm handles
metadata blocks inefficiently, which results in the filesystem slowly
losing capacity.  For example, I have a filesystem with just enough
capacity to accommodate a tarball and an untar of the kernel tree, plus
a little slop.  I'll eventually run out of room if I repeatedly:

-- copy the tarball into the fs
-- untar the Linux tree
-- rm the tarball and tree ("emptying" the filesystem)

Unfortunately, I can't remember exactly what mechanism created the
problem, but Stan added the ogfs_reclaim_one() function a while back to
reclaim metadata blocks, and also added some code to reclaim dentries.
These are invoked by the ogfs_tool user space utility via the following
ioctls:

OGFS_SHRINK_DENTRY
OGFS_RECLAIM_ALL

We never got around to trying to reclaim any of this capacity in
real-time within the normal fs operation, without the use of ogfs_tool,
but you might want to think about that.  Or maybe try a smaller clump
when space gets tight??  Or your simple fix??  Or take a look at RH GFS
and see what they do (I haven't done that yet).  Or ?????

-- Ben --

Opinions are mine, not Intel's


> -----Original Message-----
> From: opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx 
> [mailto:opengfs-devel-admin@xxxxxxxxxxxxxxxxxxxxx] On Behalf 
> Of Steve Landherr
> Sent: Tuesday, July 27, 2004 3:18 PM
> To: opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
> Subject: [ogfs-dev]Stabilizing some OpenGFS corner cases
> 
> As I have been working with OpenGFS, I have come across several system
> crashes.  I checked in a few of the simpler fixes this morning, but I
> have a couple of additional fixes on which I would like feedback.
> 
> 1) OGFS_ASSERT(list_empty(&sdp->sd_log_ail),); in ogfs_shutdown_log()
> 
> An easy way to reproduce is to start "iozone -a" on an OpenGFS
> filesystem in the background.  Chdir out of the OpenGFS filesystem and
> wait 10-20 seconds.  Kill the iozone, and unmount the filesystem
> immediately.  My node takes the assert every time.
> 
> The problem is that there are dirty buffers associated with
> transactions on the AIL at the time ogfs_pull_tail() is called from
> ogfs_put_super().  This causes the transactions to remain on the AIL,
> and then ogfs_shutdown_log() takes the assert.
> 
> My fix involves creating a new function called ogfs_ail_flush(),
> modeled after ogfs_trans_check_empty() and clear_from_ail().  This
> function gets called in a loop along with ogfs_pull_tail() until the
> AIL is empty.  Only then is ogfs_shutdown_log() called by
> ogfs_put_super().
> 
> I have attached a patch that I have been using for about a month
> without problems.
> 
> 2) OGFS_ASSERT(*block != BLKALLOC_INTERNAL_NOENT,); in ogfs_blkalloc()
> 
> This assert has since been replaced with a return of -EIO, but the
> problem still remains.
> 
> This happens when the filesystem is near capacity and a reservation is
> made requiring both metadata and data blocks.  try_rgrp_fit() reserves
> the data blocks first, then the metadata blocks.  If there are not
> enough free metadata blocks, it pulls blocks from the free data block
> pool in groups of OGFS_META_CLUMP (64) until it has taken all of the
> free data blocks.  It is that last partial clump that causes the
> problem.  Code often allocates the metadata blocks (via
> ogfs_metaalloc()) before it allocates the data blocks (via
> ogfs_blkalloc()).  ogfs_metaalloc() will then call clump_alloc(), which
> will deplete the entire free data block pool, converting blocks that
> the reservation intended to be used as data blocks.
> 
> A simple fix is to disallow try_rgrp_fit() from reserving a partial
> clump of metadata blocks (possibly causing reservations to fail when
> they strictly should succeed).
> 
> (Credit to Shobhit Dayal for finding this problem and suggesting the
> fix.)
> 
> I'd appreciate any feedback y'all can offer!
> 
> -steve
> --
> Steve Landherr -- steve-sf <at> chiquapin.com
> San Francisco, California
> 


_______________________________________________
Opengfs-devel mailing list
Opengfs-devel@xxxxxxxxxxxxxxxxxxxxx
https://lists.sourceforge.net/lists/listinfo/opengfs-devel

