
RE: [ogfs-dev]The problems with resource group locking



Great info . . . will you add this to the locking doc?

Also, what exactly is a "transaction", and "completing a transaction"?

-- Ben --

Opinions are mine, not Intel's

> -----Original Message-----
> From: Dominik Vogt [mailto:opengfs-devel@lists.sourceforge.net]
> Sent: Thursday, April 17, 2003 9:27 AM
> To: opengfs-devel@lists.sourceforge.net
> Subject: [ogfs-dev]The problems with resource group locking
> 
> 
> I have done more investigations on the resource group locking and
> deadlock issues.
> 
> Fact 1:
>   Resource groups need to be locked only to allocate or deallocate
>   blocks.  It is not necessary to lock the rg just to modify an
>   inode or data block.
> 
> Fact 2:
>   There potentially are deadlocks if two or more resource groups
>   are locked in random order.
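Fact 2 is the usual lock-ordering problem: if every node acquires rg locks in one global order, no cycle of waiters can form. Below is a minimal sketch in C. It assumes a hypothetical lock_rg() that only records the order of acquisition. That function is not an OpenGFS function and takes no real lock. The sketch sorts the requested rg indices, drops duplicates, and then locks them in ascending order, which is what (e) and (f) are described as doing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for an inter-node rg lock: it only records the
 * order in which rgs were locked, so the ordering rule is observable
 * without a real lock manager. */
#define MAX_LOCKS 64
static unsigned lock_log[MAX_LOCKS];
static size_t lock_count;

static void lock_rg(unsigned rg)
{
    assert(lock_count < MAX_LOCKS);
    lock_log[lock_count++] = rg;
}

static int cmp_rg(const void *a, const void *b)
{
    unsigned x = *(const unsigned *)a;
    unsigned y = *(const unsigned *)b;
    return (x > y) - (x < y);
}

/* Lock every rg in 'rgs' in ascending index order, skipping duplicates.
 * If all nodes follow this same global order, two nodes can never each
 * hold an rg the other is waiting for. */
static void lock_rgs_ordered(unsigned *rgs, size_t n)
{
    qsort(rgs, n, sizeof(rgs[0]), cmp_rg);
    for (size_t i = 0; i < n; i++) {
        if (i > 0 && rgs[i] == rgs[i - 1])
            continue; /* already held */
        lock_rg(rgs[i]);
    }
}
```

Called with {7, 2, 9, 2}, it locks rgs 2, 7 and 9, in that order.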
> 
> Fact 3:
>   When a new directory entry is created, the hash table of the
>   directory might grow and additional blocks have to be allocated.
> 
> Fact 4:
>   When any data or meta data is allocated, the resource groups are
>   locked one by one until one with enough space is found.  This
>   can cause *lots* of inter node locks when the file system
>   becomes full.
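The following sketch of a first-fit scan over the rgs makes the cost in Fact 4 concrete. The struct rg type and the lock counter are hypothetical. Each loop iteration stands for one inter-node lock acquisition. On a nearly full file system, the scan can therefore touch almost every rg before it succeeds or fails.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rg state; only the free block count matters here. */
struct rg {
    unsigned long free_blocks;
};

/* Walk the rgs one by one, "locking" each (counted in *locks_taken),
 * until one has at least 'needed' free blocks.  Returns that rg's index,
 * or -1 if no rg has enough space. */
static long find_rg_linear(const struct rg *rgs, size_t n,
                           unsigned long needed, size_t *locks_taken)
{
    assert(locks_taken != NULL);
    *locks_taken = 0;
    for (size_t i = 0; i < n; i++) {
        ++*locks_taken;               /* inter-node lock acquired */
        if (rgs[i].free_blocks >= needed)
            return (long)i;           /* keep this lock, use this rg */
        /* not enough space: release the lock and try the next rg */
    }
    return -1;
}
```

With free counts {3, 0, 5, 100} and a request for 50 blocks, all four rgs are locked before rg 3 is chosen.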
> 
> Now let us see how many resource groups are locked by the various
> operations.
> 
> a) Modifying data blocks or inodes
>    No rg locks required.
>    
> b) Allocating an inode (mkdir(), create(), link(), symlink())
>    Creates a new directory entry in the parent directory.  The code
>    currently requires that if the directory grows (and thus needs
>    new meta data blocks), the whole hash table plus any other new
>    blocks are moved to/allocated in the *same* resource group as
>    the new inode.  This avoids having to lock multiple rgs.  It
>    may not seem like a big limitation, but the current code tries
>    to reserve enough space in that rg for the worst
>    case of directory growth (hash table is created and immediately
>    explodes to maximum size).  In other words:  in order to create
>    a new inode, the target resource group must have about 1 MB of
>    free data plus meta data blocks.
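The admission rule in (b) can be sketched as a simple check. All of its assumptions are hypothetical. The worst-case directory growth is modelled as a flat 1 MiB, taken from the "about 1 MB" above, and the new inode is assumed to need one extra block. The real code sizes its reservation from the worst-case hash table growth, so the constant here is illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: worst-case directory growth modelled as a flat 1 MiB. */
#define WORST_CASE_DIR_GROWTH_BYTES (1UL << 20)

/* Hypothetical check: may a new inode go into an rg with 'free_blocks'
 * free blocks, given that the same rg must also absorb the worst-case
 * growth of the parent directory? */
static bool rg_can_host_new_inode(unsigned long free_blocks,
                                  unsigned long block_size)
{
    assert(block_size > 0);
    /* round the byte reservation up to whole blocks */
    unsigned long reserve_blocks =
        (WORST_CASE_DIR_GROWTH_BYTES + block_size - 1) / block_size;
    return free_blocks >= reserve_blocks + 1; /* +1: the inode itself (assumed) */
}
```

With 4096-byte blocks, the reserve is 256 blocks. An rg with 256 free blocks is therefore rejected, while one with 257 free blocks is accepted.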
> 
> c) Deallocating an inode
>    Locks the inode's rg to update the block bitmap.  Since ogfs
>    never frees the space that is now unused in directories, the
>    dir's rg is *not* locked.
> 
> d) Allocating file data / write()
>    Only one rg is locked.  A single ogfs_write() call never writes
>    to more than a single resource group.  This is an unacceptable
>    limitation of the write() system call.
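Under the restriction in (d), the amount one write() call can allocate is capped by the single rg with the most free space, not by the free space of the whole file system. The following hypothetical sketch contrasts the two limits. The function names are illustrative and are not OpenGFS code.

```c
#include <assert.h>
#include <stddef.h>

/* Largest free block count of any single rg. */
static unsigned long max_rg_free(const unsigned long *free_blocks, size_t n)
{
    unsigned long best = 0;
    for (size_t i = 0; i < n; i++)
        if (free_blocks[i] > best)
            best = free_blocks[i];
    return best;
}

/* Blocks one write() can allocate when confined to a single rg. */
static unsigned long write_limit_one_rg(const unsigned long *free_blocks,
                                        size_t n, unsigned long requested)
{
    assert(free_blocks != NULL || n == 0);
    unsigned long cap = max_rg_free(free_blocks, n);
    return requested < cap ? requested : cap;
}

/* For comparison (hypothetical): the limit if one write() could allocate
 * from several rgs, locked in ascending order. */
static unsigned long write_limit_multi_rg(const unsigned long *free_blocks,
                                          size_t n, unsigned long requested)
{
    assert(free_blocks != NULL || n == 0);
    unsigned long total = 0;
    for (size_t i = 0; i < n; i++)
        total += free_blocks[i];
    return requested < total ? requested : total;
}
```

Consider rgs holding 10, 40 and 25 free blocks. The single-rg rule caps a 60-block request at 40 blocks. If the write could span rgs, the whole request could be satisfied.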
> 
> e) Truncating a file
>    Needs many rg locks but sorts them.
> 
> f) Removing a file or directory / unlink()
>    Is done in two steps.  In the first step, the directory entry
>    is removed (no rg locks required).  Inodes scheduled for
>    removal are listed in the log, and their blocks are freed only
>    after the transaction has been completed.  This second stage
>    needs to truncate the file and remove the inode, sorting the
>    corresponding rgs before locking them.  (This description may
>    be a bit inaccurate.)
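The two-stage removal in (f) can be modelled as a deferred-free queue. Given the caveat in (f), this sketch rests on explicit, hypothetical assumptions. The pending array stands in for the log entries. Each inode's blocks are assumed to live in a single rg. The function names are illustrative. Stage 2 runs only after the transaction has been committed. It sorts the queue by rg so that locks are taken in ascending order.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_PENDING 32

/* Hypothetical stand-in for a log entry "free this inode after commit". */
struct pending_free {
    unsigned rg;        /* rg holding the inode's blocks (simplified: one rg) */
    unsigned long ino;
};

static struct pending_free pending[MAX_PENDING];
static size_t npending;

/* Stage 1: the directory entry is removed (no rg lock); queue the inode. */
static void unlink_stage1(unsigned long ino, unsigned rg)
{
    assert(npending < MAX_PENDING);
    pending[npending].ino = ino;
    pending[npending].rg = rg;
    npending++;
}

static int cmp_pending(const void *a, const void *b)
{
    unsigned x = ((const struct pending_free *)a)->rg;
    unsigned y = ((const struct pending_free *)b)->rg;
    return (x > y) - (x < y);
}

/* Stage 2, after commit: sort by rg, lock each distinct rg once in
 * ascending order and free the queued inodes.  Returns the number of
 * distinct rg locks taken. */
static size_t unlink_stage2(void)
{
    size_t locks = 0;
    qsort(pending, npending, sizeof(pending[0]), cmp_pending);
    for (size_t i = 0; i < npending; i++) {
        if (i == 0 || pending[i].rg != pending[i - 1].rg)
            locks++;    /* lock pending[i].rg, the next rg in order */
        /* free inode pending[i].ino and its blocks in that rg */
    }
    npending = 0;
    return locks;
}
```

If inodes in rgs 5, 2 and 5 are queued, stage 2 takes two rg locks: first rg 2, then rg 5.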
> 
> g) Renaming plain files / rename()
>    Needs one rg lock (see (f)).
> 
> h) Renaming directories / rename()
>    Needs one rg lock (see (f)). In addition, another lock
>    serializes directory renaming operations.
> 
> i) statfs()
>    Locks all resource groups in order.
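The statfs() call in (i) needs a consistent sum over every rg, which is why it locks them all. The following sketch of the aggregation uses hypothetical types. The lock calls appear only as comments. They mark where the locks would be taken and released, in ascending rg order, which is the order Fact 2 requires of any multi-rg operation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-rg counters. */
struct rg_stat {
    unsigned long total_blocks;
    unsigned long free_blocks;
};

/* statfs()-style aggregation over all rgs, visited in ascending order. */
static void statfs_sum(const struct rg_stat *rgs, size_t n,
                       unsigned long *total_out, unsigned long *free_out)
{
    assert(total_out != NULL && free_out != NULL);
    *total_out = 0;
    *free_out = 0;
    for (size_t i = 0; i < n; i++) {
        /* lock rg i here (ascending order) */
        *total_out += rgs[i].total_blocks;
        *free_out += rgs[i].free_blocks;
    }
    /* release all rg locks here */
}
```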
> 
> j) mmap() shared writable
>    Would need many rg locks which could be ordered.  Not
>    implemented since it would lock large parts of the file system
>    for possibly long times.
> 
> k) flock()
>    Does not need any rg locks.  By the way, it prevents non-locking
>    file access by other processes.  Is that allowed by the
>    specs?
> 
> (did I forget anything?)
> 
> Summary
> 
> In the current code, rg deadlocks are not possible, at least not
> with above operations.  But the price one pays is high:
> 
>  - write() never writes more data than fits into the rg with
>    the most free space.
>  - Inodes can be created only in resource groups that have about
>    1 MB of free space.
>  - Once allocated, empty directory blocks are never freed.
>  - A directory hash table is never shrunk.
>  - Meta data blocks are never converted back to data blocks.
>  - When a directory hash table grows it is copied to the same rg
>    as the new inode en bloc.
>  - When a new directory leaf is allocated, it is created in the
>    same rg as the new inode.  This has the potential to scatter
>    the directory leaves all over the file system.
> 
> Bye
> 
> Dominik ^_^  ^_^
> 
> 
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Opengfs-devel mailing list
> Opengfs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opengfs-devel
> 


