RE: [ogfs-dev]The problems with resource group locking
Great info . . . will you add this to the locking doc?
Also, what exactly is a "transaction", and "completing a transaction"?
-- Ben --
Opinions are mine, not Intel's
> -----Original Message-----
> From: Dominik Vogt [mailto:opengfs-devel@lists.sourceforge.net]
> Sent: Thursday, April 17, 2003 9:27 AM
> To: opengfs-devel@lists.sourceforge.net
> Subject: [ogfs-dev]The problems with resource group locking
>
>
> I have done more investigations on the resource group locking and
> deadlock issues.
>
> Fact 1:
> Resource groups need to be locked only to allocate or deallocate
> blocks. It is not necessary to lock the rg just to modify an
> inode or data block.
>
> Fact 2:
> There potentially are deadlocks if two or more resource groups
> are locked in random order.
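To illustrate Fact 2, here is a toy model of the usual remedy, which is to always take the locks in one global order. This is only a sketch in Python, not OpenGFS code: `rg_locks`, `lock_rgs_in_order` and `worker` are hypothetical names, and plain in-process locks stand in for the inter-node rg locks.

```python
import threading
from contextlib import ExitStack

# Hypothetical stand-ins for per-resource-group locks (assumption: 8 rgs).
rg_locks = [threading.Lock() for _ in range(8)]

def lock_rgs_in_order(rg_indices):
    """Acquire the locks for the given rg indices in ascending index order.

    Returns an ExitStack; leaving its `with` block releases the locks
    in reverse order of acquisition.
    """
    stack = ExitStack()
    try:
        for i in sorted(set(rg_indices)):
            stack.enter_context(rg_locks[i])
    except BaseException:
        stack.close()
        raise
    return stack

def worker(rg_indices, done):
    # Both workers end up locking rg 2 before rg 5, whatever order they ask for.
    with lock_rgs_in_order(rg_indices):
        done.append(tuple(rg_indices))

done = []
threads = [
    threading.Thread(target=worker, args=([5, 2], done)),
    threading.Thread(target=worker, args=([2, 5], done)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

If each worker locked in the order it was given, one could hold rg 5 while waiting for rg 2 and the other the reverse. Sorting first removes that cycle.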
>
> Fact 3:
> When a new directory entry is created, the hash table of the
> directory might grow and additional blocks have to be allocated.
>
> Fact 4:
> When any data or meta data is allocated, the resource groups are
> locked one by one until one with enough space is found. This
> can cause *lots* of inter node locks when the file system
> becomes full.
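A rough way to picture the cost described in Fact 4: the sketch below counts how many rg locks a one-by-one scan takes before it finds enough free space. It is a toy model, not the OpenGFS allocator. The function name, the scan starting at rg 0, and the free-block numbers are all assumptions made for illustration.

```python
def find_rg_with_space(free_blocks, needed):
    """Scan rgs in index order until one has at least `needed` free blocks.

    Returns (rg_index, locks_taken), or (None, locks_taken) if no rg fits.
    """
    locks_taken = 0
    for rg, free in enumerate(free_blocks):
        locks_taken += 1  # each probe costs one (possibly inter-node) rg lock
        if free >= needed:
            return rg, locks_taken
    return None, locks_taken

# Mostly empty file system: the first rg probed already fits.
mostly_empty = find_rg_with_space([900, 800, 700], needed=100)

# Nearly full file system: every rg is probed before the last one fits.
nearly_full = find_rg_with_space([3, 0, 5, 1, 2, 0, 7, 120], needed=100)
```

In this model the number of locks per allocation grows toward the total number of rgs as free space runs out.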
>
> Now let us see how many resource groups are locked by the various
> operations.
>
> a) Modifying data blocks or inodes
> No rg locks required.
>
> b) Allocating an inode (mkdir(), create(), link(), symlink())
> Creates a new directory entry in the parent directory. The code
> currently requires that if the directory grows (and thus needs
> new meta data blocks), the whole hash table plus any other new
> blocks are moved to/allocated in the *same* resource group as
> the new inode. This avoids having to lock multiple rgs. It may
> not seem such a big limitation, but the current code tries to
> reserve enough space in that rg for the worst
> case of directory growth (hash table is created and immediately
> explodes to maximum size). In other words: in order to create
> a new inode, the target resource group must have about 1 MB of
> free data plus meta data blocks.
>
> c) Deallocating an inode
> Locks the inode's rg to update the block bitmap. Since ogfs
> never frees the space that is now unused in directories, the
> dir's rg is *not* locked.
>
> d) Allocating file data / write()
> Only one rg is locked. A single ogfs_write() call never writes
> to more than a single resource group. This is an unacceptable
> limitation of the write() system call.
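For what it is worth, if the limitation in (d) shows up to user space as a short write (an assumption, the message does not say how the call returns), a caller can work around it with the standard retry loop. The sketch below uses Python's `os.write`. `write_all` is a hypothetical helper, and a pipe stands in for a file on the file system.

```python
import os

def write_all(fd, data):
    """Keep calling os.write() until every byte of `data` has been written."""
    view = memoryview(data)
    written = 0
    while written < len(view):
        n = os.write(fd, view[written:])  # may write fewer bytes than requested
        if n == 0:
            raise OSError("write returned 0 bytes")
        written += n
    return written

# Usage: a pipe is used here only so the example is self-contained.
r, w = os.pipe()
total = write_all(w, b"x" * 1000)
os.close(w)
data = os.read(r, 4096)
os.close(r)
```

Each retry would be a separate call into the file system and could therefore lock a different resource group.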
>
> e) Truncating a file
> Needs many rg locks but sorts them.
>
> f) Removing a file or directory / unlink()
> Is done in two steps. In the first step, the directory entry
> is removed (no rg locks required). Inodes scheduled for
> removal are listed in the log, and their blocks are freed only
> after the transaction has been completed. This second step
> needs to truncate the file and remove the inode, sorting the
> corresponding rgs before locking them. (This description may
> be a bit inaccurate.)
>
> g) Renaming plain files / rename()
> Needs one rg lock (see (f)).
>
> h) Renaming directories / rename()
> Needs one rg lock (see (f)). In addition, another lock
> serializes directory renaming operations.
>
> i) statfs()
> Locks all resource groups in order.
>
> j) mmap() shared writable
> Would need many rg locks which could be ordered. Not
> implemented since it would lock large parts of the file system
> for possibly long times.
>
> k) flock()
> Does not need any rg locks. By the way, it prevents non-
> locking file access by other processes. Is that allowed by the
> specs?
>
> (did I forget anything?)
>
> Summary
>
> In the current code, rg deadlocks are not possible, at least not
> with the above operations. But the price one pays is high:
>
> - write() never writes more data than fits into the rg with
> the most free space.
> - Inodes can be created only in resource groups that have about
> 1 MB of free space.
> - Once allocated, empty directory blocks are never freed.
> - A directory hash table is never shrunk.
> - Meta data blocks are never converted back to data blocks.
> - When a directory hash table grows it is copied to the same rg
> as the new inode en bloc.
> - When a new directory leaf is allocated, it is created in the
> same rg as the new inode. This has the potential to scatter
> the directory leaves all over the file system.
>
> Bye
>
> Dominik ^_^ ^_^
>
>
> -------------------------------------------------------
> This sf.net email is sponsored by:ThinkGeek
> Welcome to geek heaven.
> http://thinkgeek.com/sf
> _______________________________________________
> Opengfs-devel mailing list
> Opengfs-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/opengfs-devel
>