Re: Atomicity or the ext4 open-write-close-rename debacle

On Wed, 2009-04-08 at 10:21 -0500, Patrick Goetz wrote:
> Hi -
> 
> I've been trying to get up to speed on new linux filesystem efforts and 
> stumbled upon the following post from a btrfs developer to lwn.net:
> 
> -----------------------------------------------------------
> Posted Mar 16, 2009 16:50 UTC (Mon) by masoncl (subscriber, #47138)
> The btrfs data=ordered implementation is different from ext3/4 and 
> reiserfs. It decouples data writes from the metadata transaction, and 
> simply updates the metadata for file extents after the data blocks are 
> on disk.
> 
> This means the transaction commit doesn't have to wait for the data 
> blocks because the metadata for the file extents always reflects extents 
> that are actually on disk.
> 
> When you rename one file over another, the destination file is 
> atomically replaced with the new file. The new file is fully consistent 
> with the data that has already been written, which in the worst case 
> means it has a size of zero after a crash.
> ...
> -----------------------------------------------------------
> 
> Frankly this comment doesn't make any sense to me at all.  First of all, 
> "this means the transaction commit doesn't have to wait for the data 
> blocks...".  Is the data ordered or not? If you commit the transaction 
> -- i.e. update the metadata before the data blocks are committed -- then 
> the operations are occurring out of order and ext4 
> open-write-close-rename mayhem ensues.
> 
> Second, atomicity in this context means that when executing a rename, 
> you always get either the old data (exactly) or the new data (exactly) 
> even after a crash. The "worst case scenario" described above -- a size 
> of zero after crash -- precisely violates atomicity.
> 
> Any comments on this?

There isn't a short answer to this.  Before 2.6.30, btrfs would allow a
rename to leave a zero-length file after a crash.  Filesystem developers
have always taken the rename-is-atomic requirement to cover only the
directory entries themselves, not the file data they point to.

With 2.6.30, extra ordering is added to btrfs, making sure that metadata
and data are both atomically replaced during a rename.  In other words,
for renames it will work like ext3 data=ordered mode.

-chris
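
For applications that cannot rely on the filesystem to order data ahead
of a rename, the usual defensive pattern is to fsync the new file before
renaming it over the old one.  What follows is a minimal Python sketch of
that pattern, assuming a POSIX system.  The helper name atomic_replace
and the ".tmp-" prefix are illustrative and do not come from this thread.

```python
import os
import tempfile


def atomic_replace(path, data):
    """Replace the file at `path` with `data` (bytes) via write-fsync-rename."""
    directory = os.path.dirname(os.path.abspath(path))

    # The temporary file lives in the same directory so that the rename
    # never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            # Push the data blocks to disk before the directory entry
            # is switched, so the new name never points at an empty file.
            os.fsync(f.fileno())
        # rename(2) semantics: the directory entry is swapped atomically.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, leave the old file in place and drop the temp file.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise

    # Sync the directory so the rename itself survives a crash.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Calling atomic_replace("settings.conf", b"key=value\n") leaves either the
previous contents or the new contents under that name.  The explicit
fsync is what removes the dependence on filesystem-specific ordering; it
costs a synchronous write on every update.  Note that mkstemp creates the
file with mode 0600, so the original file's permissions are not carried
over in this sketch.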

