Re: user transactions and ENOSPC...

On Fri, Sep 25, 2009 at 02:10:14PM -0700, Sage Weil wrote:
> Hi everyone,
> 
> So, the btrfs user transaction ioctls work like so:
> 
>  ioctl(fd, BTRFS_IOC_TRANS_START);
>  /* do many operations: write(), setxattr(), rmdir(), whatever. */
>  ioctl(fd, BTRFS_IOC_TRANS_END);    /* or close(fd); */
> 
> and allow an application to ensure some number of operations commit to 
> disk together.  Ceph's storage daemon uses this to avoid the overhead of 
> maintaining a write-ahead journal for complex updates.  I can see this 
> being useful for lots of other services too, since it can avoid all kinds 
> of (often slow) atomicity games.
> 
> But there are two problems with the user transaction ioctls as
> currently implemented.
> 
> The first is that we may get ENOSPC somewhere between START and END
> without any prior warning.  The patch below is intended to fix that by
> adding a new reservation category used only by a new TRANS_RESV_START
> ioctl.  It'll allow an application to specify the total amount of data
> it wants to write when the transaction starts, and get ENOSPC right
> away before it starts making changes.
> 
> This isn't a perfect solution: a mix of a transaction workload and a regular
> workload will violate the reservations, and we can't really fix that
> without knowing whether any given write() or whatever belongs to a user
> transaction or not.
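The up-front reservation check, and the way an unrelated workload can defeat it, can be illustrated with a toy space-accounting model. This is not btrfs code: `struct space_pool`, `resv_start()`, `txn_write()` and `plain_write()` are hypothetical, and the model ignores metadata and copy-on-write overhead entirely.

```c
#include <assert.h>
#include <errno.h>

/* Toy model only; all names are hypothetical. */
struct space_pool {
    long long free_bytes;     /* space not yet written */
    long long reserved_bytes; /* space promised to open user transactions */
};

/* Model of a reserving START: refuse up front (ENOSPC) unless the whole
 * declared amount fits in space no other transaction has reserved. */
static int resv_start(struct space_pool *p, long long want)
{
    assert(want >= 0);
    if (want > p->free_bytes - p->reserved_bytes)
        return -ENOSPC;
    p->reserved_bytes += want;
    return 0;
}

/* A write issued inside the transaction: draws down its reservation. */
static int txn_write(struct space_pool *p, long long len)
{
    assert(len >= 0);
    if (len > p->free_bytes)
        return -ENOSPC;
    p->free_bytes -= len;
    p->reserved_bytes -= (len < p->reserved_bytes) ? len : p->reserved_bytes;
    return 0;
}

/* A write from an unrelated process. The allocator cannot tell it apart
 * from a transactional write, so it is checked against free space only
 * and may consume space that was reserved for the transaction. */
static int plain_write(struct space_pool *p, long long len)
{
    assert(len >= 0);
    if (len > p->free_bytes)
        return -ENOSPC;
    p->free_bytes -= len;
    return 0;
}
```

With 100 bytes free, a 120-byte reservation fails immediately and an 80-byte one succeeds. An unrelated 50-byte write is then accepted anyway, and the transaction's own 80-byte write fails with ENOSPC despite its reservation.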
> 
> The second problem is that the application may die between START and 
> END. The current ioctls are "safe" in that the transaction handle is 
> closed when the struct file is released, so the fs won't get wedged if 
> you, say, segfault.  On the other hand, they're "unsafe" in that a process 
> that is killed or segfaults will result in an incomplete transaction 
> making it to disk, which leaves the file system in an inconsistent state 
> (from the point of view of the application).

This is a pet peeve of mine - exporting file system transactions to
user space usually has these problems.

I would be quite interested in seeing the Featherstitch-style
patchgroups implemented on btrfs.  Do you think the ordering
guarantees they give would work for Ceph's storage daemon?

http://featherstitch.cs.ucla.edu/
http://lwn.net/Articles/354861/
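To make the ordering question concrete, here is a toy model of what a dependency between two groups of writes promises: a group's writes may reach disk only after the writes of every group it depends on. The names (`pg_new()`, `pg_after()`, `pg_writeback()`) are made up for this sketch and do not reflect Featherstitch's actual interface. The model captures ordering only, not the all-or-nothing behaviour the user transaction ioctls aim for.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_GROUPS 8

/* Toy model of patchgroup-style ordering; all names are hypothetical. */
struct pgroup {
    int deps[MAX_GROUPS]; /* groups that must be durable before this one */
    int ndeps;
    bool durable;         /* this group's writes have reached "disk" */
};

struct pg_table {
    struct pgroup g[MAX_GROUPS];
    int n;
};

/* Create an empty group and return its index. */
static int pg_new(struct pg_table *t)
{
    assert(t->n < MAX_GROUPS);
    t->g[t->n] = (struct pgroup){ .ndeps = 0, .durable = false };
    return t->n++;
}

/* Record that group `after` must not reach disk before group `before`. */
static void pg_after(struct pg_table *t, int after, int before)
{
    assert(after >= 0 && after < t->n && before >= 0 && before < t->n);
    struct pgroup *g = &t->g[after];
    assert(g->ndeps < MAX_GROUPS);
    g->deps[g->ndeps++] = before;
}

/* Writeback: repeatedly flush any group whose dependencies are all
 * durable, appending flushed indices to order[]. Returns the count.
 * Groups stuck in a dependency cycle are never flushed. */
static int pg_writeback(struct pg_table *t, int order[MAX_GROUPS])
{
    int flushed = 0;
    bool progress = true;

    while (progress) {
        progress = false;
        for (int i = 0; i < t->n; i++) {
            struct pgroup *g = &t->g[i];
            if (g->durable)
                continue;
            bool ready = true;
            for (int d = 0; d < g->ndeps; d++)
                if (!t->g[g->deps[d]].durable)
                    ready = false;
            if (ready) {
                g->durable = true; /* pretend the blocks were written */
                order[flushed++] = i;
                progress = true;
            }
        }
    }
    return flushed;
}
```

If group 0 is created first but made to depend on group 1, the writeback pass flushes group 1 before group 0: the dependency, not creation order, decides what reaches disk first.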

-VAL