On Wed, Nov 11, 2009 at 6:03 PM, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> On Tue, Nov 10, 2009 at 02:13:10PM -0800, Sage Weil wrote:
>> On Tue, 10 Nov 2009, Andrey Kuzmin wrote:
>>
>> > On Tue, Nov 10, 2009 at 11:12 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > > Hi all,
>> > >
>> > > This is an alternative approach to atomic user transactions for btrfs.
>> > > The old start/end ioctls suffer from some basic limitations, namely
>> > >
>> > > - We can't properly reserve space ahead of time to avoid ENOSPC part
>> > >   way through the transaction, and
>> > > - The process may die (seg fault, SIGKILL) part way through the
>> > >   transaction. Currently when that happens the partial transaction will
>> > >   commit.
>> > >
>> > > This patch implements an ioctl that lets the application completely
>> > > specify the entire transaction in a single syscall. If the process gets
>> > > killed or seg faults part way through, the entire transaction will still
>> > > complete.
>> > >
>> > > The goal is to atomically commit updates to multiple files, xattrs,
>> > > directories. But this is still a file system: we don't get rollback if
>> > > things go wrong. Instead, do what we can up front to make sure things
>> > > will work out. And if things do go wrong, optionally prevent a partial
>> > > result from reaching the disk.
>> >
>> > Why not snapshot respective root (doesn't work if transaction spans
>> > multiple file-systems, but this doesn't look like a real-world
>> > limitation), run txn against that snapshot and rollback on failure
>> > instead? Snapshots are writable, cheap, and this looks like a real
>> > transaction abort mechanism.
>>
>> Good question. :)
>>
>> I hadn't looked into this before, but I think the snapshots could be used
>> to achieve both atomicity and rollback. If userspace uses an rw mutex to
>> quiesce writes, it can make sure all transactions complete before creating
>> a snapshot (commit). The problem with this currently is the create
>> snapshot ioctl is relatively slow... it calls commit_transaction, which
>> blocks until everything reaches disk. I think to perform well this
>> approach would need a hook to start a commit and then return as soon as it
>> can guarantee that any subsequent operation's start_transaction can't join
>> in that commit.
>>
>> This may be a better way to go about this, though. Does that sound
>> reasonable, Chris?
>
> Yes, we could do this, but I don't think it will perform very well
> compared to your multi-operation ioctl. It really does depend on how
> often you need to do atomic ops (my guess is very).
>
> Honestly you'll get better performance with a simple write-ahead log
> from userland:

Write-ahead logging is necessary anyway if the aim is to provide
transactional semantics to an application. But, at the same time, w/o
snapshot there is no synchronization between the log and file-system
state.

Regards,
Andrey

>
> step1: write redo log somewhere in the FS, with enough information to
> bring all the objects you're about to touch to a consistent state.
> step2: fsync the log
> step3: do your operations
> step4: append a record to the undo log that invalidates the last log
> op, or just truncate it to zero.
> step5: fsync the log.
>
> The big advantage of the log is that you won't be tied to btrfs, but
> it's two fsyncs where the big transaction framework does none. This
> should allow you to turn on the fast fsync log again, but I think the
> multi-operation ioctl would do that as well.
>
> -chris
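
For illustration, the five steps Chris lists could be sketched roughly as
below. This is only a sketch under assumptions that are not in the thread:
the JSON record format, the helper names (write_and_fsync, apply_updates,
transact, recover), the crash_after_log test hook, and the idea that "your
operations" are whole-file rewrites are all hypothetical. It takes the
"truncate it to zero" variant of step4.

```python
import json
import os


def write_and_fsync(path, data):
    """Replace the contents of `path` with `data` and fsync it."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def apply_updates(updates):
    """'Your operations': here, whole-file rewrites (an assumption)."""
    for path, text in updates.items():
        with open(path, "w") as f:
            f.write(text)


def transact(log_path, updates, crash_after_log=False):
    """Apply {path: new_contents} following step1..step5 from the thread."""
    # step1 + step2: write the redo record, then fsync the log
    record = json.dumps(updates).encode() + b"\n"
    write_and_fsync(log_path, record)
    if crash_after_log:
        # hypothetical test hook: simulate the process dying before step3
        return
    # step3: do the operations
    apply_updates(updates)
    # step4 + step5: truncate the log to zero, then fsync it
    write_and_fsync(log_path, b"")


def recover(log_path):
    """Replay a record left behind by a crash. Returns True if replayed."""
    if not os.path.exists(log_path):
        return False
    with open(log_path, "rb") as f:
        data = f.read()
    if data.endswith(b"\n"):
        # complete record: redo every update (rewrites are idempotent)
        apply_updates(json.loads(data))
        replayed = True
    else:
        # empty or torn record: step2 never finished, so step3 never started
        replayed = False
    write_and_fsync(log_path, b"")
    return replayed
```

An application would call recover() once at startup, before any new
transact(). Like the recipe in the thread, the sketch issues only two
fsyncs and does not fsync the files written in step3, so it assumes the
filesystem orders those writes ahead of the log truncation.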
