Re: [RFC] big fat transaction ioctl

On Wed, Nov 11, 2009 at 6:03 PM, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> On Tue, Nov 10, 2009 at 02:13:10PM -0800, Sage Weil wrote:
>> On Tue, 10 Nov 2009, Andrey Kuzmin wrote:
>>
>> > On Tue, Nov 10, 2009 at 11:12 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
>> > > Hi all,
>> > >
>> > > This is an alternative approach to atomic user transactions for btrfs.
>> > > The old start/end ioctls suffer from some basic limitations, namely
>> > >
>> > >  - We can't properly reserve space ahead of time to avoid ENOSPC part
>> > > way through the transaction, and
>> > >  - The process may die (seg fault, SIGKILL) part way through the
>> > > transaction.  Currently when that happens the partial transaction will
>> > > commit.
>> > >
>> > > This patch implements an ioctl that lets the application completely
>> > > specify the entire transaction in a single syscall.  If the process gets
>> > > killed or seg faults part way through, the entire transaction will still
>> > > complete.
>> > >
>> > > The goal is to atomically commit updates to multiple files, xattrs,
>> > > directories.  But this is still a file system: we don't get rollback if
>> > > things go wrong.  Instead, do what we can up front to make sure things
>> > > will work out.  And if things do go wrong, optionally prevent a partial
>> > > result from reaching the disk.
>> >
>> > Why not snapshot respective root (doesn't work if transaction spans
>> > multiple file-systems, but this doesn't look like a real-world
>> > limitation), run txn against that snapshot and rollback on failure
>> > instead? Snapshots are writable, cheap, and this looks like a real
>> > transaction abort mechanism.
>>
>> Good question.  :)
>>
>> I hadn't looked into this before, but I think the snapshots could be used
>> to achieve both atomicity and rollback.  If userspace uses an rw mutex to
>> quiesce writes, it can make sure all transactions complete before creating
>> a snapshot (commit).  The problem with this currently is the create
>> snapshot ioctl is relatively slow... it calls commit_transaction, which
>> blocks until everything reaches disk.  I think to perform well this
>> approach would need a hook to start a commit and then return as soon as it
>> can guarantee that any subsequent operation's start_transaction can't join
>> in that commit.
>>
>> This may be a better way to go about this, though.  Does that sound
>> reasonable, Chris?
>
> Yes, we could do this, but I don't think it will perform very well
> compared to your multi-operation ioctl.  It really does depend on how
> often you need to do atomic ops (my guess is very).
>
> Honestly you'll get better performance with a simple write-ahead log
> from userland:

Write-ahead logging is necessary anyway if the aim is to provide
transactional semantics to an application. At the same time, without a
snapshot there is no synchronization between the log and the
file-system state.

Regards,
Andrey

>
> step1: write redo log somewhere in the FS, with enough information to
> bring all the objects you're about to touch to a consistent state.
> step2: fsync the log
> step3: do your operations
> step4: append a record to the undo log that invalidates the last log
> op, or just truncate it to zero.
> step5: fsync the log.
>
> The big advantage of the log is that you won't be tied to btrfs, but
> it's two fsyncs where the big transaction framework does none.  This
> should allow you to turn on the fast fsync log again, but I think the
> multi-operation ioctl would do that as well.
>
> -chris
>
>
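For illustration, the five log steps quoted above could be sketched in
Python roughly as follows. This is only a minimal sketch under stated
assumptions, not code from the thread: the function names
(run_transaction, recover, apply_ops, clear_log), the JSON log format,
and the restriction to whole-file overwrites (so that replay is
idempotent) are all hypothetical. It also fsyncs each data file in
step 3, which is more conservative than the two fsyncs counted above,
and it does not fsync the containing directory.

```python
import json
import os


def apply_ops(ops):
    # Step 3: each op overwrites a whole file, so replaying it twice is harmless.
    for op in ops:
        with open(op["path"], "w") as f:
            f.write(op["data"])
            f.flush()
            os.fsync(f.fileno())  # assumption: make the data durable before clearing the log


def clear_log(log_path):
    # Steps 4 and 5: truncate the log to zero, then fsync it.
    with open(log_path, "w") as log:
        log.flush()
        os.fsync(log.fileno())


def run_transaction(log_path, ops):
    # Step 1: write a redo log holding everything needed to redo the ops.
    with open(log_path, "w") as log:
        json.dump(ops, log)
        log.flush()
        os.fsync(log.fileno())  # step 2: fsync the log
    apply_ops(ops)              # step 3: do the operations
    clear_log(log_path)         # steps 4 and 5


def recover(log_path):
    """Replay a leftover redo log after a crash. Returns True if ops were replayed."""
    if not os.path.exists(log_path) or os.path.getsize(log_path) == 0:
        return False            # the last transaction finished cleanly
    try:
        with open(log_path) as log:
            ops = json.load(log)
    except ValueError:
        ops = None              # torn log: the crash happened before step 2 completed
    if ops:
        apply_ops(ops)
    clear_log(log_path)
    return bool(ops)
```

An application would call recover() on its log at startup, before
running new transactions. Nothing here is btrfs-specific, which matches
the portability point above.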