Re: Atomic file data replace API

Excerpts from Olaf van der Spek's message of 2011-01-07 10:17:31 -0500:
> On Fri, Jan 7, 2011 at 4:13 PM, Chris Mason <chris.mason@xxxxxxxxxx> wrote:
> >> That's not what I asked. ;)
> >> I asked to wait until the first write (or close). That way, you don't
> >> get unintentional empty files.
> >> One step further, you don't have to keep the data in memory, you're
> >> free to write them to disk. You just wouldn't update the meta-data
> >> (yet).
> >
> > Sorry ;) Picture an application that truncates 1024 files without closing any
> > of them.  Basically any operation that includes the kernel waiting for
> > applications because they promise to do something soon is a denial of
> > service attack, or a really easy way to run out of memory on the box.
> 
> I'm not sure why you would run out of memory in that case.

Well, let's make sure I've got a good handle on the proposed interface:

1) fd = open(some_file, O_ATOMIC)
2) ftruncate(fd, 0)
3) write(fd, new data)

The semantics are that we promise not to let the truncate hit the disk
until the application does the write.
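A minimal C sketch of that sequence follows. O_ATOMIC is only the flag proposed in this thread and no kernel provides it, so the sketch assumes it and defines it as 0 when missing. With that fallback the code performs an ordinary, non-atomic replace, and a crash between ftruncate() and write() can leave exactly the empty file the proposal is meant to prevent. The helper name replace_contents is hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* O_ATOMIC is the flag proposed in this thread; it does not exist.
 * Assumed here: fall back to 0 so the sequence compiles as a plain,
 * non-atomic replace. */
#ifndef O_ATOMIC
#define O_ATOMIC 0
#endif

/* Hypothetical helper: replace the contents of `path` with `len` bytes of
 * `buf` via the proposed open / ftruncate / write sequence.
 * Returns 0 on success, -1 on error. */
int replace_contents(const char *path, const char *buf, size_t len)
{
    assert(path != NULL && (buf != NULL || len == 0));

    int fd = open(path, O_WRONLY | O_CREAT | O_ATOMIC, 0644); /* step 1 */
    if (fd < 0)
        return -1;

    /* Step 2: without real O_ATOMIC, the file is now empty on disk
     * once this reaches a commit. */
    if (ftruncate(fd, 0) < 0) {
        close(fd);
        return -1;
    }

    /* Step 3: write the new data, looping over short writes. */
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, buf + done, len - done);
        if (n < 0) {
            close(fd);
            return -1;
        }
        done += (size_t)n;
    }
    return close(fd);
}
```

Under the proposal, the window between steps 2 and 3 is what the filesystem would have to hide from a crash; the options below are different ways of doing that.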

We have a few choices for how to do this:

1) Leave the disk untouched, but keep something in memory that says this
inode is really truncated

2) Record on disk that we've done our atomic truncate but it is still
pending.  We'd need some way to remove or invalidate this record after a
crash.

3) Go ahead and do the operation but don't allow the transaction to
commit until the write is done.

option #1: keep something in memory.  Well, any time we have a
requirement to pin something in memory until userland decides to do a
write, we risk OOM.
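To make the memory argument concrete, here is a hypothetical userspace model of option #1 (not kernel code; every name is made up for illustration). Each atomic truncate pins a small record until the application writes. With the 1024-file example from earlier in the thread, any finite limit means the next truncate must fail, and with no limit the pinned memory grows for as long as userland keeps truncating without writing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical in-memory record: "this inode is really truncated". */
struct pending_trunc {
    unsigned long ino;
    struct pending_trunc *next;
};

struct pending_table {
    struct pending_trunc *head;
    size_t count;   /* records currently pinned by userland */
    size_t limit;   /* how much the kernel is willing to pin (assumed) */
};

/* Record an atomic truncate of `ino`. Returns 0, or -1 once the limit
 * is reached or allocation fails. */
int atomic_truncate(struct pending_table *t, unsigned long ino)
{
    if (t->count >= t->limit)
        return -1;
    struct pending_trunc *p = malloc(sizeof *p);
    if (!p)
        return -1;
    p->ino = ino;
    p->next = t->head;
    t->head = p;
    t->count++;
    return 0;
}

/* The application finally wrote to `ino`: drop its record, if any. */
void atomic_write_done(struct pending_table *t, unsigned long ino)
{
    struct pending_trunc **pp = &t->head;
    while (*pp) {
        if ((*pp)->ino == ino) {
            struct pending_trunc *dead = *pp;
            *pp = dead->next;
            free(dead);
            assert(t->count > 0);
            t->count--;
            return;
        }
        pp = &(*pp)->next;
    }
}
```

Nothing in the model forces the application to ever call atomic_write_done(), which is the denial-of-service concern: the kernel either refuses work or keeps the memory pinned.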

option #2: disk format change.  Actually somewhat complex because if we
haven't crashed, we need to be able to read the inode in again without
invalidating the record but if we do crash, we have to invalidate the
record.  Not impossible, but not trivial.
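One way the "invalidate only after a crash" rule for option #2 could be expressed is a mount generation: the superblock stores a counter bumped on every mount, and each pending record stores the generation it was written in. This is an assumed scheme for illustration, not something proposed in the thread, and the struct and function names below are hypothetical. Adding the generation fields is itself the disk format change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical on-disk superblock field: bumped on every mount. */
struct super_block_disk {
    uint64_t mount_gen;
};

/* Hypothetical on-disk pending-truncate record stored with the inode. */
struct pending_record_disk {
    bool     present;
    uint64_t gen;   /* mount_gen at the time of the atomic truncate */
};

/* Called on every mount, including the first mount after a crash. */
void mount_fs(struct super_block_disk *sb)
{
    sb->mount_gen++;
}

/* Called when an inode is read back in. A record from this mount means
 * the inode was only evicted from cache, so the truncate is still
 * pending. A record from an earlier mount means we crashed before the
 * write, so it is invalidated and the old data stands.
 * Returns true if the pending truncate should still be honored. */
bool check_pending(const struct super_block_disk *sb,
                   struct pending_record_disk *rec)
{
    if (!rec->present)
        return false;
    assert(rec->gen <= sb->mount_gen); /* no records from future mounts */
    if (rec->gen == sb->mount_gen)
        return true;
    rec->present = false; /* stale: left over from before a crash */
    return false;
}
```

The sketch leaves out the parts that make this non-trivial in a real filesystem, such as writing the record and the inode consistently and cleaning up stale records that are never read again.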

option #3: Pin the whole transaction.  Depending on the FS this may be
impossible.  Certain operations require us to commit the transaction to
reclaim space, and we cannot allow userland to put that on hold without
deadlocking.

What most people don't realize about crash-safe filesystems is that
they don't have fine-grained transactions.  There is a single running
transaction for all the operations being done.  This is mostly because it
is less complex and much faster, but it also makes any 'pin the whole
transaction' scheme unusable.
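A toy model of why option #3 breaks with one shared transaction: every operation from every process joins the same running transaction, so a single pinned atomic truncate holds back the commit for all of them. The names are hypothetical and the model only shows the commit being blocked. The deadlock described above appears when the pinning process needs space that only a commit can reclaim, so it waits for the commit while the commit waits for it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: one running transaction shared by everyone. */
struct running_trans {
    unsigned long transid;
    int pins;   /* atomic truncates still waiting for userland to write */
    int ops;    /* operations batched into this transaction */
};

/* Any metadata operation, from any process, joins the running one. */
void join_op(struct running_trans *t)
{
    t->ops++;
}

/* Option #3: an atomic truncate joins the transaction and pins it. */
void atomic_truncate_pin(struct running_trans *t)
{
    t->ops++;
    t->pins++;
}

/* The application wrote its data: release the pin. */
void atomic_write_unpin(struct running_trans *t)
{
    assert(t->pins > 0);
    t->pins--;
}

/* Commit fails while any pin is held, which holds back every operation
 * in the transaction, not just the atomic one. On success a new, empty
 * transaction starts. */
bool try_commit(struct running_trans *t)
{
    if (t->pins > 0)
        return false;
    t->transid++;
    t->ops = 0;
    return true;
}
```

For example, if process A pins the transaction and processes B and C then do unrelated creates or unlinks, none of the three operations can commit until A writes.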

-chris