On Sat, 13 Feb 2010, Mike Fedyk wrote:
> On Sat, Feb 13, 2010 at 3:25 AM, Sander <sander@xxxxxxxxxxx> wrote:
> > Mike Fedyk wrote (ao):
> >> On Fri, Feb 12, 2010 at 8:32 AM, Josef Bacik <josef@xxxxxxxxxx> wrote:
> >> > Creating a file is a metadata operation, and _any_ metadata operation has to be
> >> > committed to disk when the transaction commits in order to maintain a coherent
> >> > fs.  Thanks,
> >>
> >> What I still don't understand though is that the create could have
> >> taken up to 30 seconds to commit and the same for the few bytes of
> >> data, but a few ms later a snapshot was made and the metadata change
> >> was there and the data change was not. Could it have happened that
> >> the snapshot would not have the newly created file and this was just a
> >> timing issue that should not be relied upon?
> >>
> >> I'm just wondering why that file was there at all.
> >
> > I would say that is because the moment the file got created, the
> > resulting metadata was committed immediately. The data not yet.
>
> Josef explained it to me on IRC. Meta-data changes like file creation
> get added to the current transaction and snapshots start a new
> transaction so that is why the empty file is in the snapshot.
>
> The file is empty because with delayed allocation, the data has not
> hit the filesystem yet and thus has no representation in filesystem
> operations like snapshots.

You can make btrfs include the file data in the snapshot along with the
metadata with the 'flushoncommit' mount option.  The problem is that this
will make _all_ btrfs commits more expensive, as they'll block new
operations during the commit while old data is being flushed out.

We could trivially make this happen only when there is a new snapshot, to
get the behavior you expect (see patch below).  If the goal is to make a
perfectly consistent snapshot of the file system, this is better than

	sync ; btrfsctl -s snap whatever

because there wouldn't be a window where metadata changes make it into the
snapshot but file data does not.
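
To make the difference concrete, here is a toy userspace model in C.  It
is only a sketch, not kernel code: the toy_* names and the 'patched'
switch are made up for illustration, and the model ignores the wait on
ordered extents that the real commit path does.  It only tracks whether
buffered (delalloc) data gets flushed before the snapshot is taken.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not kernel code): one file with buffered data. */
struct toy_file {
	bool   created;         /* metadata change joined the transaction */
	size_t delalloc_bytes;  /* written by the app, not yet flushed */
	size_t committed_bytes; /* data a commit or snapshot can see */
};

struct toy_snapshot {
	bool   has_file;   /* file entry is present in the snapshot */
	size_t file_bytes; /* file data captured by the snapshot */
};

/*
 * Mirrors only the shape of the flush check in btrfs_commit_transaction().
 * patched == false: flush delalloc data only with flushoncommit.
 * patched == true:  also flush when a snapshot is pending.
 */
struct toy_snapshot toy_commit(struct toy_file *f, bool flush_on_commit,
			       bool snap_pending, bool patched)
{
	struct toy_snapshot snap = { false, 0 };
	bool flush = patched ? (flush_on_commit || snap_pending)
			     : flush_on_commit;

	if (flush) {
		/* Push buffered data out so the commit can see it. */
		f->committed_bytes += f->delalloc_bytes;
		f->delalloc_bytes = 0;
	}

	if (snap_pending) {
		/* The snapshot sees metadata plus whatever data was flushed. */
		snap.has_file = f->created;
		snap.file_bytes = f->committed_bytes;
	}

	return snap;
}
```

With a freshly created file holding 5 buffered bytes, no flushoncommit,
and a pending snapshot, the unpatched model yields a snapshot containing
the file with 0 bytes, while the patched model yields the file with all
5 bytes.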

Is there really a use case for the sort of 'lazy' snapshots with
out-of-sync data and metadata (like 0-byte files)?  If so, we should add
another ioctl for a full-blown snapshot so that users who _do_ want a
fully consistent snapshot can get it.

If not, something like the below should be sufficient to make all
snapshots fully consistent...

sage

---
From: Sage Weil <sage@xxxxxxxxxxxx>
Date: Fri, 19 Feb 2010 14:13:50 -0800
Subject: [PATCH] Btrfs: flush data on snapshot creation

Flush any delalloc extents when we create a snapshot, so that recently
written file data is always included in the snapshot.

Signed-off-by: Sage Weil <sage@xxxxxxxxxxxx>
---
 fs/btrfs/transaction.c |    5 +----
 1 files changed, 1 insertions(+), 4 deletions(-)

diff --git a/fs/btrfs/transaction.c b/fs/btrfs/transaction.c
index e83d4e1..f5b7029 100644
--- a/fs/btrfs/transaction.c
+++ b/fs/btrfs/transaction.c
@@ -1084,13 +1084,10 @@ int btrfs_commit_transaction(struct btrfs_trans_handle *trans,
 
 		mutex_unlock(&root->fs_info->trans_mutex);
 
-		if (flush_on_commit) {
+		if (flush_on_commit || snap_pending) {
 			btrfs_start_delalloc_inodes(root, 1);
 			ret = btrfs_wait_ordered_extents(root, 0, 1);
 			BUG_ON(ret);
-		} else if (snap_pending) {
-			ret = btrfs_wait_ordered_extents(root, 0, 1);
-			BUG_ON(ret);
 		}
 
 		/*
-- 
1.6.6.1