On 2019/4/26 上午11:32, robbieko wrote:
[snipped]
>>
>> But now, before creating new snapshot, we will flush all data.
>>
>> This means at this time, the data will be written back *NOCOW*, just as
>> expected.
>>
>>> will_be_snapshotted will be set, and then flush
>>
>> will_be_snapshotted will also be gone.
>>
>> Thanks,
>> Qu
>>
>
> Why will will_be_snapshotted disappear?
>
> If will_be_snapshotted is removed, when volume is full,
> how do we ensure that snapshot and write are both
> simultaneous data not loss ?
>
> How do you avoid the write data between btrfs_start_delalloc_snapshot
> and create_pending_snapshot during this time ?
Oh, I forgot that buffered write doesn't rely on transaction, thus it
has the race window between above 2 functions.
So will_be_snapshotted is to ensure buffered write at above race window
always reserve space for COW.
So your patch changes the behavior from:
/------------------------ inc(will_be_snapshotted)
|
-------------------|-------------------------------------
reserve: NOCOW | reserve: COW
delalloc: NOCOW | delalloc: COW
That reserve NOCOW->delalloc COW is causing ENOSPC.
To new the behavior:
/------------------------ inc(will_be_snapshotted)
| /-- inc(snapshot_force_cow)
-------------------|---------------------|---------------
reserve: NOCOW | reserve: COW | reserve: COW
delalloc: NOCOW | delalloc: NOCOW | delalloc: COW
And now reserve->delalloc is either NOCOW->NOCOW or COW->COW, avoiding
the possible ENOSPC.
Then it makes sense now. (although way more complex)
Although I still have some questions about all these
will_be_snapshotted/snapshot_force_cow code.
1) Does that btrfs_start_delalloc_snapshot() in create_snapshot() call
still make sense?
We now have btrfs_start_delalloc_snapshot() in
btrfs_start_delalloc_flush().
Can we remove that call in create_snapshot() and:
2) Can't we make the race window smaller? (If we can remove the call in
mentioned in above quiestion)
E.g. move the inc(will_be_snapshotted) before
btrfs_start_delalloc_snapshot() in btrfs_start_delalloc_flush(),
and move inc(snapshot_force_cow) after that
btrfs_start_delalloc_snapshot() call?
As you also mentioned, the real race window is between
btrfs_start_delalloc_snapshot() and create_pending_snapshot().
Thanks,
Qu
>
> E.g. chattr +C /mnt
> Fallocate -l 4096 /mnt/small_file
> Fallocate -l $((64*1024*1024)) /mnt/large_file
>
> First process:
> While true; do
> dd if=/dev/urandom of=/mnt/small_file bs=4k count=1 conv=notrunc
> Done
>
> Second process:
> While true; do
> dd if=/dev/urandom of=/mnt/large_file bs=64k count=1024 conv=notrunc
> Done
>
> Third process: create snapshot.