Re: [PATCH] Btrfs: fix unexpected failure of nocow buffered writes after snapshotting when low on space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 2019/4/26 上午11:32, robbieko wrote:
[snipped]
>>
>> But now, before creating new snapshot, we will flush all data.
>>
>> This means at this time, the data will be written back *NOCOW*, just as
>> expected.
>>
>>> will_be_snapshotted will be set, and then flush
>>
>> will_be_snapshotted will also be gone.
>>
>> Thanks,
>> Qu
>>
>
> Why will will_be_snapshotted disappear?
>
> If will_be_snapshotted is removed, when volume is full,
> how do we ensure that snapshot and write are both
> simultaneous data not loss ?
>
> How do you avoid the write data between btrfs_start_delalloc_snapshot
> and create_pending_snapshot during this time ?

Oh, I forgot that buffered write doesn't rely on transaction, thus it
has the race window between above 2 functions.


So will_be_snapshotted is to ensure buffered write at above race window
always reserve space for COW.

So your patch changes the behavior from:
                   /------------------------ inc(will_be_snapshotted)
                   |
-------------------|-------------------------------------
reserve: NOCOW     | reserve: COW
delalloc: NOCOW    | delalloc: COW

That reserve NOCOW->delalloc COW is causing ENOSPC.

To new the behavior:

                   /------------------------ inc(will_be_snapshotted)
                   |                     /-- inc(snapshot_force_cow)
-------------------|---------------------|---------------
reserve: NOCOW     | reserve: COW        | reserve: COW
delalloc: NOCOW    | delalloc: NOCOW     | delalloc: COW

And now reserve->delalloc is either NOCOW->NOCOW or COW->COW, avoiding
the possible ENOSPC.

Then it makes sense now. (although way more complex)

Although I still have some questions about all these
will_be_snapshotted/snapshot_force_cow code.

1) Does that btrfs_start_delalloc_snapshot() in create_snapshot() call
   still make sense?
   We now have btrfs_start_delalloc_snapshot() in
   btrfs_start_delalloc_flush().
   Can we remove that call in create_snapshot() and:

2) Can't we make the race window smaller? (If we can remove the call in
   mentioned in above quiestion)
   E.g. move the inc(will_be_snapshotted) before
   btrfs_start_delalloc_snapshot() in btrfs_start_delalloc_flush(),
   and move inc(snapshot_force_cow) after that
   btrfs_start_delalloc_snapshot() call?

   As you also mentioned, the real race window is between
   btrfs_start_delalloc_snapshot() and create_pending_snapshot().

Thanks,
Qu

>
> E.g. chattr +C /mnt
> Fallocate -l 4096 /mnt/small_file
> Fallocate -l $((64*1024*1024)) /mnt/large_file
>
> First process:
> While true; do
> dd if=/dev/urandom of=/mnt/small_file bs=4k count=1 conv=notrunc
> Done
>
> Second process:
> While true; do
> dd if=/dev/urandom of=/mnt/large_file bs=64k count=1024 conv=notrunc
> Done
>
> Third process: create snapshot.





[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux