On Fri, Apr 26, 2019 at 7:59 AM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote: > > > > On 2019/4/26 下午2:32, Qu Wenruo wrote: > > > > > > On 2019/4/26 上午11:32, robbieko wrote: > > [snipped] > >>> > >>> But now, before creating new snapshot, we will flush all data. > >>> > >>> This means at this time, the data will be written back *NOCOW*, just as > >>> expected. > >>> > >>>> will_be_snapshotted will be set, and then flush > >>> > >>> will_be_snapshotted will also be gone. > >>> > >>> Thanks, > >>> Qu > >>> > >> > >> Why will will_be_snapshotted disappear? > >> > >> If will_be_snapshotted is removed, when volume is full, > >> how do we ensure that snapshot and write are both > >> simultaneous data not loss ? > >> > >> How do you avoid the write data between btrfs_start_delalloc_snapshot > >> and create_pending_snapshot during this time ? > > > > Oh, I forgot that buffered write doesn't rely on transaction, thus it > > has the race window between above 2 functions. > > > > > > So will_be_snapshotted is to ensure buffered write at above race window > > always reserve space for COW. > > > > So your patch changes the behavior from: > > /------------------------ inc(will_be_snapshotted) > > | > > -------------------|------------------------------------- > > reserve: NOCOW | reserve: COW > > delalloc: NOCOW | delalloc: COW > > > > That reserve NOCOW->delalloc COW is causing ENOSPC. > > > > To new the behavior: > > > > /------------------------ inc(will_be_snapshotted) > > | /-- inc(snapshot_force_cow) > > -------------------|---------------------|--------------- > > reserve: NOCOW | reserve: COW | reserve: COW > > delalloc: NOCOW | delalloc: NOCOW | delalloc: COW > > > > And now reserve->delalloc is either NOCOW->NOCOW or COW->COW, avoiding > > the possible ENOSPC. > > > > Then it makes sense now. (although way more complex) > > > > Although I still have some questions about all these > > will_be_snapshotted/snapshot_force_cow code. > > > > 1) Does that btrfs_start_delalloc_snapshot() in create_snapshot() call > > still make sense? > > We now have btrfs_start_delalloc_snapshot() in > > btrfs_start_delalloc_flush(). > > Can we remove that call in create_snapshot() and: > > > > 2) Can't we make the race window smaller? (If we can remove the call in > > mentioned in above quiestion) > > E.g. move the inc(will_be_snapshotted) before > > btrfs_start_delalloc_snapshot() in btrfs_start_delalloc_flush(), > > and move inc(snapshot_force_cow) after that > > btrfs_start_delalloc_snapshot() call? > > > > As you also mentioned, the real race window is between > > btrfs_start_delalloc_snapshot() and create_pending_snapshot(). > > And another (unrelated) question: > > 3) Why we use such complex way to prevent race? Why not just go per- > subvolume freeze like various volume manager? > > Current way keeps buffered write still going, but does it worth? And that's precisely one of the advantages of Btrfs' snapshotting - it does not block writes while snapshotting is in progress. Blocking them is terrible for user experience - waiting for snapshotting IO (flushing delalloc, waiting for delalloc, committing a transaction, etc) can take seconds, or minutes, or maybe more in extreme cases. As a user, while I'm browsing youtube, replying to these threads, playing games, etc, I would really hate if I experience stalls while snapper decided to take a snapshot of my root or home subvolumes. Not to mention the case for IO intensive applications (databases, etc). > > If we freeze/block write when creating snapshot, it just falls back > to traditional volume manager behavior, shouldn't cause much problem, > and we can get rid of all these snapshot specific fixes. So if after that we still find bugs in snapshotting, and we will for sure, then we do what? Disable snapshotting in btrfs, remove all the supporting code? NACK. > > Thanks, > Qu > > > > > Thanks, > > Qu > > > >> > >> E.g. chattr +C /mnt > >> Fallocate -l 4096 /mnt/small_file > >> Fallocate -l $((64*1024*1024)) /mnt/large_file > >> > >> First process: > >> While true; do > >> dd if=/dev/urandom of=/mnt/small_file bs=4k count=1 conv=notrunc > >> Done > >> > >> Second process: > >> While true; do > >> dd if=/dev/urandom of=/mnt/large_file bs=64k count=1024 conv=notrunc > >> Done > >> > >> Third process: create snapshot. > > -- Filipe David Manana, “Whether you think you can, or you think you can't — you're right.”
