Re: [PATCH] Btrfs: fix rare chances for data loss when doing a fast fsync

On Mon, Nov 12, 2018 at 10:23:58AM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
> 
> After the simplification of the fast fsync patch done recently by commit
> b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and
> commit e7175a692765 ("btrfs: remove the wait ordered logic in the
> log_one_extent path"), we got a very short time window where we can get
> extents logged without writeback completing first or extents logged
> without logging the respective data checksums. Both issues can only happen
> when doing a non-full (fast) fsync.
> 
> As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
> inode and then wait for the writeback to complete before starting to log
> the inode. However before we acquire the inode's lock and after we started
> writeback, it's possible that more writes happened and dirtied more pages.
> If that happened and those pages get writeback triggered while we are
> logging the inode (for example, the VM subsystem triggering it due to
> memory pressure, or another concurrent fsync), we end up seeing the
> respective extent maps in the inode's list of modified extents and will
> log matching file extent items without waiting for the respective
> ordered extents to complete, meaning that either of the following will
> happen:
> 
> 1) We log an extent after its writeback finishes but before its checksums
>    are added to the csum tree, leading to -EIO errors when attempting to
>    read the extent after a log replay.
> 
> 2) We log an extent before its writeback finishes.
>    Therefore after the log replay we will have a file extent item pointing
>    to an unwritten extent (and without the respective data checksums as
>    well).
> 
> This could not happen before the fast fsync patch simplification, because
> for any extent we found in the list of modified extents, we would wait for
> its respective ordered extent to finish writeback or collect its checksums
> for logging if it did not complete yet.
> 
> Fix this by triggering writeback again after acquiring the inode's lock
> and before waiting for ordered extents to complete.
> 
> Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path")
> Fixes: b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time")
> CC: stable@xxxxxxxxxxxxxxx # 4.19+
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
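
(Illustration only, not part of the patch or of the review.) The ordering
described in the commit message can be modelled with a toy userspace sketch.
Every name below (toy_inode, toy_fast_fsync and so on) is hypothetical and
none of it is btrfs code. It assumes a page is either dirty or has a pending
ordered extent, that its extent map joins the modified list when writeback
starts, and that no new writes arrive once the inode lock is held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NPAGES 4

/* Toy userspace model; none of these names exist in the kernel. */
struct toy_inode {
	bool dirty[TOY_NPAGES];           /* dirtied, writeback not started yet */
	bool ordered_pending[TOY_NPAGES]; /* writeback started, ordered extent not complete */
	bool modified[TOY_NPAGES];        /* extent map on the list of modified extents */
};

static void toy_write(struct toy_inode *ino, size_t idx)
{
	assert(idx < TOY_NPAGES);
	ino->dirty[idx] = true;
}

/* Start writeback for every dirty page, creating a pending ordered extent. */
static void toy_start_writeback(struct toy_inode *ino)
{
	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (ino->dirty[i]) {
			ino->dirty[i] = false;
			ino->ordered_pending[i] = true;
			ino->modified[i] = true;
		}
	}
}

/* Wait for all ordered extents: data written and checksums inserted. */
static void toy_wait_ordered(struct toy_inode *ino)
{
	for (size_t i = 0; i < TOY_NPAGES; i++)
		ino->ordered_pending[i] = false;
}

/* Log every modified extent; return how many were still pending when logged. */
static int toy_log_inode(struct toy_inode *ino)
{
	int unsafe = 0;

	for (size_t i = 0; i < TOY_NPAGES; i++) {
		if (!ino->modified[i])
			continue;
		if (ino->ordered_pending[i])
			unsafe++;
		ino->modified[i] = false;
	}
	return unsafe;
}

/* One fast fsync with a racing write; returns the number of unsafe logged extents. */
int toy_fast_fsync(bool flush_after_lock)
{
	struct toy_inode ino = {0};

	toy_write(&ino, 0);
	toy_start_writeback(&ino);         /* writeback triggered on entry */
	toy_write(&ino, 1);                /* racing write before the lock is taken */
	/* the inode lock would be acquired here */
	if (flush_after_lock)
		toy_start_writeback(&ino); /* the fix: trigger writeback again */
	toy_wait_ordered(&ino);
	toy_start_writeback(&ino);         /* e.g. memory pressure while logging */
	return toy_log_inode(&ino);
}
```

In this model toy_fast_fsync(false) logs one extent whose ordered extent is
still pending, which corresponds to cases 1) and 2) above, while
toy_fast_fsync(true) logs none.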

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,

Josef