On Mon, Nov 12, 2018 at 10:23:58AM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> After the simplification of the fast fsync path done recently by commit
> b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time") and
> commit e7175a692765 ("btrfs: remove the wait ordered logic in the
> log_one_extent path"), we got a very short time window where we can get
> extents logged without writeback completing first or extents logged
> without logging the respective data checksums. Both issues can only happen
> when doing a non-full (fast) fsync.
>
> As soon as we enter btrfs_sync_file() we trigger writeback, then lock the
> inode and then wait for the writeback to complete before starting to log
> the inode. However, before we acquire the inode's lock and after we started
> writeback, it's possible that more writes happened and dirtied more pages.
> If that happened and those pages get writeback triggered while we are
> logging the inode (for example, the VM subsystem triggering it due to
> memory pressure, or another concurrent fsync), we end up seeing the
> respective extent maps in the inode's list of modified extents and will
> log matching file extent items without waiting for the respective
> ordered extents to complete, meaning that either of the following will
> happen:
>
> 1) We log an extent after its writeback finishes but before its checksums
> are added to the csum tree, leading to -EIO errors when attempting to
> read the extent after a log replay.
>
> 2) We log an extent before its writeback finishes.
> Therefore, after the log replay we will have a file extent item pointing
> to an unwritten extent (and without the respective data checksums as
> well).
>
> This could not happen before the fast fsync path simplification, because
> for any extent we found in the list of modified extents, we would wait for
> its respective ordered extent to finish writeback or collect its checksums
> for logging if it had not completed yet.
>
> Fix this by triggering writeback again after acquiring the inode's lock
> and before waiting for ordered extents to complete.
>
> Fixes: e7175a692765 ("btrfs: remove the wait ordered logic in the log_one_extent path")
> Fixes: b5e6c3e170b7 ("btrfs: always wait on ordered extents at fsync time")
> CC: stable@xxxxxxxxxxxxxxx # 4.19+
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,
Josef