On Tue, Apr 21, 2020 at 11:25:20AM +0100, fdmanana@xxxxxxxxxx wrote: > From: Filipe Manana <fdmanana@xxxxxxxx> > > When we have an inode with a prealloc extent that starts at an offset > lower than the i_size and there is another prealloc extent that starts at > an offset beyond i_size, we can end up losing part of the first prealloc > extent (the part that starts at i_size) and have an implicit hole if we > fsync the file and then have a power failure. > > Consider the following example with comments explaining how and why it > happens. > > $ mkfs.btrfs -f /dev/sdb > $ mount /dev/sdb /mnt > > # Create our test file with 2 consecutive prealloc extents, each with a > # size of 128Kb, and covering the range from 0 to 256Kb, with a file > # size of 0. > $ xfs_io -f -c "falloc -k 0 128K" /mnt/foo > $ xfs_io -c "falloc -k 128K 128K" /mnt/foo > > # Fsync the file to record both extents in the log tree. > $ xfs_io -c "fsync" /mnt/foo > > # Now do a redudant extent allocation for the range from 0 to 64Kb. > # This will merely increase the file size from 0 to 64Kb. Instead we > # could also do a truncate to set the file size to 64Kb. > $ xfs_io -c "falloc 0 64K" /mnt/foo > > # Fsync the file, so we update the inode item in the log tree with the > # new file size (64Kb). This also ends up setting the number of bytes > # for the first prealloc extent to 64Kb. This is done by the truncation > # at btrfs_log_prealloc_extents(). > # This means that if a power failure happens after this, a write into > # the file range 64Kb to 128Kb will not use the prealloc extent and > # will result in allocation of a new extent. > $ xfs_io -c "fsync" /mnt/foo > > # Now set the file size to 256K with a truncate and then fsync the file. > # Since no changes happened to the extents, the fsync only updates the > # i_size in the inode item at the log tree. This results in an implicit > # hole for the file range from 64Kb to 128Kb, something which fsck will > # complain when not using the NO_HOLES feature if we replay the log > # after a power failure. > $ xfs_io -c "truncate 256K" -c "fsync" /mnt/foo > > So instead of always truncating the log to the inode's current i_size at > btrfs_log_prealloc_extents(), check first if there's a prealloc extent > that starts at an offset lower than the i_size and with a length that > crosses the i_size - if there is one, just make sure we truncate to a > size that corresponds to the end offset of that prealloc extent, so > that we don't lose the part of that extent that starts at i_size if a > power failure happens. > > A test case for fstests follows soon. > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx> Added to misc-next, thanks.
