On Tue, Nov 19, 2019 at 12:07:33PM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> When using the NO_HOLES feature, if we punch a hole into a file and then
> fsync it, there are cases where a subsequent fsync will miss the fact that
> a hole was punched, resulting in the holes not existing after replaying
> the log tree.
>
> Essentially these cases all imply that tree-log.c:copy_items() is not
> invoked for the leafs that delimit holes, because nothing changed those
> leafs in the current transaction. And it's precisely in copy_items() that
> we currently detect and log holes, which works as long as the holes are
> between file extent items in the input leaf, or between the beginning of
> the input leaf and the previous leaf, or between the last item in the leaf
> and the next leaf.
>
> First example where we miss a hole:
>
> *) The extent items of the inode span multiple leafs;
>
> *) The punched hole covers a range that affects only the extent items of
> the first leaf;
>
> *) The fsync operation is done in full mode (BTRFS_INODE_NEEDS_FULL_SYNC
> is set in the inode's runtime flags).
>
> That results in the hole not existing after replaying the log tree.
>
> For example, if the fs/subvolume tree has the following layout for a
> particular inode:
>
> Leaf N, generation 10:
>
> [ ... INODE_ITEM INODE_REF EXTENT_ITEM (0 64K) EXTENT_ITEM (64K 128K) ]
>
> Leaf N + 1, generation 10:
>
> [ EXTENT_ITEM (128K 64K) ... ]
>
> If at transaction 11 we punch a hole covering the range [0, 128K[, we end
> up dropping the two extent items from leaf N, but we don't touch the other
> leaf, so we end up in the following state:
>
> Leaf N, generation 11:
>
> [ ... INODE_ITEM INODE_REF ]
>
> Leaf N + 1, generation 10:
>
> [ EXTENT_ITEM (128K 64K) ... ]
>
> A full fsync after punching the hole will only process leaf N because it
> was modified in the current transaction, but not leaf N + 1, since it
> was not modified in the current transaction (generation 10 and not 11).
> As a result the fsync will not log any holes, because it didn't process
> any leaf with extent items.
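
A rough way to picture the first example, as a sketch only: the struct and
helper below are hypothetical and not taken from tree-log.c. It assumes the
walk visits only leaves whose generation equals the current transaction, and
counts how many extent items such a walk would ever see.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of a leaf; not the btrfs on-disk format. */
struct leaf {
    uint64_t generation;    /* transaction that last modified the leaf */
    size_t nr_extent_items; /* file extent items stored in the leaf */
};

/*
 * Count the extent items visible to a walk that skips every leaf not
 * modified in the current transaction (assumed filter: generation == cur).
 */
static size_t extent_items_seen(const struct leaf *leaves, size_t nr,
                                uint64_t cur_trans)
{
    size_t seen = 0;

    for (size_t i = 0; i < nr; i++) {
        if (leaves[i].generation == cur_trans)
            seen += leaves[i].nr_extent_items;
    }
    return seen;
}
```

With leaf N at generation 11 holding no extent items and leaf N + 1 at
generation 10 holding one, the count for transaction 11 is zero, so there is
nothing for a hole check to compare against.
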
>
> Second example where we will miss a hole:
>
> *) An inode has its items spanning 5 (or more) leafs;
>
> *) A hole is punched and it covers only the extent items of the 3rd
> leaf. This results in deleting the entire leaf and not touching any
> of the other leafs.
>
> So the only leaf that is modified in the current transaction, when
> punching the hole, is the first leaf, which contains the inode item.
> During the full fsync, the only leaf that is passed to copy_items()
> is that first leaf, and that's not enough for the hole detection
> code in copy_items() to determine there's a hole between the last
> file extent item in the 2nd leaf and the first file extent item in
> the 3rd leaf (which was the 4th leaf before punching the hole).
>
> Fix this by scanning all leafs and punching holes as necessary when doing a
> full fsync (less common than a non-full fsync) when the NO_HOLES feature
> is enabled. The lack of explicit file extent items to mark holes makes it
> necessary to scan existing extents to determine if holes exist.
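
The idea of scanning existing extents to find holes can be sketched roughly
as below. This is an illustration under assumptions, not the code from the
patch: struct extent, struct hole and find_holes() are hypothetical names,
extents are assumed to be sorted by file offset, and i_size stands in for the
inode size.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a file extent item (offset and length in bytes). */
struct extent {
    uint64_t offset;
    uint64_t len;
};

struct hole {
    uint64_t offset;
    uint64_t len;
};

/*
 * Walk extents sorted by offset and record every gap between the end of the
 * previous extent and the start of the next one, plus a trailing gap up to
 * i_size. Returns the number of holes found; at most max_holes are stored.
 */
static size_t find_holes(const struct extent *extents, size_t nr,
                         uint64_t i_size, struct hole *out, size_t max_holes)
{
    uint64_t prev_end = 0;
    size_t found = 0;

    for (size_t i = 0; i < nr; i++) {
        if (extents[i].offset > prev_end) {
            if (found < max_holes) {
                out[found].offset = prev_end;
                out[found].len = extents[i].offset - prev_end;
            }
            found++;
        }
        if (extents[i].offset + extents[i].len > prev_end)
            prev_end = extents[i].offset + extents[i].len;
    }

    /* No extent reaches the end of the file: the tail is a hole too. */
    if (i_size > prev_end) {
        if (found < max_holes) {
            out[found].offset = prev_end;
            out[found].len = i_size - prev_end;
        }
        found++;
    }
    return found;
}
```

For the first example after the punch, the only remaining extent is
(128K 64K), so a scan like this reports one hole starting at offset 0 with a
length of 128K, regardless of which leaf was modified in the transaction.
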
>
> A test case for fstests follows soon.
>
> Fixes: 16e7549f045d33 ("Btrfs: incompatible format change to remove hole extents")
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>

Reviewed-by: Josef Bacik <josef@xxxxxxxxxxxxxx>

Thanks,
Josef