On Wed, Jan 15, 2020 at 01:21:35PM +0000, fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> Recently fsstress (from fstests) sporadically started to trigger an
> infinite loop during fsync operations. This turned out to be because
> support for the rename exchange and whiteout operations was added to
> fsstress in fstests. These operations, unlike any others in fsstress,
> cause file names to be reused, thus triggering this issue. However,
> it's not necessary to use rename exchange and rename whiteout operations
> to trigger this issue; simple rename operations and file creations are
> enough.
>
> The issue boils down to the following: when we log conflicting inodes
> (inodes that, in the previous transaction, had the name of an inode we
> need to log during the fsync operation), we keep logging them even if
> they were already logged before. After that we check if there's any
> other inode that conflicts with them and add it, again, to the list of
> inodes to log. Skipping already logged inodes fixes the issue.
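To illustrate the fix, here is a minimal C sketch under simplifying assumptions. It is a toy model, not the btrfs code. Each inode number maps to at most one conflicting inode in a hypothetical `conflict[]` table. In btrfs the conflicts are found by looking up names in the commit root, and there can be several. `struct toy_fs`, `toy_fs_init()` and `log_inode_and_conflicts()` are made-up names. The relevant part is the `logged[]` check. An inode already logged during this fsync is neither logged again nor allowed to queue its conflict again, so the work list drains even when the conflicts form a cycle.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_INO 512 /* toy limit: inode numbers must be < MAX_INO */

/* Hypothetical toy state: conflict[i] is the inode that previously had one
 * of inode i's current names (-1 if none); logged[i] marks inodes already
 * logged during the current fsync. */
struct toy_fs {
    int conflict[MAX_INO];
    bool logged[MAX_INO];
};

static void toy_fs_init(struct toy_fs *fs)
{
    for (int i = 0; i < MAX_INO; i++) {
        fs->conflict[i] = -1;
        fs->logged[i] = false;
    }
}

/*
 * Log 'ino', then every inode reached through conflicts, recording the
 * logging order in 'order' (capacity 'cap'). Returns the number logged.
 */
static size_t log_inode_and_conflicts(struct toy_fs *fs, int ino,
                                      int *order, size_t cap)
{
    /* Each inode is logged at most once and queues at most one conflict,
     * so MAX_INO + 1 slots are always enough. */
    int work[MAX_INO + 1];
    size_t head = 0, tail = 0, n = 0;

    work[tail++] = ino;
    while (head < tail) {
        int cur = work[head++];

        assert(cur >= 0 && cur < MAX_INO);
        /* The fix: skip inodes already logged in this fsync. Without this
         * check, a cycle such as 258 -> 259 -> 258 never drains. */
        if (fs->logged[cur])
            continue;
        fs->logged[cur] = true;
        if (n < cap)
            order[n] = cur;
        n++;

        /* After logging, queue the inode that conflicts with it. */
        int next = fs->conflict[cur];
        if (next >= 0)
            work[tail++] = next;
    }
    return n;
}
```

With the conflicts from the example that follows (260 → 258, 258 → 259, 259 → 258), calling `log_inode_and_conflicts(&fs, 260, order, 8)` logs 260, 258 and 259 once each and returns 3.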
>
> Consider the following example:
>
> $ mkfs.btrfs -f /dev/sdb
> $ mount /dev/sdb /mnt
>
> $ mkdir /mnt/testdir # inode 257
>
> $ touch /mnt/testdir/zz # inode 258
> $ ln /mnt/testdir/zz /mnt/testdir/zz_link
>
> $ touch /mnt/testdir/a # inode 259
>
> $ sync
>
> # The following 3 renames achieve the same result as a rename exchange
> # operation (<rename_exchange> /mnt/testdir/zz_link to /mnt/testdir/a).
>
> $ mv /mnt/testdir/a /mnt/testdir/tmp
> $ mv /mnt/testdir/zz_link /mnt/testdir/a
> $ mv /mnt/testdir/tmp /mnt/testdir/zz_link
>
> # The following rename and file creation give the same result as a
> # rename whiteout operation (<rename_whiteout> zz to a2).
>
> $ mv /mnt/testdir/zz /mnt/testdir/a2
> $ touch /mnt/testdir/zz # inode 260
>
> $ xfs_io -c fsync /mnt/testdir/zz
> --> results in the infinite loop
>
> The following steps happen:
>
> 1) When logging inode 260, we find that its reference named "zz" was
> used by inode 258 in the previous transaction (through the commit
> root), so inode 258 is added to the list of conflicting inodes that
> need to be logged;
>
> 2) After logging inode 258, we find that its reference named "a" was
> used by inode 259 in the previous transaction, and therefore we add
> inode 259 to the list of conflicting inodes to be logged;
>
> 3) After logging inode 259, we find that its reference named "zz_link"
> was used by inode 258 in the previous transaction, so we add inode 258
> to the list of conflicting inodes to log again, even though we had
> already logged it at step 2. After logging it again, we find again
> that inode 259 conflicts with it, add 259 to the list again, and so
> on: we end up repeating all the previous steps forever.
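The cycle in steps 1-3 can be reproduced with the same kind of toy model. Again, this is hypothetical and not the btrfs implementation. `conflict[ino]` holds the inode that previously owned one of `ino`'s current names. `naive_log_chain()` is a made-up helper that logs every inode it is handed without checking whether it was already logged. A `max_steps` cap is added only so the demonstration terminates.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_INO 512 /* toy limit: inode numbers must be < MAX_INO */

/*
 * Hypothetical naive variant with no "already logged" check.
 * conflict[i] is the inode that previously had one of inode i's current
 * names, or -1 if none. Each iteration "logs" the current inode, even if it
 * was logged before, then moves on to whatever conflicts with it.
 * 'max_steps' exists only so this demonstration terminates; the logging
 * order is written to 'order' (capacity >= max_steps).
 */
static size_t naive_log_chain(const int conflict[MAX_INO], int start,
                              int *order, size_t max_steps)
{
    size_t n = 0;
    int cur = start;

    while (cur >= 0 && n < max_steps) {
        assert(cur < MAX_INO);
        order[n++] = cur;     /* log the inode, possibly again */
        cur = conflict[cur];  /* then follow its conflict */
    }
    return n;
}
```

Starting at 260 with `conflict[260] = 258`, `conflict[258] = 259` and `conflict[259] = 258`, seven steps log 260, 258, 259, 258, 259, 258, 259. The 258/259 pair alternates until the cap is hit, which is the infinite loop described above.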
>
> So fix this by skipping logging of conflicting inodes that were already
> logged.
>
> Fixes: 6b5fc433a7ad67 ("Btrfs: fix fsync after succession of renames of different files")
> CC: stable@xxxxxxxxxxxxxxx
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
Added to misc-next, thanks.