On 15.05.19 г. 18:02 ч., fdmanana@xxxxxxxxxx wrote:
> From: Filipe Manana <fdmanana@xxxxxxxx>
>
> While logging an inode we follow its ancestors and for each one we mark
> it as logged in the current transaction, even if we have not logged it.
> As a consequence if we change an attribute of an ancestor, such as the
> UID or GID for example, and then explicitly fsync it, we end up not
> logging the inode at all despite returning success to user space, which
> results in the attribute being lost if a power failure happens after
> the fsync.
>
> Sample reproducer:
>
> $ mkfs.btrfs -f /dev/sdb
> $ mount /dev/sdb /mnt
>
> $ mkdir /mnt/dir
> $ chown 6007:6007 /mnt/dir
>
> $ sync
>
> $ chown 9003:9003 /mnt/dir
> $ touch /mnt/dir/file
> $ xfs_io -c fsync /mnt/dir/file
>
> # fsync our directory after fsync'ing the new file, should persist the
> # new values for the uid and gid.
> $ xfs_io -c fsync /mnt/dir
>
> <power failure>
>
> $ mount /dev/sdb /mnt
> $ stat -c %u:%g /mnt/dir
> 6007:6007
>
> --> should be 9003:9003, the uid and gid were not persisted, despite
> the explicit fsync on the directory prior to the power failure
>
> Fix this by not updating the logged_trans field of ancestor inodes when
> logging an inode, since we have not logged them. Let only future calls to
> btrfs_log_inode() to mark inodes as logged.
>
> This could be triggered by my recent fsync fuzz tester for fstests, for
> which an fstests patch exists titled "fstests: generic, fsync fuzz tester
> with fsstress".
>
> Fixes: 12fcfd22fe5b ("Btrfs: tree logging unlink/rename fixes")
> Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> ---
> fs/btrfs/tree-log.c | 11 -----------
> 1 file changed, 11 deletions(-)
>
> diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
> index 87e3e4e37606..7d13533a9620 100644
> --- a/fs/btrfs/tree-log.c
> +++ b/fs/btrfs/tree-log.c
> @@ -5465,7 +5465,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans,
> {
> int ret = 0;
> struct dentry *old_parent = NULL;
> - struct btrfs_inode *orig_inode = inode;
>
> /*
> * for regular files, if its inode is already on disk, we don't
> @@ -5485,16 +5484,6 @@ static noinline int check_parent_dirs_for_sync(struct btrfs_trans_handle *trans,
> }
>
> while (1) {
> - /*
> - * If we are logging a directory then we start with our inode,
> - * not our parent's inode, so we need to skip setting the
> - * logged_trans so that further down in the log code we don't
> - * think this inode has already been logged.
> - */
> - if (inode != orig_inode)
> - inode->logged_trans = trans->transid;
> - smp_mb();
> -
By removing this memory barrier don't you also obsolete the one in
btrfs_record_unlink_dir? Both of these were introduced in 12fcfd22fe5b
("Btrfs: tree logging unlink/rename fixes") and despite they are missing
explicit comments about the expected pairing one can only assume they
both pair against each other.
> if (btrfs_must_commit_transaction(trans, inode)) {
> ret = 1;
> break;
>