If we have an inode (file) with a link count greater than 1, remove
one of its hard links and, fsync the inode, power fail/crash and
then replay the fsync log on the next mount, we end up getting the
parent directory's metadata inconsistent - its i_size still reflects
the deleted hard link. This prevents the directory from ever being
deletable, as its i_size can never decrease to BTRFS_EMPTY_DIR_SIZE
even if all of its children inodes are deleted.
This is easy to reproduce with the following excerpt from a test case
for xfstests that I just made (and it passes with xfs and ext4):
mkdir $SCRATCH_MNT/testdir
echo "hello world" > $SCRATCH_MNT/testdir/foo
ln $SCRATCH_MNT/testdir/foo $SCRATCH_MNT/testdir/bar
# Make sure all metadata and data are durably persisted.
sync
# Now remove one of the hard links and fsync the inode.
rm -f $SCRATCH_MNT/testdir/bar
$XFS_IO_PROG -c "fsync" $SCRATCH_MNT/testdir/foo
# Simulate a crash/power loss. This makes sure the next mount
# will see an fsync log and will replay that log.
_load_flakey_table $FLAKEY_DROP_WRITES
_unmount_flakey
_load_flakey_table $FLAKEY_ALLOW_WRITES
_mount_flakey
# Remove the last hard link of the file and attempt to remove its parent
# directory - this failed in btrfs because the fsync log and replay code
# didn't decrement the parent directory's i_size - this made the btrfs
# rmdir implementation always fail with -ENOTEMPTY.
#
# The parent directory's metadata inconsistency was also detected by btrfs'
# fsck tool, which is run automatically by the fstests framework when the
# test finishes.
rm -f $SCRATCH_MNT/testdir/foo
rmdir $SCRATCH_MNT/testdir
To fix this just make sure that on unlink, if the inode's link count is
greater than 1 and its parent inode is not yet in the fsync log, we end
up logging the parent inode.
Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
---
fs/btrfs/tree-log.c | 20 ++++++++++++++++++--
1 file changed, 18 insertions(+), 2 deletions(-)
diff --git a/fs/btrfs/tree-log.c b/fs/btrfs/tree-log.c
index 9a02da1..1d65a46 100644
--- a/fs/btrfs/tree-log.c
+++ b/fs/btrfs/tree-log.c
@@ -4272,6 +4272,9 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
struct dentry *old_parent = NULL;
int ret = 0;
u64 last_committed = root->fs_info->last_trans_committed;
+ const struct dentry * const first_parent = parent;
+ const bool did_unlink = (BTRFS_I(inode)->last_unlink_trans >
+ last_committed);
sb = inode->i_sb;
@@ -4327,7 +4330,6 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
goto end_trans;
}
- inode_only = LOG_INODE_EXISTS;
while (1) {
if (!parent || !parent->d_inode || sb != parent->d_inode->i_sb)
break;
@@ -4336,8 +4338,22 @@ static int btrfs_log_inode_parent(struct btrfs_trans_handle *trans,
if (root != BTRFS_I(inode)->root)
break;
+ /*
+ * On unlink we must make sure our immediate parent directory
+ * inode is fully logged. This is to prevent leaving dangling
+ * directory index entries and a wrong directory inode's i_size.
+ * Not doing so can result in a directory being impossible to
+ * delete after log replay (rmdir will always fail with error
+ * -ENOTEMPTY).
+ */
+ if (did_unlink && parent == first_parent)
+ inode_only = LOG_INODE_ALL;
+ else
+ inode_only = LOG_INODE_EXISTS;
+
if (BTRFS_I(inode)->generation >
- root->fs_info->last_trans_committed) {
+ root->fs_info->last_trans_committed ||
+ inode_only == LOG_INODE_ALL) {
ret = btrfs_log_inode(trans, root, inode, inode_only,
0, LLONG_MAX, ctx);
if (ret)
--
2.1.3
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html