We have been seeing issues in production where a cleaner script ends up
unlinking a bunch of files that have pending delayed iputs. Their final
iputs then get run in btrfs-cleaner context, where they are not
throttled, which impacts the workload.

Since we are unlinking these files we can just drop the delayed iput at
unlink time. We are already holding a reference to the inode so this
will not be the final iput and thus is completely safe to do at this
point. Doing this means we are more likely to be doing the final iput
at unlink time, and thus will get the IO charged to the caller and get
throttled appropriately without affecting the main workload.

Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
---
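Not part of the commit: a minimal userspace sketch of the refcount
argument above, in case it helps review. It only models the claim that
dropping the delayed iput while the unlink path still holds its own ref
cannot be the final iput. All model_* names are hypothetical stand-ins,
not kernel API, and the sketch is single-threaded, so it leaves out
delayed_iput_lock and the wakeup on delayed_iputs_wait entirely.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the inode state the patch cares about. */
struct model_inode {
	int refs;		/* models the inode reference count */
	bool on_delayed_list;	/* models !list_empty(&inode->delayed_iput) */
	bool evicted;		/* set once the final reference is dropped */
};

/* Hypothetical stand-in for the fs_info counter. */
struct model_fs {
	int nr_delayed_iputs;	/* models fs_info->nr_delayed_iputs */
};

/* Drop one reference; the final drop is where eviction work would run. */
static void model_iput(struct model_inode *inode)
{
	assert(inode->refs > 0);
	if (--inode->refs == 0)
		inode->evicted = true;
}

/* Park one already-held reference on the delayed list. */
static void model_queue_delayed_iput(struct model_fs *fs,
				     struct model_inode *inode)
{
	assert(!inode->on_delayed_list);
	inode->on_delayed_list = true;
	fs->nr_delayed_iputs++;
}

/*
 * Unlink-time step: if a delayed iput is pending, run it now.
 * The caller must hold its own reference, so this is never the final iput.
 * Returns true if a delayed iput was dropped.
 */
static bool model_drop_delayed_iput(struct model_fs *fs,
				    struct model_inode *inode)
{
	if (!inode->on_delayed_list)
		return false;

	inode->on_delayed_list = false;
	model_iput(inode);
	fs->nr_delayed_iputs--;
	return true;
}
```

With one ref held by the unlink path and one parked on the delayed list,
model_drop_delayed_iput() leaves the inode alive with a single ref, and
the eviction only happens on the caller's own model_iput() afterwards,
which is the context we want the IO charged to.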
fs/btrfs/inode.c | 22 ++++++++++++++++++++++
1 file changed, 22 insertions(+)
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index b6d549c993f6..e58685b5d398 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -4009,6 +4009,28 @@ static int __btrfs_unlink_inode(struct btrfs_trans_handle *trans,
 		ret = 0;
 	else if (ret)
 		btrfs_abort_transaction(trans, ret);
+
+	/*
+	 * If we have a pending delayed iput we could end up with the final iput
+	 * being run in btrfs-cleaner context.  If we have enough of these built
+	 * up we can end up burning a lot of time in btrfs-cleaner without any
+	 * way to throttle the unlinks.  Since we're currently holding a ref on
+	 * the inode we can run the delayed iput here without any issues as the
+	 * final iput won't be done until after we drop the ref we're currently
+	 * holding.
+	 */
+	if (!list_empty(&inode->delayed_iput)) {
+		spin_lock(&fs_info->delayed_iput_lock);
+		if (!list_empty(&inode->delayed_iput)) {
+			list_del_init(&inode->delayed_iput);
+			spin_unlock(&fs_info->delayed_iput_lock);
+			iput(&inode->vfs_inode);
+			if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
+				wake_up(&fs_info->delayed_iputs_wait);
+		} else {
+			spin_unlock(&fs_info->delayed_iput_lock);
+		}
+	}
 err:
 	btrfs_free_path(path);
 	if (ret)
--
2.13.5