Re: [PATCH] btrfs: run delayed iput at unlink time

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 8 May 2019, at 3:15, Nikolay Borisov wrote:

> On 7.05.19 г. 20:27 ч., Josef Bacik wrote:
>> We have been seeing issues in production where a cleaner script will 
>> end
>> up unlinking a bunch of files that have pending iputs.  This means 
>> they
>> will get their final iput's run at btrfs-cleaner time and thus are 
>> not
>> throttled, which impacts the workload.
>>
>> Since we are unlinking these files we can just drop the delayed iput 
>> at
>> unlink time.  We are already holding a reference to the inode so this
>> will not be the final iput and thus is completely safe to do at this
>> point.  Doing this means we are more likely to be doing the final 
>> iput
>> at unlink time, and thus will get the IO charged to the caller and 
>> get
>> throttled appropriately without affecting the main workload.
>>
>> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
>> ---
>>  fs/btrfs/inode.c | 22 ++++++++++++++++++++++
>>  1 file changed, 22 insertions(+)
>>
>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
>> index b6d549c993f6..e58685b5d398 100644
>> --- a/fs/btrfs/inode.c
>> +++ b/fs/btrfs/inode.c
>> @@ -4009,6 +4009,28 @@ static int __btrfs_unlink_inode(struct 
>> btrfs_trans_handle *trans,
>>  		ret = 0;
>>  	else if (ret)
>>  		btrfs_abort_transaction(trans, ret);
>> +
>> +	/*
>> +	 * If we have a pending delayed iput we could end up with the final 
>> iput
>> +	 * being run in btrfs-cleaner context.  If we have enough of these 
>> built
>> +	 * up we can end up burning a lot of time in btrfs-cleaner without 
>> any
>> +	 * way to throttle the unlinks.  Since we're currently holding a 
>> ref on
>> +	 * the inode we can run the delayed iput here without any issues as 
>> the
>> +	 * final iput won't be done until after we drop the ref we're 
>> currently
>> +	 * holding.
>> +	 */
>
> FWIW the caller is not really holding an explicit reference, rather
> there is a reference held by the dentry which is going to be disposed 
> of
> by vfs. Considering this I'd say this is a false claim. I.e "we" do 
> not
> hold a reference.

It's impossible to call this function without a reference held on the 
inode, kind of nit-picking on "we" vs "the vfs".

>
>> +	if (!list_empty(&inode->delayed_iput)) {
>> +		spin_lock(&fs_info->delayed_iput_lock);
>> +		if (!list_empty(&inode->delayed_iput)) {
>> +			list_del_init(&inode->delayed_iput);
>> +			spin_unlock(&fs_info->delayed_iput_lock);
>> +			iput(&inode->vfs_inode);
>> +			if (atomic_dec_and_test(&fs_info->nr_delayed_iputs))
>> +				wake_up(&fs_info->delayed_iputs_wait);
>> +		} else {
>> +			spin_unlock(&fs_info->delayed_iput_lock);
>> +		}
>> +	}
>
> OTOH this really feels like a hack and this stems from the fact that
> iput is rather rudimentary. Additionally you are essentially 
> opencoding
> the body of btrfs_run_delayed_iputs. I was going to suggest to 
> introduce
> a new helper factoring out the common code but it will get ugly due to
> the spin lock being dropped before doing the iput.
>
> But then I'm really starting to question the utility of delayed iputs.
> Presumably it was added to defer the expensive final iput in the 
> cleaner
> context or avoid some deadlocks (but we don't know which exactly). 
> Yet,
> here we are some time later where you are essentially saying "this
> mechanism is suboptimal because it's dumb and instead of improving
> things it's making them worse in certain cases, so let's unload it a 
> bit
> by doing an iput here".

The final iput is pretty expensive, since it potentially does the full 
truncate of an arbitrary sized file.  There are a lot of contexts it 
can't be called from, so the delayed iput code saves us from some 
impossible situations.  It originally came here:

commit 24bbcf0442ee04660a5a030efdbb6d03f1c275cb
Author: Yan, Zheng <zheng.yan@xxxxxxxxxx>
Date:   Thu Nov 12 09:36:34 2009 +0000

     Btrfs: Add delayed iput

But we've expanded usage to solve a few different deadlocks.

-chris




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux