Re: df shows no available space in 5.4.1

On 2019/12/11 9:11 PM, Martin Raiber wrote:
> On 10.12.2019 02:19 Qu Wenruo wrote:
>>
>> On 2019/12/10 8:52 AM, Qu Wenruo wrote:
>>>
>>> On 2019/12/10 2:56 AM, Martin Raiber wrote:
>>>> On 07.12.2019 08:28 Qu Wenruo wrote:
>>>>> On 2019/12/7 5:26 AM, Martin Raiber wrote:
>>>>>> Hi,
>>>>>>
>>>>>> with kernel 5.4.1 I have the problem that df shows 100% space used. I
>>>>>> can still write to the btrfs volume, but my software looks at the
>>>>>> available space and starts deleting stuff if statfs() says there is a
>>>>>> low amount of available space.
>>>>> If the bug still happens, would you mind trying the snippet below to
>>>>> see why it happens?
>>>>>
>>>>> You will need to:
>>>>> - Apply the patch to your kernel code
>>>>> - Recompile the kernel or btrfs module
>>>>>   So this needs some experience with compiling kernels.
>>>>> - Reboot into the newly compiled kernel or load the debug btrfs module
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>>>> index 23aa630f04c9..cf34c05b16d7 100644
>>>>> --- a/fs/btrfs/relocation.c
>>>>> +++ b/fs/btrfs/relocation.c
>>>>> @@ -523,7 +523,8 @@ static int should_ignore_root(struct btrfs_root *root)
>>>>>  {
>>>>>         struct btrfs_root *reloc_root;
>>>>>
>>>>> -       if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>>>> +       if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
>>>>> +           test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>>>>                 return 0;
>>>>>
>>>>>         reloc_root = root->reloc_root;
>>>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>>>> index f452a94abdc3..c2b70d97a63b 100644
>>>>> --- a/fs/btrfs/super.c
>>>>> +++ b/fs/btrfs/super.c
>>>>> @@ -2064,6 +2064,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>>>                                         found->disk_used;
>>>>>                 }
>>>>>
>>>>> +               pr_info("%s: found type=0x%llx disk_used=%llu factor=%d\n",
>>>>> +                       __func__, found->flags, found->disk_used, factor);
>>>>>                 total_used += found->disk_used;
>>>>>         }
>>>>>
>>>>> @@ -2071,6 +2073,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>>>
>>>>>         buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super), factor);
>>>>>         buf->f_blocks >>= bits;
>>>>> +       pr_info("%s: super_total_bytes=%llu total_used=%llu factor=%d\n", __func__,
>>>>> +               btrfs_super_total_bytes(disk_super), total_used, factor);
>>>>>         buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >> bits);
>>>>>
>>>>>         /* Account global block reserve as used, it's in logical size already */
>>>>>
>>>> Applied. It's currently 100% used directly after reboot, and I am
>>>> getting this log output:
>>> Thank you a lot for the debug output!
>>>
>>>> [...]
>>>> [  241.245150] btrfs_statfs: super_total_bytes=128835387392 total_used=93778841600 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x1 disk_used=93464006656 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x4 disk_used=314818560 factor=1
>>>> [  241.904824] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
>>>> [  241.904824] btrfs_statfs: super_total_bytes=128835387392 total_used=93778841600 factor=1
>>> This proves the on-disk numbers are all correct, so far so good.
>>>
>>> The remaining problem is the block_rsv part, which matches the new
>>> ticket system introduced in v5.4.
>>>
>>> Would you mind testing the new debug snippet?
>>>
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>> index f452a94abdc3..516969534095 100644
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -2076,6 +2076,8 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
>>>         /* Account global block reserve as used, it's in logical size already */
>>>         spin_lock(&block_rsv->lock);
>>>         /* Mixed block groups accounting is not byte-accurate, avoid overflow */
>>> +       pr_info("%s: block_rsv->size=%llu block_rsv->reserved=%llu\n", __func__,
>>> +               block_rsv->size, block_rsv->reserved);
>>>         if (buf->f_bfree >= block_rsv->size >> bits)
>>>                 buf->f_bfree -= block_rsv->size >> bits;
>>>         else
>>>
>> And this extra snippet for available space.
>>
>> Thanks,
>> Qu
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index f452a94abdc3..f1a3e01a0ef5 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1911,6 +1911,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>>          * We aren't under the device list lock, so this is racy-ish, but good
>>          * enough for our purposes.
>>          */
>> +       pr_info("%s: original_free_bytes=%llu\n", __func__, *free_bytes);
>>         nr_devices = fs_info->fs_devices->open_devices;
>>         if (!nr_devices) {
>>                 smp_mb();
>> @@ -2005,6 +2006,7 @@ static inline int btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>>
>>         kfree(devices_info);
>>         *free_bytes = avail_space;
>> +       pr_info("%s: calculated_bytes=%llu\n", __func__, avail_space);
>>         return 0;
>>  }
>>

Sorry for the late reply, I was busy firefighting some bugs.

> Now logs this at 100% used:
> 
> [90273.353449] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.353449] btrfs_calc_avail_data_space: calculated_bytes=13662945280

This marks the beginning of one statfs() call.

> [90273.369508] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369536] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369508] btrfs_statfs: super_total_bytes=128835387392 total_used=90572591104 factor=1

So far so good. All chunks are SINGLE; total disk bytes are ~120GiB,
while total used bytes are ~84GiB.

In theory, we should report ~36GiB free.
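
For anyone who wants to re-check these numbers, here is a minimal userspace
sketch of the arithmetic. It is not kernel code; the helper names
unused_bytes() and bytes_to_gib() are made up for illustration, and the
inputs are simply the values from the log above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: bytes left once total_used is subtracted. */
static uint64_t unused_bytes(uint64_t total_bytes, uint64_t total_used)
{
	assert(total_used <= total_bytes);
	return total_bytes - total_used;
}

/* Convert a byte count to GiB (2^30 bytes); for display only. */
static double bytes_to_gib(uint64_t bytes)
{
	return (double)bytes / (double)(1ULL << 30);
}
```

With super_total_bytes=128835387392 and total_used=90572591104, the
difference is 38262796288 bytes, which bytes_to_gib() turns into a bit
under 36.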

> [90273.369508] btrfs_statfs: block_rsv->size=147554304 block_rsv->reserved=147554304

block_rsv is tiny, just ~140MiB, so it shouldn't make much difference.

> [90273.369537] btrfs_statfs: super_total_bytes=128835387392 total_used=90572591104 factor=1

So at this stage, f_bfree should be 74732024 - 288192 blocks.
                                    ^^^^^^^^   ^^^- block_rsv / 512
                                    |- (total_bytes - total_used ) / 512

At least, f_bfree looks OK.
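
The f_bfree steps can be sketched in userspace like this. It only mirrors
the lines visible in the patch context quoted earlier; the function name
calc_bfree() is made up, the else branch is assumed to clamp to zero (the
quoted context cuts off there), and bits=9 just reproduces the 512-byte
units used in the calculation above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Sketch only, not the kernel function.
 * free blocks = total blocks - used blocks, then the global block
 * reserve is subtracted, clamping at zero to avoid underflow.
 * 'bits' is log2 of the reporting unit (9 means 512-byte units).
 */
static uint64_t calc_bfree(uint64_t total_bytes, uint64_t total_used,
			   uint64_t rsv_size, unsigned int factor,
			   unsigned int bits)
{
	uint64_t blocks, bfree;

	assert(factor > 0 && bits < 64);
	blocks = (total_bytes / factor) >> bits;
	bfree = blocks - ((total_used / factor) >> bits);

	/* Assumed clamp: never let the reserve push f_bfree below zero. */
	if (bfree >= (rsv_size >> bits))
		bfree -= rsv_size >> bits;
	else
		bfree = 0;
	return bfree;
}
```

Feeding it the logged values (128835387392, 90572591104, 147554304,
factor 1, bits 9) gives 74732024 - 288192 = 74443832 units.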

> [90273.369509] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.369537] btrfs_statfs: block_rsv->size=147554304 block_rsv->reserved=147554304
> [90273.369537] btrfs_calc_avail_data_space: original_free_bytes=23583420416

Still good, we have ~21.9GiB of unused data space across all
allocated data chunks.
All of this ~21.9GiB should contribute to f_bavail.

Although it means you have some fragmentation, it's not a big deal at all.

> [90273.369509] btrfs_calc_avail_data_space: calculated_bytes=13662945280
> [90273.369537] btrfs_calc_avail_data_space: calculated_bytes=13662945280

And btrfs_calc_avail_data_space() finds that we can allocate around
12.7GiB of new data chunks.

This 12.7GiB is also going to be part of f_bavail.

This means you should have ~34GiB of free space before we do the
offending check:

	if (!mixed && total_free_meta - thresh < block_rsv->size)
		buf->f_bavail = 0;

This check is pretty old, from 2015, while recently we started allowing
aggressive metadata over-committing, so we can have a lot of reserved
metadata space without actually allocating new metadata chunks.

I'll try to work out a better calculation that cooperates with metadata
over-committing.
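
To make the effect of that check concrete, here is a rough userspace
sketch. It is not the kernel function: calc_bavail() and its parameter
names are made up, factor is taken as 1, and free_meta has to be supplied
by hand because the debug output above never printed total_free_meta.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_THRESH (4ULL << 20)	/* the 4M threshold from the old check */

/*
 * Sketch only.
 *   free_in_chunks: unused bytes inside already allocated data chunks
 *   allocatable:    bytes that could still become new data chunks
 *   free_meta:      unused bytes in allocated metadata chunks (hypothetical input)
 *   rsv_size:       global block reserve size
 */
static uint64_t calc_bavail(uint64_t free_in_chunks, uint64_t allocatable,
			    uint64_t free_meta, uint64_t rsv_size,
			    bool mixed, unsigned int bits)
{
	uint64_t bavail;

	assert(bits < 64);
	bavail = (free_in_chunks + allocatable) >> bits;

	/* Unsigned math: a free_meta below 4M wraps around here. */
	if (!mixed && free_meta - SKETCH_THRESH < rsv_size)
		bavail = 0;
	return bavail;
}
```

With the logged 23583420416 + 13662945280 bytes and a made-up free_meta of
1GiB, this returns 72746808 units of 512 bytes (~34GiB). With a made-up
free_meta of 100MiB, which is below the 147554304 byte reserve plus the
threshold, the same data space is reported as 0.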

Feel free to remove all the debug snippets, and if you want a quick and
dirty fix, please try the attached diff.

Thanks,
Qu

> [90273.400227] btrfs_statfs: found type=0x1 disk_used=726834307072 factor=1
> [90273.400227] btrfs_statfs: found type=0x4 disk_used=4908548096 factor=1
> [90273.400227] btrfs_statfs: found type=0x2 disk_used=98304 factor=1
> [90273.400227] btrfs_statfs: super_total_bytes=8133881348096 total_used=731742953472 factor=1
> [90273.400227] btrfs_statfs: block_rsv->size=536870912 block_rsv->reserved=536821760
> [90273.400227] btrfs_calc_avail_data_space: original_free_bytes=1171038208
> [90273.400227] btrfs_calc_avail_data_space: calculated_bytes=7400493613056
> 
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..b8b67ab05f72 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2032,7 +2032,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	int ret;
-	u64 thresh = 0;
 	int mixed = 0;
 
 	rcu_read_lock();
@@ -2085,26 +2084,9 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (ret)
 		return ret;
 	buf->f_bavail += div_u64(total_free_data, factor);
+	buf->f_bavail -= block_rsv->size;
 	buf->f_bavail = buf->f_bavail >> bits;
 
-	/*
-	 * We calculate the remaining metadata space minus global reserve. If
-	 * this is (supposedly) smaller than zero, there's no space. But this
-	 * does not hold in practice, the exhausted state happens where's still
-	 * some positive delta. So we apply some guesswork and compare the
-	 * delta to a 4M threshold.  (Practically observed delta was ~2M.)
-	 *
-	 * We probably cannot calculate the exact threshold value because this
-	 * depends on the internal reservations requested by various
-	 * operations, so some operations that consume a few metadata will
-	 * succeed even if the Avail is zero. But this is better than the other
-	 * way around.
-	 */
-	thresh = SZ_4M;
-
-	if (!mixed && total_free_meta - thresh < block_rsv->size)
-		buf->f_bavail = 0;
-
 	buf->f_type = BTRFS_SUPER_MAGIC;
 	buf->f_bsize = dentry->d_sb->s_blocksize;
 	buf->f_namelen = BTRFS_NAME_LEN;


