On 2019/12/11 9:11 PM, Martin Raiber wrote:
> On 10.12.2019 02:19 Qu Wenruo wrote:
>>
>> On 2019/12/10 8:52 AM, Qu Wenruo wrote:
>>>
>>> On 2019/12/10 2:56 AM, Martin Raiber wrote:
>>>> On 07.12.2019 08:28 Qu Wenruo wrote:
>>>>> On 2019/12/7 5:26 AM, Martin Raiber wrote:
>>>>>> Hi,
>>>>>>
>>>>>> with kernel 5.4.1 I have the problem that df shows 100% space used. I
>>>>>> can still write to the btrfs volume, but my software looks at the
>>>>>> available space and starts deleting stuff if statfs() says there is a
>>>>>> low amount of available space.
>>>>> If the bug still happens, would you mind trying the snippet below to
>>>>> see why?
>>>>>
>>>>> You will need to:
>>>>> - Apply the patch to your kernel source
>>>>> - Recompile the kernel or the btrfs module
>>>>>   (this needs some experience with compiling kernels)
>>>>> - Reboot into the newly compiled kernel or load the debug btrfs module
>>>>>
>>>>> Thanks,
>>>>> Qu
>>>>>
>>>>> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
>>>>> index 23aa630f04c9..cf34c05b16d7 100644
>>>>> --- a/fs/btrfs/relocation.c
>>>>> +++ b/fs/btrfs/relocation.c
>>>>> @@ -523,7 +523,8 @@ static int should_ignore_root(struct btrfs_root *root)
>>>>> {
>>>>> struct btrfs_root *reloc_root;
>>>>>
>>>>> - if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state))
>>>>> + if (!test_bit(BTRFS_ROOT_REF_COWS, &root->state) ||
>>>>> + test_bit(BTRFS_ROOT_DEAD_RELOC_TREE, &root->state))
>>>>> return 0;
>>>>>
>>>>> reloc_root = root->reloc_root;
>>>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>>>> index f452a94abdc3..c2b70d97a63b 100644
>>>>> --- a/fs/btrfs/super.c
>>>>> +++ b/fs/btrfs/super.c
>>>>> @@ -2064,6 +2064,8 @@ static int btrfs_statfs(struct dentry *dentry,
>>>>> struct kstatfs *buf)
>>>>> found->disk_used;
>>>>> }
>>>>>
>>>>> + pr_info("%s: found type=0x%llx disk_used=%llu factor=%d\n",
>>>>> + __func__, found->flags, found->disk_used, factor);
>>>>> total_used += found->disk_used;
>>>>> }
>>>>>
>>>>> @@ -2071,6 +2073,8 @@ static int btrfs_statfs(struct dentry *dentry,
>>>>> struct kstatfs *buf)
>>>>>
>>>>> buf->f_blocks = div_u64(btrfs_super_total_bytes(disk_super),
>>>>> factor);
>>>>> buf->f_blocks >>= bits;
>>>>> + pr_info("%s: super_total_bytes=%llu total_used=%llu factor=%d\n",
>>>>> + __func__, btrfs_super_total_bytes(disk_super), total_used, factor);
>>>>> buf->f_bfree = buf->f_blocks - (div_u64(total_used, factor) >>
>>>>> bits);
>>>>>
>>>>> /* Account global block reserve as used, it's in logical size
>>>>> already */
>>>>>
>>>> Applied. It's currently 100% used directly after reboot, and I am
>>>> getting this log output:
>>> Thank you a lot for the debug output!
>>>
>>>> [...]
>>>> [ 241.245150] btrfs_statfs: super_total_bytes=128835387392
>>>> total_used=93778841600 factor=1
>>>> [ 241.904824] btrfs_statfs: found type=0x1 disk_used=93464006656 factor=1
>>>> [ 241.904824] btrfs_statfs: found type=0x4 disk_used=314818560 factor=1
>>>> [ 241.904824] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
>>>> [ 241.904824] btrfs_statfs: super_total_bytes=128835387392
>>>> total_used=93778841600 factor=1
>>> This proves the on-disk numbers are all correct, so far so good.
>>>
>>> The remaining problem is the block_rsv part, which matches the new
>>> ticket system introduced in v5.4.
>>>
>>> Would you mind testing the new debug snippet?
>>>
>>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>>> index f452a94abdc3..516969534095 100644
>>> --- a/fs/btrfs/super.c
>>> +++ b/fs/btrfs/super.c
>>> @@ -2076,6 +2076,8 @@ static int btrfs_statfs(struct dentry *dentry,
>>> struct kstatfs *buf)
>>> /* Account global block reserve as used, it's in logical size
>>> already */
>>> spin_lock(&block_rsv->lock);
>>> /* Mixed block groups accounting is not byte-accurate, avoid
>>> overflow */
>>> + pr_info("%s: block_rsv->size=%llu block_rsv->reserved=%llu\n",
>>> + __func__, block_rsv->size, block_rsv->reserved);
>>> if (buf->f_bfree >= block_rsv->size >> bits)
>>> buf->f_bfree -= block_rsv->size >> bits;
>>> else
>>>
>> And this extra snippet for available space.
>>
>> Thanks,
>> Qu
>>
>> diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
>> index f452a94abdc3..f1a3e01a0ef5 100644
>> --- a/fs/btrfs/super.c
>> +++ b/fs/btrfs/super.c
>> @@ -1911,6 +1911,7 @@ static inline int
>> btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>> * We aren't under the device list lock, so this is racy-ish,
>> but good
>> * enough for our purposes.
>> */
>> + pr_info("%s: original_free_bytes=%llu\n", __func__, *free_bytes);
>> nr_devices = fs_info->fs_devices->open_devices;
>> if (!nr_devices) {
>> smp_mb();
>> @@ -2005,6 +2006,7 @@ static inline int
>> btrfs_calc_avail_data_space(struct btrfs_fs_info *fs_info,
>>
>> kfree(devices_info);
>> *free_bytes = avail_space;
>> + pr_info("%s: calculated_bytes=%llu\n", __func__, avail_space);
>> return 0;
>> }
>>
Sorry for the late reply, I was busy firefighting some bugs.
> Now logs this at 100% used:
>
> [90273.353449] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.353449] btrfs_calc_avail_data_space: calculated_bytes=13662945280
This marks the beginning of one statfs call.
> [90273.369508] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x1 disk_used=90233212928 factor=1
> [90273.369536] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x4 disk_used=339361792 factor=1
> [90273.369508] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369536] btrfs_statfs: found type=0x2 disk_used=16384 factor=1
> [90273.369508] btrfs_statfs: super_total_bytes=128835387392
> total_used=90572591104 factor=1
So far so good. All chunks are SINGLE, total disk bytes are ~120GiB,
while total used bytes are ~84GiB.
In theory, we should give ~36GiB.
> [90273.369508] btrfs_statfs: block_rsv->size=147554304
> block_rsv->reserved=147554304
block_rsv is tiny, just ~140 MiB; it shouldn't make much difference.
> [90273.369537] btrfs_statfs: super_total_bytes=128835387392
> total_used=90572591104 factor=1
So at this stage, f_bfree should be 74732024 - 288192 blocks.
^^^^^^^^ ^^^- block_rsv / 512
|- (total_bytes - total_used ) / 512
At least, f_bfree looks OK.
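
The f_bfree arithmetic above can be replayed in user space. This is only a minimal sketch: `bfree_blocks` is a hypothetical helper name, not a kernel function, and it assumes 512-byte units (a shift of 9) as in the estimate above. The kernel shifts by the superblock's block-size bits, so the real unit may differ. Clamping to zero when the reserve exceeds the free count is assumed from the kernel source, since the else branch is cut off in the quoted diff.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical user-space replay of the f_bfree steps in btrfs_statfs():
 * start from (total - used) in blocks, then subtract the global block
 * reserve, clamping at zero (assumption, see the note above).
 */
uint64_t bfree_blocks(uint64_t total_bytes, uint64_t total_used,
		      uint64_t rsv_size, unsigned int bits)
{
	uint64_t bfree;
	uint64_t rsv;

	assert(bits < 64);	/* a larger shift would be undefined */
	assert(total_used <= total_bytes);

	bfree = (total_bytes >> bits) - (total_used >> bits);
	rsv = rsv_size >> bits;

	return bfree >= rsv ? bfree - rsv : 0;
}
```

With the numbers from the log, bfree_blocks(128835387392, 90572591104, 147554304, 9) gives 74443832 blocks, which is 74732024 - 288192.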
> [90273.369509] btrfs_calc_avail_data_space: original_free_bytes=23583420416
> [90273.369537] btrfs_statfs: block_rsv->size=147554304
> block_rsv->reserved=147554304
> [90273.369537] btrfs_calc_avail_data_space: original_free_bytes=23583420416
Still good, we have around ~21.9GiB unused data space across all
allocated data chunks.
All this ~21.9GiB should contribute to f_bavail.
Although it means you have some fragments, it's not a big deal at all.
> [90273.369509] btrfs_calc_avail_data_space: calculated_bytes=13662945280
> [90273.369537] btrfs_calc_avail_data_space: calculated_bytes=13662945280
And btrfs_calc_avail_data_space() finds that we can allocate around
12.7GiB of new data chunks.
This 12.7GiB is also going to be part of f_bavail.
This means you should have ~34GiB free space before we do the
offending check:
	if (!mixed && total_free_meta - thresh < block_rsv->size)
		buf->f_bavail = 0;
This check is pretty old, from 2015. More recently we allow aggressive
metadata over-committing, so we can have a lot of reserved metadata
space without actually allocating new metadata chunks.
I'll try to find a better calculation that works with metadata
over-committing.
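
To see how the check can zero f_bavail while plenty of data space is left, here is a user-space sketch of the same comparison. `bavail_after_check` is a hypothetical helper, not kernel code. The 384 MiB metadata chunk size used in the note below is an assumed figure for illustration only, because the logs in this thread do not show the allocated metadata size.

```c
#include <stdint.h>

#define SZ_4M (4ULL * 1024 * 1024)

/*
 * Hypothetical user-space copy of the check in btrfs_statfs(): if the
 * free metadata space minus a 4 MiB threshold is below the global
 * reserve size, the reported available space is forced to zero.
 */
uint64_t bavail_after_check(uint64_t bavail, uint64_t total_free_meta,
			    uint64_t rsv_size, int mixed)
{
	/*
	 * Same unsigned arithmetic as the quoted check: the subtraction
	 * wraps around if total_free_meta is smaller than the threshold.
	 */
	if (!mixed && total_free_meta - SZ_4M < rsv_size)
		return 0;
	return bavail;
}
```

Assuming 384 MiB (402653184 bytes) of allocated metadata chunks and the logged metadata usage of 339361792 bytes, free metadata is 63291392 bytes. Minus the 4 MiB threshold that is 59097088, which is below the 147554304-byte reserve, so the function returns 0 even though the data side offers 23583420416 + 13662945280 = 37246365696 bytes.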
Feel free to remove all the debug snippets, and if you want a quick and
dirty fix, please try the attached diff.
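
To compare the reported numbers before and after applying the diff, the values that df consumes can be read directly from user space. This is a minimal sketch using POSIX statvfs(3) rather than statfs(); `print_space` is a hypothetical helper name, and the mount point passed in is up to you.

```c
#include <stdio.h>
#include <sys/statvfs.h>

/*
 * Print free and available space for a mount point.
 * f_bfree counts all free blocks, f_bavail the blocks available to
 * unprivileged users; both are in units of f_frsize.
 * Returns 0 on success, -1 if statvfs() fails.
 */
int print_space(const char *path)
{
	struct statvfs st;

	if (statvfs(path, &st) != 0) {
		perror("statvfs");
		return -1;
	}
	printf("%s: free=%llu bytes avail=%llu bytes\n", path,
	       (unsigned long long)st.f_bfree * st.f_frsize,
	       (unsigned long long)st.f_bavail * st.f_frsize);
	return 0;
}
```

Calling print_space() on the btrfs mount point while the problem is visible should show avail at 0 while free stays large, if the check described above is what zeroes f_bavail.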
Thanks,
Qu
> [90273.400227] btrfs_statfs: found type=0x1 disk_used=726834307072 factor=1
> [90273.400227] btrfs_statfs: found type=0x4 disk_used=4908548096 factor=1
> [90273.400227] btrfs_statfs: found type=0x2 disk_used=98304 factor=1
> [90273.400227] btrfs_statfs: super_total_bytes=8133881348096
> total_used=731742953472 factor=1
> [90273.400227] btrfs_statfs: block_rsv->size=536870912
> block_rsv->reserved=536821760
> [90273.400227] btrfs_calc_avail_data_space: original_free_bytes=1171038208
> [90273.400227] btrfs_calc_avail_data_space: calculated_bytes=7400493613056
>
diff --git a/fs/btrfs/super.c b/fs/btrfs/super.c
index 1b151af25772..b8b67ab05f72 100644
--- a/fs/btrfs/super.c
+++ b/fs/btrfs/super.c
@@ -2032,7 +2032,6 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	unsigned factor = 1;
 	struct btrfs_block_rsv *block_rsv = &fs_info->global_block_rsv;
 	int ret;
-	u64 thresh = 0;
 	int mixed = 0;
 
 	rcu_read_lock();
@@ -2085,26 +2084,9 @@ static int btrfs_statfs(struct dentry *dentry, struct kstatfs *buf)
 	if (ret)
 		return ret;
 	buf->f_bavail += div_u64(total_free_data, factor);
+	buf->f_bavail -= block_rsv->size;
 	buf->f_bavail = buf->f_bavail >> bits;
 
-	/*
-	 * We calculate the remaining metadata space minus global reserve. If
-	 * this is (supposedly) smaller than zero, there's no space. But this
-	 * does not hold in practice, the exhausted state happens where's still
-	 * some positive delta. So we apply some guesswork and compare the
-	 * delta to a 4M threshold. (Practically observed delta was ~2M.)
-	 *
-	 * We probably cannot calculate the exact threshold value because this
-	 * depends on the internal reservations requested by various
-	 * operations, so some operations that consume a few metadata will
-	 * succeed even if the Avail is zero. But this is better than the other
-	 * way around.
-	 */
-	thresh = SZ_4M;
-
-	if (!mixed && total_free_meta - thresh < block_rsv->size)
-		buf->f_bavail = 0;
-
 	buf->f_type = BTRFS_SUPER_MAGIC;
 	buf->f_bsize = dentry->d_sb->s_blocksize;
 	buf->f_namelen = BTRFS_NAME_LEN;
