On 2018/08/24 16:58, Qu Wenruo wrote:
>
>
> On 2018/8/24 下午3:54, Misono Tomohiro wrote:
>> On 2018/08/24 16:20, Qu Wenruo wrote:
>>>
>>>
>>> On 2018/8/24 下午3:14, Misono Tomohiro wrote:
>>>> Hi,
>>>>
>>>> On 2018/08/21 14:40, Qu Wenruo wrote:
>>>>> Commit c6887cd11149 ("Btrfs: don't do nocow check unless we have to")
>>>>> makes nocow check less frequent to improve performance.
>>>>>
>>>>> However for quota enabled case, such optimization could lead to extra
>>>>> unnecessary data reservation, which results failure for test case like
>>>>> btrfs/153 in fstests.
>>>>>
>>>>> Fix it by reverting to old behavior for quota enabled case.
>>>>>
>>>>> Fixes: c6887cd11149 ("Btrfs: don't do nocow check unless we have to")
>>>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>>>> ---
>>>>> changelog
>>>>> v2:
>>>>> Fix regression for quota+cow case. (Previously it will skip data
>>>>> reservation if quota is enabled, causing regression for limit case.
>>>>> Pointed out by Misono)
>>>>> ---
>>>>> fs/btrfs/file.c | 18 +++++++++++++++++-
>>>>> 1 file changed, 17 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
>>>>> index 2be00e873e92..183e5fb96f42 100644
>>>>> --- a/fs/btrfs/file.c
>>>>> +++ b/fs/btrfs/file.c
>>>>> @@ -1584,6 +1584,7 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>>>> int ret = 0;
>>>>> bool only_release_metadata = false;
>>>>> bool force_page_uptodate = false;
>>>>> + bool quota_enabled = test_bit(BTRFS_FS_QUOTA_ENABLED, &fs_info->flags);
>>>>>
>>>>> nrptrs = min(DIV_ROUND_UP(iov_iter_count(i), PAGE_SIZE),
>>>>> PAGE_SIZE / (sizeof(struct page *)));
>>>>> @@ -1624,13 +1625,28 @@ static noinline ssize_t btrfs_buffered_write(struct kiocb *iocb,
>>>>> fs_info->sectorsize);
>>>>>
>>>>> extent_changeset_release(data_reserved);
>>>>> +
>>>>> + /*
>>>>> + * If we have quota enabled, we must do the heavy lift nocow
>>>>> + * check here to avoid reserving data space, or we can hit
>>>>> + * limitation for NOCOW files.
>>>>> + */
>>>>> + if (quota_enabled) {
>>>>> + if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>> + BTRFS_INODE_PREALLOC)) &&
>>>>> + check_can_nocow(BTRFS_I(inode), pos,
>>>>> + &write_bytes) > 0)
>>>>> + goto reserve_meta_only;
>>>>> + }
>>>>> ret = btrfs_check_data_free_space(inode, &data_reserved, pos,
>>>>> write_bytes);
>>>>> if (ret < 0) {
>>>>> if ((BTRFS_I(inode)->flags & (BTRFS_INODE_NODATACOW |
>>>>> BTRFS_INODE_PREALLOC)) &&
>>>>> check_can_nocow(BTRFS_I(inode), pos,
>>>>> - &write_bytes) > 0) {
>>>>> + &write_bytes) > 0 &&
>>>>
>>>>> + !quota_enabled) {
>>>>
>>>> (When we check this condition, quota_enabled must be false or otherwise
>>>> we have already goto reserve_meta_only. So it seems redundant.)
>>>
>>> It's possible that we have quota enabled, and then
>>> btrfs_check_data_free_space() failed with -EDQUOT.
>>>
>>> In that case, we need above !quota_enabled check to avoid unnecessary
>>> check and just go error branch.
>>
>> So should quota_enabled be checked before check_can_nocow()?
>
> Oh, yes, it should be put before nocow check.
>
>>
>>>
>>>>
>>>>> +reserve_meta_only:
>>>>> /*
>>>>> * For nodata cow case, no need to reserve
>>>>> * data space.
>>>>>
>>>>
>>>> I applied this patch on today's misc-next and it seems mostly ok, but
>>>> btrfs/022 sometimes gives following warning:
>>>
>>> This looks like related to the regression caused by commit
>>> c4c129db5da8f070147f175 ("btrfs: drop unused
>>> parameter qgroup_reserved").
>>>
>>> Would you please try reverting that patch?
>>
>> I think above commit is fixed by commit eb27db470 ("btrfs: fix
>> qgroup_free wrong num_bytes in btrfs_subvolume_reserve_metadata") which
>> is already in misc-next too.
>>
>> I reverted above two patch (and one more related patch 6b0cb14901
>> ("btrfs: drop useless member qgroup_reserved of btrfs_pending_snapshot")),
>> but get the same result.
>
> So something really goes wrong.
> I assume it's some error handler which double freed, but a quick glance
> nor my initial test run doesn't show something obvious.
>
> BTW, what's the possibility of such problem in your test environment?
It's like one in several times.
It may depend on hardware performance? (the machine is not so fast),
I also noticed following warning happens too (not always):
[84089.286669] WARNING: CPU: 4 PID: 19255 at fs/btrfs/extent-tree.c:4277 btrfs_free_reserved_data_space_noquota+0xd2/0xf0 [btrfs]
[84089.286670] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress xxhash raid6_pq dm_thin_pool dm_persistent_data dm_bio_prison loop dm_flakey xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc xt_conntrack ip_set nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security sunrpc intel_powerclamp kvm_intel kvm gpio_ich iTCO_wdt ipmi_ssif iTCO_vendor_support ipmi_si st irqbypass ipmi_devintf crct10dif_pclmul crc32_pclmul ipmi_msghandler ghash_clmulni_intel pcspkr acpi_power_meter i2c_i801 pcc_cpufreq i7core_edac lpc_ich acpi_cpufreq xfs libcrc32c mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb sr_mod hwmon uas ptp crc32c_intel cdrom usb_storage pps_core ata_generic megaraid_sas dca pata_acpi
[84089.286708] i2c_algo_bit ipv6 [last unloaded: xor]
[84089.286712] CPU: 4 PID: 19255 Comm: kworker/u25:1 Tainted: G W IO 4.18.0-rc8+ #98
[84089.286713] Hardware name: FUJITSU-SV PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.09.2619.N1 12/13/2010
[84089.286717] Workqueue: writeback wb_workfn (flush-btrfs-474)
[84089.286731] RIP: 0010:btrfs_free_reserved_data_space_noquota+0xd2/0xf0 [btrfs]
[84089.286731] Code: 49 89 d8 4c 89 f1 4c 89 fa 4c 89 e6 e8 27 51 d1 d2 48 8b 45 00 48 85 c0 75 db 41 c6 45 00 00 5b 5d 41 5c 41 5d 41 5e 41 5f c3 <0f> 0b 49 c7 45 28 00 00 00 00 e9 79 ff ff ff 0f 1f 44 00 00 66 2e
[84089.286752] RSP: 0018:ffff8ea1043176e0 EFLAGS: 00010287
[84089.286754] RAX: 0000000000004000 RBX: 0000000000005000 RCX: ffff8c1151e8cae8
[84089.286755] RDX: 0000000000000001 RSI: 00000000000c3000 RDI: ffff8c106602fa00
[84089.286755] RBP: ffffffffffffb000 R08: 0000000000000000 R09: 0000000000000143
[84089.286756] R10: ffff8c11dd95d000 R11: 00000000ffffff01 R12: ffff8c1151e80000
[84089.286757] R13: ffff8c106602fa00 R14: ffff8ea1043177e4 R15: ffff8c1151e80000
[84089.286758] FS: 0000000000000000(0000) GS:ffff8c1137d00000(0000) knlGS:0000000000000000
[84089.286759] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[84089.286760] CR2: 00007fcbb3a26000 CR3: 000000019400a002 CR4: 00000000000206e0
[84089.286761] Call Trace:
[84089.286779] btrfs_clear_bit_hook+0x1a2/0x3f0 [btrfs]
[84089.286796] clear_state_bit+0x51/0x1b0 [btrfs]
[84089.286812] __clear_extent_bit+0x364/0x3c0 [btrfs]
[84089.286841] extent_clear_unlock_delalloc+0x43/0x70 [btrfs]
[84089.286868] run_delalloc_nocow+0x930/0xa60 [btrfs]
[84089.286897] run_delalloc_range+0x19c/0x370 [btrfs]
[84089.286926] writepage_delalloc+0xcf/0x180 [btrfs]
[84089.286955] __extent_writepage+0x17a/0x310 [btrfs]
[84089.286984] extent_write_cache_pages+0x131/0x390 [btrfs]
[84089.287012] ? btrfs_i_callback+0x20/0x20 [btrfs]
[84089.287040] extent_writepages+0x50/0x80 [btrfs]
[84089.287045] do_writepages+0x4b/0xe0
[84089.287073] ? btrfs_get_token_32+0x6b/0x130 [btrfs]
[84089.287077] ? __writeback_single_inode+0x3d/0x320
[84089.287079] __writeback_single_inode+0x3d/0x320
[84089.287082] writeback_sb_inodes+0x19f/0x460
[84089.287086] wb_writeback+0x107/0x300
[84089.287090] ? wb_workfn+0xdf/0x410
[84089.287093] ? current_is_workqueue_rescuer+0x27/0x40
[84089.287095] wb_workfn+0xdf/0x410
[84089.287099] ? put_prev_entity+0x20/0x100
[84089.287103] process_one_work+0x18f/0x370
[84089.287106] worker_thread+0x30/0x380
[84089.287109] ? process_one_work+0x370/0x370
[84089.287112] kthread+0x113/0x130
[84089.287115] ? kthread_create_worker_on_cpu+0x70/0x70
[84089.287119] ret_from_fork+0x35/0x40
[84089.287122] ---[ end trace b5975ec96174c615 ]---
>
> Thanks,
> Qu
>
>>
>> Thanks,
>> Misono
>>
>>>
>>> Thanks,
>>> Qu
>>>
>>>
>>>>
>>>> [80244.152130] WARNING: CPU: 5 PID: 14575 at fs/btrfs/extent-tree.c:9742 btrfs_free_block_groups+0x2d7/0x440 [btrfs]
>>>> [80244.152132] Modules linked in: btrfs(O) xor zstd_decompress zstd_compress xxhash raid6_pq xt_CHECKSUM ipt_MASQUERADE tun bridge stp llc xt_conntrack ip_set nfnetlink iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack iptable_mangle iptable_raw iptable_security sunrpc intel_powerclamp kvm_intel kvm gpio_ich iTCO_wdt ipmi_ssif iTCO_vendor_support ipmi_si st irqbypass ipmi_devintf crct10dif_pclmul crc32_pclmul ipmi_msghandler ghash_clmulni_intel pcspkr acpi_power_meter i2c_i801 pcc_cpufreq i7core_edac lpc_ich acpi_cpufreq xfs libcrc32c mgag200 drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm igb sr_mod hwmon uas ptp crc32c_intel cdrom usb_storage pps_core ata_generic megaraid_sas dca pata_acpi i2c_algo_bit ipv6 [last unloaded: xor]
>>>> [80244.152185] CPU: 5 PID: 14575 Comm: umount Tainted: G W IO 4.18.0-rc8+ #98
>>>> [80244.152187] Hardware name: FUJITSU-SV PRIMERGY RX300 S6 /D2619, BIOS 6.00 Rev. 1.09.2619.N1 12/13/2010
>>>> [80244.152205] RIP: 0010:btrfs_free_block_groups+0x2d7/0x440 [btrfs]
>>>> [80244.152206] Code: 85 20 cb 00 00 48 39 c6 0f 84 b9 00 00 00 49 bf 00 01 00 00 00 00 ad de 48 8b 9d 20 cb 00 00 48 83 7b a0 00 0f 84 0d 01 00 00 <0f> 0b 48 8d 73 88 31 c9 31 d2 48 89 ef e8 27 7a ff ff 48 89 df e8
>>>> [80244.152235] RSP: 0018:ffff8ea10393fdb0 EFLAGS: 00010286
>>>> [80244.152237] RAX: ffff8c1025819e78 RBX: ffff8c1025819e78 RCX: 0000000000000000
>>>> [80244.152238] RDX: 0000000000000001 RSI: ffff8c115329cb20 RDI: ffff8c1025818e00
>>>> [80244.152239] RBP: ffff8c1153290000 R08: 0000000000000000 R09: 0000000000000000
>>>> [80244.152240] R10: ffff8c1025818e98 R11: 0000000000000002 R12: ffff8c11532900a0
>>>> [80244.152241] R13: 0000000000000000 R14: dead000000000200 R15: dead000000000100
>>>> [80244.152243] FS: 00007ff0cc3d8fc0(0000) GS:ffff8c1137d40000(0000) knlGS:0000000000000000
>>>> [80244.152244] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>>> [80244.152245] CR2: 00007ffe75e53c88 CR3: 0000000235d16001 CR4: 00000000000206e0
>>>> [80244.152246] Call Trace:
>>>> [80244.152270] close_ctree+0x146/0x320 [btrfs]
>>>> [80244.152276] ? kthread_stop+0x42/0xf0
>>>> [80244.152280] generic_shutdown_super+0x6c/0x110
>>>> [80244.152283] kill_anon_super+0xe/0x20
>>>> [80244.152298] btrfs_kill_super+0x13/0x100 [btrfs]
>>>> [80244.152301] deactivate_locked_super+0x3f/0x70
>>>> [80244.152303] cleanup_mnt+0x3b/0x70
>>>> [80244.152305] task_work_run+0x84/0xa0
>>>> [80244.152308] do_syscall_64+0x143/0x4cd
>>>> [80244.152311] ? do_page_fault+0x31/0x130
>>>> [80244.152314] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>>>> [80244.152316] RIP: 0033:0x7ff0cb43c1a7
>>>> [80244.152317] Code: ad 2b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 44 00 00 31 f6 e9 09 00 00 00 66 0f 1f 84 00 00 00 00 00 b8 a6 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d c9 ac 2b 00 f7 d8 64 89 01 48
>>>> [80244.152346] RSP: 002b:00007ffe75e554b8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
>>>> [80244.152348] RAX: 0000000000000000 RBX: 0000563ccced82d0 RCX: 00007ff0cb43c1a7
>>>> [80244.152349] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000563ccced84b0
>>>> [80244.152350] RBP: 0000563ccced84b0 R08: 0000000000000005 R09: 0000563ccced84d0
>>>> [80244.152351] R10: 00007ff0cb4ba320 R11: 0000000000000246 R12: 00007ff0cc1d0184
>>>> [80244.152352] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000
>>>> [80244.152354] ---[ end trace b5975ec96174c60c ]---
>>>> [80244.152358] BTRFS info (device sdh2): space_info 1 has 1022132224 free, is not full
>>>> [80244.152360] BTRFS info (device sdh2): space_info total=1082130432, used=60039168, pinned=0, reserved=0, may_use=18446744073709510656, readonly=0
>>>>
>>>> Obviously, may_use value is underflowed.
>>>>
>>>> Thanks,
>>>> Misono
>>>>
>>>>
>>>
>>
>