Re: About hung task on generic/041

On Fri, Jul 13, 2018 at 11:33:39AM +0100, Filipe Manana wrote:
>On Fri, Jul 13, 2018 at 9:44 AM, Lu Fengqi <lufq.fnst@xxxxxxxxxxxxxx> wrote:
>> On Thu, Jul 12, 2018 at 08:33:59PM +0800, Lu Fengqi wrote:
>>>On Thu, Jul 12, 2018 at 11:40:54AM +0100, Filipe Manana wrote:
>>>>On Wed, Jul 11, 2018 at 10:02 AM, Lu Fengqi <lufq.fnst@xxxxxxxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> When I run generic/041 on v4.18-rc3 (with KASAN and hung task
>>>>> detection enabled), the btrfs-transaction kthread triggers the hung task
>>>>> timeout (it stalls at wait_event in btrfs_commit_transaction). At the same
>>>>> time, xfs_io -c fsync occupies 100% of the CPU. I am not sure
>>>>> whether this is a problem. Any suggestions?
>>>>
>>>>Well, something at 100% CPU that seems to hang forever is definitely
>>>>a problem, especially with a workload as simple as the one in generic/041
>>>
>>>To clarify, the hung task ends within 500s. Without KASAN, it ends
>>>within 80s, so it doesn't trigger the 120s hung task timeout. I'm not
>>>sure whether this is just slowness or an actual problem.
>>
>> Well, I ran generic/041 on v4.18-rc4 (with KASAN) on another
>> machine (with an HDD) and it didn't finish all night. So the hung task
>> may only end within 500s on an SSD.
>>
>>>
>>>>(never happened to me, even on vanilla 4.18-rc4).
>>
>> See the attached kernel_config. Maybe some config option is preventing
>> you from reproducing the problem.
>
>Don't have to look into that, but I'm attaching mine so you can
>compare them.

You must set the following config options. As mentioned above, the test
won't hit the hung task timeout without *KASAN*.

CONFIG_KASAN=y
CONFIG_KASAN_EXTRA=y
CONFIG_KASAN_OUTLINE=y

>
>>
>>>>Do you have the stack trace for the fsync task? What you pasted below
>>>
>>>I will send the stack trace tomorrow.
>>
>> See the attached kasan.log.xz.
>>
>> From the log, it seems that the time is spent in
>> btrfs_log_inode_parent, which calls btrfs_log_inode in a loop.
>>
>> I'm happy to provide a trace (without KASAN) for comparison, but when
>> I run both systemtap and the test case, I hit another problem.
>>
>> See the attached btrfs_sync_file.stp and 4.18-rc4.dmesg.
>
>Are you sure you're running a vanilla kernel, without any other btrfs patches?
>This test case has been around since 2015 and no one has ever run into
>such a problem (it takes around 15 seconds to finish here, on 2 VMs with
>a debug kernel).
>
>Does that happen to you on 4.17 or older kernels too? If it doesn't,
>then I suggest bisecting.

As soon as I turn on KASAN, the test case hits this problem on
vanilla 4.17/4.18-rc3/4.18-rc4 kernels (no other patches).

-- 
Thanks,
Lu

>
>>
>> --
>> Thanks,
>> Lu
>>
>>>
>>>--
>>>Thanks,
>>>Lu
>>>
>>>>is only for the transaction kthread and that alone doesn't help.
>>>>
>>>>>
>>>>>
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>
>>>>
>>>>
>>>>--
>>>>Filipe David Manana,
>>>>
>>>>“Whether you think you can, or you think you can't — you're right.”
>>>>
>>>>
>>
>>
>
>
>
>-- 
>Filipe David Manana,
>
>“Whether you think you can, or you think you can't — you're right.”
>
>
