On 2019/4/9 上午7:38, David Sterba wrote:
> On Thu, Apr 04, 2019 at 08:56:15AM +0800, Qu Wenruo wrote:
>>
>>
>> On 2019/4/3 下午11:43, David Sterba wrote:
>>> On Tue, Apr 02, 2019 at 05:45:54PM +0800, Qu Wenruo wrote:
>>>> There are two tree lock events which can sleep:
>>>> - btrfs_tree_read_lock()
>>>> - btrfs_tree_lock()
>>>>
>>>> Sometimes we may need to look into the concurrency picture of the fs.
>>>> For that case, we need the execution time of above two functions and the
>>>> owner of @eb.
>>>>
>>>> Here we introduce a trace events for user space tools like bcc, to get
>>>> the execution time of above two functions, and get detailed owner info
>>>> where eBPF code can't.
>>>>
>>>> All the overhead is hidden behind the trace events, so if events are not
>>>> enabled, there is no overhead.
>>>>
>>>> Also, since this patch and later user space tool only cares about the
>>>> execution time and owner, other info like bytenr is ignored in this
>>>> events.
>>>
>>> The time is definetelly the interesting part here, I'm thinking if the
>>> block number could also provide some useful hints. The gathered trace
>>> data can be used to group by blocks to compare the times, or see the
>>> histogram of times.
>>>
>>> The block at minimum would be useful and I don't want to add more
>>> information there, like level or nritems. In case we do want that
>>> information, a heavier tracepoint version should be added with all the
>>> other data.
>>
>> The block number is not that useful in current form.
>> As a block number can be the same while contains data for different
>> trees between different transactions.
>
> Right, so the generation would have to be printed too, that's more than
> I'd thought.
Generation itself already makes sense.
With tree locking it could be used as a minor indicator of transaction
locking (although still unreliable).
>
>> That's the reason why owner makes more sense in this situation.
>>
>> To make block number makes sense, we need to pair
>> trace_btrfs_tree_lock() with trace_btrfs_tree_unlock().
>>
>>
>>>
>>>> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
>>>> ---
>> [snip]
>>>> + ),
>>>> +
>>>> + TP_printk_btrfs("start_ns=%llu end_ns=%llu owner=%llu is_log_tree=%d",
>>>> + __entry->start_ns, __entry->end_ns, __entry->owner,
>>>> + __entry->is_log_tree)
>>>
>>> I think the delta between start_ns and end_ns should be printed too, so
>>> it's obvious without further processing:
>>
>> My idea is to put that processing part into user space.
>> So we can reduce the space usage of the event buffer.
>>
>> But it will do no harm anyway, so I'll add the output.
>
> Postprocessing is always an option, but the calculated value is more for
> a quick glance or simply is suitable for copy&paste of the trace log.
It won't do harm, but quick glance may not make sense for such heavy
calling trace.
Anyway, I'll add generation, block and diff_ns into the output.
Since bcc can do extra filter on the trace events, it shouldn't cause
much buffer space problem.
Thanks,
Qu
Attachment:
signature.asc
Description: OpenPGP digital signature
