ftrace of kernel BUG at extent-tree.c during balance

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hello,

This bug has been discussed several times here already, both recently and also in 2012 where the bug was deemed fixed in btrfs-next, but probably wasn't in the end [1]. It has also been reported at a couple other places, such as [2].

I managed to add some trace_printk's at the beggining of each function present in the stack trace, and reproduced the problem. Took me a couple days to reproduce it and get potentially useful logs.

The stacktrace is :

[60424.709967] kernel BUG at fs/btrfs/extent-tree.c:1841!
[60424.710748] CPU: 0 PID: 1622 Comm: kworker/u2:3 Tainted: G W 4.3.0-rc1 #1 [60424.710816] Workqueue: btrfs-extent-refs btrfs_extent_refs_helper [btrfs] [60424.710849] task: ffff880113fd0000 ti: ffff8800550f0000 task.ti: ffff8800550f0000 [60424.710874] RIP: 0010:[<ffffffffc0acb9d5>] [<ffffffffc0acb9d5>] insert_inline_extent_backref+0x195/0x1a0 [btrfs]
[60424.711189] Call Trace:
[60424.711206] [<ffffffffc0acbac1>] __btrfs_inc_extent_ref+0xe1/0x290 [btrfs] [60424.711234] [<ffffffffc0ad25ee>] __btrfs_run_delayed_refs+0xd1e/0x1070 [btrfs] [60424.711264] [<ffffffff811370fc>] ? ring_buffer_unlock_commit+0x2c/0x260
[60424.711287]  [<ffffffff81143760>] ? __trace_bprintk+0x50/0x60
[60424.711307] [<ffffffffc0ad55f1>] btrfs_run_delayed_refs+0x91/0x2c0 [btrfs] [60424.711332] [<ffffffffc0aeaec0>] ? btrfs_start_transaction_lflush+0x20/0x20 [btrfs] [60424.711358] [<ffffffffc0ad5857>] delayed_ref_async_start+0x37/0x90 [btrfs] [60424.711384] [<ffffffffc0b19e50>] normal_work_helper+0xc0/0x270 [btrfs]
[60424.711404]  [<ffffffff817ad688>] ftrace_graph_caller+0xa8/0xa8
[60424.711427] [<ffffffffc0b1a232>] btrfs_extent_refs_helper+0x12/0x20 [btrfs]
[60424.711446]  [<ffffffff817ad688>] ftrace_graph_caller+0xa8/0xa8
[60424.711464]  [<ffffffff8109127e>] process_one_work+0x14e/0x3d0
[60424.711481]  [<ffffffff8109192a>] worker_thread+0x11a/0x470
[60424.711497]  [<ffffffff81091810>] ? rescuer_thread+0x310/0x310
[60424.711514]  [<ffffffff81097059>] kthread+0xc9/0xe0
[60424.711529]  [<ffffffff81096f90>] ? kthread_park+0x60/0x60

Corresponding code is (slightly modified on my end for ftrace) :

static noinline_for_stack
int insert_inline_extent_backref(struct btrfs_trans_handle *trans,
         struct btrfs_root *root,
         struct btrfs_path *path,
         u64 bytenr, u64 num_bytes, u64 parent,
         u64 root_objectid, u64 owner,
         u64 offset, int refs_to_add,
         struct btrfs_delayed_extent_op *extent_op)
{
  struct btrfs_extent_inline_ref *iref;
  int ret;
trace_printk("insert_inline_extent_backref args: trans=%p root=%p path=%p bytenr=%llu num_bytes=%llu parent=%llu root_objectid=%llu owner=%llu offset=%llu refs_to_add=%d extent_op=%p\n", trans, root, path, bytenr, num_bytes, parent, root_objectid, owner, offset, refs_to_add, extent_op);

  ret = lookup_inline_extent_backref(trans, root, path, &iref,
             bytenr, num_bytes, parent,
             root_objectid, owner, offset, 1);
  if (ret == 0) {
    if (owner < BTRFS_FIRST_FREE_OBJECTID)
    {
      trace_printk("info: CONDITIONS MET FOR BUG, STOPPING TRACING\n");
      tracing_off();
printk("{btrfs} in insert_inline_extent_backref, got owner < BTRFS_FIRST_FREE_OBJECTID\n"); printk("{btrfs} with bytenr=%llu num_bytes=%llu parent=%llu root_objectid=%llu owner=%llu offset=%llu refs_to_add=%d BTRFS_FIRST_FREE_OBJECTID=%llu\n", bytenr, num_bytes, parent, root_objectid, owner, offset, refs_to_add, BTRFS_FIRST_FREE_OBJECTID);
      BUG(); // <== here
    }


The ftrace configuration was :

echo 8 > max_graph_depth
echo function_graph > current_tracer
echo :mod:btrfs > set_ftrace_filter
echo btrfs_{g,s}et_token_{8,16,32,64} btrfs_comp_cpu_keys read_extent_buffer map_private_extent_buffer free_extent_buffer release_extent_buffer check_buffer_tree_ref find_extent_buffer extent_buffer_uptodate verify_parent_transid mark_extent_buffer_accessed btrfs_set_path_blocking btrfs_set_lock_blocking_rw btrfs_tree_read_lock alloc_extent_buffer btrfs_find_create_tree_block read_extent_buffer_pages btree_read_extent_buffer_pages.constprop.56 read_tree_block btrfs_tree_read_unlock_blocking btrfs_clear_path_blocking bin_search > set_ftrace_notrace

The kernel was booted with nosmp to avoid out of order logs.

The ftrace data was collected by netcat and sent over the network to a logging machine. The kernel stacktrace was collected with netconsole on the same machine.

Over 100G of logs were created during the 3 days I was trying to reproduce the bug *and* getting a usable ftrace configuration. I had a script to pause/resume the balance from time to time to enable log splitting and keep only the one where the bug would happen. This last log is a little under 1G (uncompressed, 40M compressed). I suspect the last few lines of this log were not sent by the crashing machine (the trace_printk above doesn't appear), but the crash condition can be seen however (owner == 1).

The first and last 1000 lines of this log are available here :
http://www.speed47.net/tmp2/bug_first_1k.log.gz
http://www.speed47.net/tmp2/bug_last_1k.log.gz

The complete log is available here (40M) :
http://www.speed47.net/tmp2/btrfs_ftrace.2015-09-29.15-11-15.cpu0.gz

You can grep "args:" to get the trace_printk's indicating the values of the parameters a function was called with. Such code was added to the following functions:
insert_inline_extent_backref <== where the BUG_ON() triggers
__btrfs_inc_extent_ref
run_delayed_data_ref
run_delayed_tree_ref
__btrfs_run_delayed_refs
btrfs_run_delayed_refs
btrfs_drop_snapshot
btrfs_ioctl_balance
merge_reloc_roots
relocate_block_group
btrfs_relocate_block_group
btrfs_should_end_transaction
btrfs_relocate_chunk
__btrfs_balance
btrfs_balance

As already seen, the bug happens when owner == 1 (which is < 256 aka BTRFS_FIRST_FREE_OBJECTID).

May a developer have a look at it ? I can ftrace more/less and/or add more precise trace_printk's if needed.

Thanks,

--
Stéphane.

[1] http://thread.gmane.org/gmane.comp.file-systems.btrfs/19106/focus=19109
[2] https://bugzilla.opensuse.org/show_bug.cgi?id=940212

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux