On 2015-09-20 12:51, Qu Wenruo wrote:
Would you please use gdb to show the code at
"btrfs_qgroup_rescan_worker+0x388"?
(This needs kernel debuginfo.)
My guess is the following line (pretty sure, but not 100%):
------
	/*
	 * only update status, since the previous part has alreay updated the
	 * qgroup info.
	 */
	trans = btrfs_start_transaction(fs_info->quota_root, 1);   <<<<<
	if (IS_ERR(trans)) {
		err = PTR_ERR(trans);
		btrfs_err(fs_info,
			  "fail to start transaction for status update: %d\n",
			  err);
		goto done;
	}
------
The kernel and modules were already compiled with debuginfo.
However, for some reason I couldn't get gdb's disassembly of /proc/kcore
properly aligned with the source I compiled: the asm code doesn't match
the C code shown by gdb. In any case, looking at the source of this
function, this is the only place btrfs_start_transaction is called, so
we can be 100% sure that's indeed where the crash happens.
Yep, that's the only caller.
Here is a small but useful hint for locating the code, if you are
interested in kernel development.
# Not sure whether Ubuntu gzips its modules; at least Arch does
# compress them
$ cp <kernel modules dir>/kernel/fs/btrfs/btrfs.ko.gz /tmp/
$ gunzip /tmp/btrfs.ko.gz
$ gdb /tmp/btrfs.ko
# Make sure gdb reads all the needed debuginfo
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
And gdb will find the code position for you.
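If gdb is not handy, plain binutils can do roughly the same job (a
sketch; you add the 0x388 offset to the symbol's start address by hand):

# Find where the function starts inside the module...
$ nm /tmp/btrfs.ko | grep ' btrfs_qgroup_rescan_worker$'
# ...then map <function start + 0x388> back to a file and line:
$ addr2line -f -i -e /tmp/btrfs.ko <function start + 0x388>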
Quite an easy one: only the backtrace info is needed.
Ah, thanks for the tips. I was loading the whole vmlinux and using
/proc/kcore as the core file, then adding the module with
"add-symbol-file". But as we're just looking for the code and not the
variables, that was indeed complete overkill.
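For the record, that heavier workflow went roughly like this (from
memory; the vmlinux path is where Ubuntu's dbgsym packages install it,
and the module's load address must be read from sysfs as root):

$ sudo cat /sys/module/btrfs/sections/.text   # module .text load address
$ gdb /usr/lib/debug/boot/vmlinux-$(uname -r) /proc/kcore
(gdb) add-symbol-file /tmp/btrfs.ko <that .text address>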
(gdb) list *(btrfs_qgroup_rescan_worker+0x388)
0x98068 is in btrfs_qgroup_rescan_worker (fs/btrfs/qgroup.c:2328).
2323
2324		/*
2325		 * only update status, since the previous part has alreay updated the
2326		 * qgroup info.
2327		 */
2328		trans = btrfs_start_transaction(fs_info->quota_root, 1);
2329		if (IS_ERR(trans)) {
2330			err = PTR_ERR(trans);
2331			btrfs_err(fs_info,
2332				  "fail to start transaction for status update: %d\n",
So this just confirms what we were already 99% sure of.
Another hint is about how to collect kernel crash info.
Your netconsole setup is definitely one good practice.
Another one I use to collect crash info is kdump.
Ubuntu should have a good wiki page on it.
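From memory, the setup is roughly as follows (package name and paths as
I remember them; the wiki is the authoritative reference):

$ sudo apt-get install linux-crashdump
# The kernel command line must reserve memory for the crash kernel,
# e.g. crashkernel=384M-:128M, then reboot.
$ cat /sys/kernel/kexec_crash_loaded   # prints 1 once the crash kernel is armed
# After a panic, the dump lands under /var/crash/ by default.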
I've already come across kdump a few times, but never really looked
into it. To debug the other, more complicated extent backref bug, it
could be of some use.
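(For completeness, since you mentioned it: my netconsole setup boils
down to the following, with addresses and interface anonymized as
placeholders; the syntax is documented in
Documentation/networking/netconsole.txt.)

# On the crashing machine, send kernel messages over UDP:
$ sudo modprobe netconsole netconsole=@/eth0,6666@192.168.1.2/
# On the receiving machine, capture them:
$ nc -u -l 6666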
So, as a quick summary of this big thread, it seems I've been hitting
3 bugs, all reproducible:
- kernel BUG on balance (this original thread)
For this one I can't provide much help, as the extent backref bug is
quite hard to debug, unless a developer takes an interest in it and
finds a stable way to reproduce it.
Yes, unfortunately it looks very much like a race condition. I know I
can reproduce it with my workflow, but it can take anywhere from
1 minute to 12 hours, so I wouldn't call that a "stable way" to
reproduce it :(
Still, if any dev is interested in it, I can reproduce it, with a
patched kernel if needed.
Maybe you are already doing it, but you can compile only the btrfs
module, which is far faster than compiling the whole kernel, if and
only if the compiled module can be loaded.
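From the already-configured source tree of the running kernel, it boils
down to something like this (a rough sketch; the mount points are
placeholders):

$ make M=fs/btrfs modules            # rebuild only the btrfs module
$ sudo umount <every btrfs mount>    # btrfs must not be in use
$ sudo rmmod btrfs
$ sudo insmod fs/btrfs/btrfs.ko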
Yes, I've compiled this 4.3.0-rc1 in a completely modular form, so I'll
try to load the modified module and see if the running kernel accepts
it. I have to rmmod the loaded module first, and hence unmount any
btrfs fs before that. I should be able to do it in a couple of hours.
I'll delete all my snapshots again and run my script; that should make
it easy to trigger the (hopefully worked-around) bug again.
Regards,
--
Stéphane.