On 08/09/2016 03:11 PM, Hugo Mills wrote:
On Tue, Aug 09, 2016 at 06:27:33PM +0000, Hugo Mills wrote:
On Tue, Aug 09, 2016 at 02:26:14PM -0400, Chris Mason wrote:
On 08/09/2016 02:23 PM, Hugo Mills wrote:
Hi, Chris,
On Tue, Aug 09, 2016 at 02:02:20PM -0400, Chris Mason wrote:
On 08/09/2016 01:27 PM, Hugo Mills wrote:
Over the weekend, I started doing some maintenance on my server: I
upgraded to 4.7.0, and I started deleting a device from my array,
preparatory to putting in a larger one. About halfway through the
operation, several kernel threads hung up for a while (resulting in
"blocked for 120s" messages), and then the delete process seems to
have stopped entirely, although several kernel threads are at maximum
usage.
After a few hours, I rebooted the machine, and left it for a day or
so. I tried the delete again this afternoon, and it's done the same
thing again. The full log is included below. I have a kworker and a
btrfs-transaction pegged at close to 100% of a core each, and a
btrfs-cleaner (and the btrfs dev del process) in D state.
The FS was not under load at the time of the failure, and it passes
scrub. I haven't tried a btrfs check yet.
Thanks Hugo, can you nail down which line of code belongs to:
btrfs_async_run_delayed_refs+0xc6
I'm having a spot of trouble with this. The btrfs on this kernel is
built-in, and I've lost the contents of the build directory (it's done
by an overnight build script, and it's already built a 4.8-rc1 for one
of my other machines).
(gdb) file /boot/vmlinuz-4.7.0-dirty
BFD: /boot/vmlinuz-4.7.0-dirty: Warning: Ignoring section flag IMAGE_SCN_MEM_NOT_PAGED in section .bss
Reading symbols from /boot/vmlinuz-4.7.0-dirty...(no debugging symbols found)...done.
(gdb) list *btrfs_async_run_delayed_refs+0xc6
No symbol table is loaded. Use the "file" command.
There must be a way of getting this info from here, but I'm not
sure I know what it is. Build a new kernel from 4.7 with this
machine's config and run gdb on the btrfs.o file? Not a problem to do,
but it might take a little while.
As long as you use the same gcc and config file, it'll almost always
generate the same offsets/code. You can recompile with debug
symbols on and it'll be accurate.
OK. Back later.
(gdb) file fs/btrfs/btrfs.o
Reading symbols from fs/btrfs/btrfs.o...done.
(gdb) list *btrfs_async_run_delayed_refs+0xc6
0x13dae is in btrfs_async_run_delayed_refs (fs/btrfs/extent-tree.c:2915).
2910
2911        btrfs_queue_work(root->fs_info->extent_workers, &async->work);
2912
2913        if (wait) {
2914                wait_for_completion(&async->wait);
2915                ret = async->error;
2916                kfree(async);
2917                return ret;
2918        }
2919        return 0;
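
The interactive lookup above can also be scripted. What follows is only a
sketch, assuming gdb is on the PATH and the object file was built with debug
info from the same gcc and config. The helper names gdb_list_command and
resolve are made up for illustration.

```python
import shutil
import subprocess


def gdb_list_command(object_file, location):
    """Build the argv for a one-shot, non-interactive gdb source lookup.

    `location` is a symbol+offset string as it appears in a stack trace,
    e.g. "btrfs_async_run_delayed_refs+0xc6".
    """
    # -batch makes gdb exit after running the commands given with -ex.
    return ["gdb", "-batch", "-ex", f"list *{location}", object_file]


def resolve(object_file, location):
    """Run gdb and return its stdout, or None if gdb is not installed."""
    if shutil.which("gdb") is None:
        return None
    result = subprocess.run(
        gdb_list_command(object_file, location),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

For the case in this thread the call would be
resolve("fs/btrfs/btrfs.o", "btrfs_async_run_delayed_refs+0xc6"), run from the
top of the rebuilt kernel tree.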
So it's waiting on the actual delayed ref work, but we don't see that work in
the stack trace.
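
To make the shape of that wait concrete, here is a userspace analogy in
Python. It is not the kernel code: threading.Event stands in for the
completion, a queue.Queue stands in for the extent_workers workqueue, and the
names AsyncWork, worker and run_async are invented for this sketch. The point
it illustrates is that the submitter blocks until the worker signals, so if
the queued work never finishes, the submitter never returns.

```python
import queue
import threading


class AsyncWork:
    """One queued work item plus the state the submitter waits on."""

    def __init__(self, func):
        self.func = func
        self.done = threading.Event()  # stand-in for the completion
        self.error = 0


def worker(work_queue):
    """Run queued items until a None sentinel arrives."""
    while True:
        item = work_queue.get()
        if item is None:
            break
        item.error = item.func()  # func returns an integer error code
        item.done.set()           # wake up anyone waiting on this item


def run_async(work_queue, func, wait):
    """Queue func; if wait is true, block until it has run and return its error."""
    item = AsyncWork(func)
    work_queue.put(item)
    if wait:
        item.done.wait()  # the submitter sleeps here until the worker signals
        return item.error
    return 0
```

A stack trace of the submitter taken while it sleeps in done.wait() shows
only the waiter, not the work it is waiting for, which matches the situation
described above.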
Can you please capture sysrq-w and sysrq-t output?
-chris
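
For anyone following along without a console keyboard, the same dumps can be
requested by writing the key to /proc/sysrq-trigger as root. This is a
minimal sketch, assuming a kernel built with magic SysRq support. The
function name trigger_sysrq and its path parameter are illustrative only;
the path is a parameter so the function can be tried against an ordinary
file first.

```python
SYSRQ_TRIGGER = "/proc/sysrq-trigger"


def trigger_sysrq(key, path=SYSRQ_TRIGGER):
    """Write a single SysRq key character to the trigger file.

    'w' asks the kernel to dump tasks in uninterruptible (blocked) state,
    't' asks it to dump all tasks. The output goes to the kernel log.
    """
    if len(key) != 1:
        raise ValueError("SysRq key must be a single character")
    with open(path, "w") as trigger:
        trigger.write(key)
```

Calling trigger_sysrq("w") and then trigger_sysrq("t") as root, and saving
the dmesg output afterwards, gives the two dumps asked for here.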