....
That shell is still hung in the kernel, on a "disk sleep" fifteen
minutes later, and one of my CPU cores is pegged at 100%,
and the disk activity light is not coming on at all.
......
This is not disk bound activity ... it's kernel code bound.
That's insane ... and sad.
'perf top' is my first thought: it should at least show which kernel
functions are gobbling up the CPU time.
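
Before (or alongside) perf, a quick way to confirm that the time really is
going into kernel code is to watch a task's state and its user/system tick
counters in /proc/<pid>/stat. A minimal sketch, assuming the field layout
documented in proc(5) (state is field 3, utime field 14, stime field 15);
the function names are my own, not from any tool:

```python
import os
import time


def parse_stat(text):
    """Return (state, utime_ticks, stime_ticks) from /proc/<pid>/stat content."""
    # comm (field 2) may itself contain spaces or ')', so split after the LAST ')'.
    rest = text.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state); utime is field 14 and stime is field 15.
    return rest[0], int(rest[11]), int(rest[12])


def sample(pid, interval=5.0):
    """Read the stat file twice; return (state, user_seconds, system_seconds)."""
    path = f"/proc/{pid}/stat"
    with open(path) as f:
        _, u0, s0 = parse_stat(f.read())
    time.sleep(interval)
    with open(path) as f:
        state, u1, s1 = parse_stat(f.read())
    hz = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    return state, (u1 - u0) / hz, (s1 - s0) / hz
```

Calling sample(pid) on the pid that top shows as busy should report mostly
system seconds if the load is kernel-bound. A task sitting in state D
usually burns no CPU itself, so the busy pid may well be a kernel thread
rather than the hung shell.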
I regularly see a limited balance (btrfs balance start -musage=30
-dusage=30, or something like that) take 3-4 days on an m3.medium (1 core)
or m3.large (2 core?) EC2 VM with a 1 TB filesystem.
I see plenty of the hung task timeout messages in dmesg (like the ones in
your first post), but it eventually recovers and behaves.
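
To get a feel for how often that happens, and to which tasks, the hung task
lines can be counted from saved dmesg output. A rough sketch, assuming the
kernel's usual "INFO: task NAME:PID blocked for more than N seconds."
wording; count_hung_tasks is just a name I made up:

```python
import re
from collections import Counter

# Matches the hung task detector's line; NAME may itself contain ':'
# (e.g. kworker threads), so the greedy match keeps everything up to the pid.
HUNG_RE = re.compile(
    r"INFO: task (?P<comm>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def count_hung_tasks(lines):
    """Count hung task reports per task name in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        m = HUNG_RE.search(line)
        if m:
            counts[m.group("comm")] += 1
    return counts
```

Feed it the lines of a saved log, e.g. count_hung_tasks(open("dmesg.txt"))
after running dmesg > dmesg.txt (the file name is only an example).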
About once every 6-9 months the filesystem breaks on one of the
single-core VMs, and I end up wiping it and starting again. On a dual-core
VM it has run without issue and behaved much better.
(It's an 'offsite backup-backup server'. I tend to use one of the latest
stable kernels - e.g. 4.9 at the moment).
David