On 16.05.19 at 0:36, Zygo Blaxell wrote:
> "Storm-of-soft-lockups" is a failure mode where btrfs puts all of the
> CPU cores in kernel functions that are unable to make forward progress,
> but also unwilling to release their respective CPU cores. This is
> usually accompanied by a lot of CPU usage (detectable as either kvm CPU
> usage or just a lot of CPU fan noise) though I don't know if all cores
> are spinning or only some of them.
>
> The kernel console presents a continual stream of "BUG: soft lockup"
> warnings for some days. None of the call traces change during this time.
> The only way out is to reboot.
>
> You can reproduce this by writing a bunch of data to a filesystem while
> bees is running on all cores. It takes a few days to occur naturally.
> It can probably be sped up by just doing a bunch of random LOGICAL_INO
> ioctls in a tight loop on each core.
>
> Here's an instance on a 4-CPU VM where CPU#0 is running btrfs-transaction
> (btrfs_try_tree_write_lock) and CPU#1-3 are running the LOGICAL_INO
> ioctl (btrfs_tree_read_lock_atomic):
Please provide the output for all sleeping (blocked) threads when this
occurs, via echo w > /proc/sysrq-trigger.
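Something along these lines should capture it; this is only a sketch,
assuming a root shell. The function name dump_blocked_tasks, the output
file name, and the TRIGGER/OUT/LOGCMD variables are hypothetical and
exist only so the steps can be dry-run without touching the real trigger:

```shell
# Sketch, not a definitive procedure. Assumes a root shell.
# TRIGGER, OUT and LOGCMD are hypothetical knobs for dry-running.
TRIGGER="${TRIGGER:-/proc/sysrq-trigger}"
OUT="${OUT:-sysrq-w.txt}"

dump_blocked_tasks() {
    # 'w' asks the kernel to print tasks in uninterruptible (blocked) state
    echo w > "$TRIGGER" || return 1
    # the traces go to the kernel ring buffer, so save that to a file
    ${LOGCMD:-dmesg} > "$OUT"
}
```

Run dump_blocked_tasks while the lockup is in progress and attach the
resulting file. If the ring buffer has already wrapped because of the
soft lockup spam, the serial or netconsole log is probably the better
source.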
Also, do you have this patch on the affected machine:
38e3eebff643 ("btrfs: honor path->skip_locking in backref code")?
If not, can you test with it applied?
<SNIP>