Re: Re[2]: Rough (re)start with btrfs


On Thu, May 2, 2019 at 1:02 PM Hendrik Friedel <hendrik@xxxxxxxxxxxxx> wrote:
>
> >What scheduler is being used for the drive?
> >
> ># cat /sys/block/<dev>/queue/scheduler
> [mq-deadline] none

At first I thought you might be running into this bug
https://lwn.net/Articles/774440/

However:

[Mo Apr 29 20:44:47 2019]       Not tainted 4.19.0-0.bpo.2-amd64 #1 Debian 4.19.16-1~bpo9+1

This is actually based on 4.19.16 which has the fix for that.


[Mo Apr 29 06:44:32 2019] systemd[1]: apt-daily-upgrade.timer: Adding 36min 35.299087s random time.
[Mo Apr 29 20:44:47 2019] INFO: task btrfs-transacti:10227 blocked for more than 120 seconds.

Literally nothing for hours before the blocking. And I don't see
anything off during device discovery.

Qu would know better, but developers usually ask for sysrq+w output
when there are blocked tasks.
https://www.kernel.org/doc/html/v4.11/admin-guide/sysrq.html

Basically, as root, issue:
# echo 1 >/proc/sys/kernel/sysrq
# echo w > /proc/sysrq-trigger

What I do is run the first command and type out the second command but
do not press return; in another shell reproduce the hang, and then go
back to the first shell and hit return. That way it doesn't take a
minute or two to type out during the hang. The result appears in
dmesg, so stop the operation causing the hang if possible, then run
'dmesg > dmesg.txt' and attach the file. Also, you'll want to reboot
with 'log_buf_len=1M' on the kernel command line beforehand, because
the sysrq+w output that gets dumped to dmesg will fill up the kernel
message buffer.
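
If it helps, the two commands above could be wrapped in a small helper
so there is only one short line to run during the hang. This is just a
sketch: the function name and the SYSRQ_ENABLE/SYSRQ_TRIGGER variables
are made up for illustration (they default to the real /proc paths
shown above), and it needs to run as root.

```shell
#!/bin/sh
# Sketch only: the function and variable names here are illustrative.
# Both paths default to the real procfs files used in the commands above.
SYSRQ_ENABLE="${SYSRQ_ENABLE:-/proc/sys/kernel/sysrq}"
SYSRQ_TRIGGER="${SYSRQ_TRIGGER:-/proc/sysrq-trigger}"

# Enable sysrq, then ask the kernel to dump tasks that are in
# uninterruptible (blocked) state into the kernel log.
trigger_blocked_task_dump() {
    echo 1 > "$SYSRQ_ENABLE" || return 1
    echo w > "$SYSRQ_TRIGGER" || return 1
}

# Usage, as root, while the copy is hung:
#   trigger_blocked_task_dump && dmesg > dmesg.txt
```

Source the file in the first shell ahead of time, then call the
function once the hang shows up in the second shell.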

> Done (also the two smartctl outputs).

I don't see anything alarming there either. The errors are a little
weird, but they predate the Btrfs error by a lot.


> I was tempted to ask, whether this should be fixed. On the other hand, I am not even sure anything bad happened (except, well, the system -at least the copy- seemed to hang).

It could be a bug somewhere, but the question is where. The workload
is only a copy? That seems trivial and not prone to lock contention.

You know what? Try changing the scheduler from mq-deadline to none.
Change nothing else. Now try to reproduce. Let's see if it still
happens.
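
The scheduler is switched through the same sysfs file you read it
from. A rough sketch, assuming the usual /sys/block layout; the helper
names and the SYSFS_BLOCK variable are made up for illustration, and
sdX stands in for your actual device:

```shell
#!/bin/sh
# Sketch only: helper names and SYSFS_BLOCK are illustrative.
SYSFS_BLOCK="${SYSFS_BLOCK:-/sys/block}"

# Print the active scheduler, i.e. the bracketed entry in a line
# such as "[mq-deadline] none".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$SYSFS_BLOCK/$1/queue/scheduler"
}

# Select a scheduler for one device (needs root on a real system).
set_scheduler() {
    echo "$2" > "$SYSFS_BLOCK/$1/queue/scheduler"
}

# Usage, as root:
#   active_scheduler sdX      # before the change
#   set_scheduler sdX none
#   active_scheduler sdX      # confirm the change took effect
```

A change made this way does not survive a reboot, which is fine for a
one-off reproduction attempt.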

Also, what are the mount options?
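
One way to pull those out, as a sketch: this reads /proc/mounts and
prints the mount point and options of every btrfs filesystem. The
function name and the MOUNTS_FILE variable are made up for
illustration.

```shell
#!/bin/sh
# Sketch only: the function name and MOUNTS_FILE are illustrative.
MOUNTS_FILE="${MOUNTS_FILE:-/proc/mounts}"

# Print "mountpoint options" for every mounted btrfs filesystem.
# /proc/mounts fields: device, mountpoint, fstype, options, dump, pass.
btrfs_mount_options() {
    awk '$3 == "btrfs" { print $2, $4 }' "$MOUNTS_FILE"
}

# Usage:
#   btrfs_mount_options
```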

-- 
Chris Murphy


