Re: Filesystem hang on kernel 4.2.0 with copy reflink

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



It overflowed the dmesg buffer but hopefully contains enough cores - https://mark.zealey.org/download/btrfs_crash.txt

Some other output:

# mount
/dev/sdb1 on / type btrfs (rw,noatime,skip_balance,subvol=@)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
none on /sys/fs/cgroup type tmpfs (rw)
none on /sys/fs/fuse/connections type fusectl (rw)
none on /sys/kernel/debug type debugfs (rw)
none on /sys/kernel/security type securityfs (rw)
none on /sys/firmware/efi/efivars type efivarfs (rw)
udev on /dev type devtmpfs (rw,mode=0755)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=0620)
tmpfs on /run type tmpfs (rw,noexec,nosuid,size=10%,mode=0755)
none on /run/lock type tmpfs (rw,noexec,nosuid,nodev,size=5242880)
none on /run/shm type tmpfs (rw,nosuid,nodev)
none on /run/user type tmpfs (rw,noexec,nosuid,nodev,size=104857600,mode=0755)
none on /sys/fs/pstore type pstore (rw)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu type cgroup (rw,relatime,cpu)
cgroup on /sys/fs/cgroup/cpuacct type cgroup (rw,relatime,cpuacct)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,relatime,blkio)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,relatime,freezer,release_agent=/run/cgmanager/agents/cgm-release-agent.freezer) cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,relatime,net_cls,release_agent=/run/cgmanager/agents/cgm-release-agent.net_cls)
/dev/sdb1 on /home type btrfs (rw,noatime,skip_balance,subvol=@home)
/dev/sdb3 on /boot/efi type vfat (rw)
binfmt_misc on /proc/sys/fs/binfmt_misc type binfmt_misc (rw,noexec,nosuid,nodev)
rpc_pipefs on /run/rpc_pipefs type rpc_pipefs (rw)
systemd on /sys/fs/cgroup/systemd type cgroup (rw,noexec,nosuid,nodev,none,name=systemd)

 ps auwx|grep ' D'
root 275 0.0 0.0 0 0 ? D Jan02 2:29 [btrfs-transacti] root 361 0.0 0.0 0 0 ? D 13:30 0:00 [kworker/u16:5] root 404 0.0 0.0 0 0 ? D 13:31 0:00 [kworker/u16:7] root 1127 0.0 0.0 0 0 ? D 13:54 0:00 [kworker/u16:0] root 1137 0.0 0.0 0 0 ? D 13:54 0:00 [kworker/u16:2] root 1189 2.3 0.0 25932 2216 pts/7 D+ 13:55 0:02 cp -vax --reflink=always /.snapshots/psql/var/lib/postgresql/ . root 1191 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:3] root 1197 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:4] root 1200 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:8] root 1201 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:10] root 1230 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:15] root 1231 0.0 0.0 0 0 ? D 13:55 0:00 [kworker/u16:16] root 14569 0.0 0.0 0 0 ? D 12:18 0:00 [kworker/u16:9] root 14572 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:11] root 14573 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:12] root 14582 0.0 0.0 0 0 ? D 12:19 0:00 [kworker/u16:13] root 32228 0.0 0.0 0 0 ? D 13:17 0:00 [kworker/u16:1]

The last output of the cp:

‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25009’ -> ‘./postgresql/9.5/main/base/16385/25009’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25011’ -> ‘./postgresql/9.5/main/base/16385/25011’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25012’ -> ‘./postgresql/9.5/main/base/16385/25012’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25243’ -> ‘./postgresql/9.5/main/base/16385/25243’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25246’ -> ‘./postgresql/9.5/main/base/16385/25246’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25248’ -> ‘./postgresql/9.5/main/base/16385/25248’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25249’ -> ‘./postgresql/9.5/main/base/16385/25249’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25251’ -> ‘./postgresql/9.5/main/base/16385/25251’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25254’ -> ‘./postgresql/9.5/main/base/16385/25254’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25256’ -> ‘./postgresql/9.5/main/base/16385/25256’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25257’ -> ‘./postgresql/9.5/main/base/16385/25257’ ‘/.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283’ -> ‘./postgresql/9.5/main/base/16385/25283’

And those (and other files) that it would have copied:

-rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25243 -rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25246 -rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25248 -rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25249 -rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25251 -rw------- 1 postgres postgres 0 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25254 -rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25256 -rw------- 1 postgres postgres 8192 Dec 30 18:11 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25257 -rw------- 1 postgres postgres 409624576 Dec 30 19:10 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283 -rw------- 1 postgres postgres 122880 Dec 30 18:29 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25283_fsm -rw------- 1 postgres postgres 0 Dec 30 18:22 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25284 -rw------- 1 postgres postgres 8192 Dec 30 18:22 /.snapshots/psql/var/lib/postgresql/9.5/main/base/16385/25285

Also I have quota tracking enabled on the btrfs volume if that makes any difference.

Mark

On 04/01/16 12:41, Jack Wang wrote:
Hi Mark,

Could you do below when the hang happens, and post the dmesg.

echo w > /proc/sysrq-trigger

2016-01-04 9:35 GMT+01:00 Mark Zealey <mark@xxxxxxxxxxxxxxxxx>:
Hi there, I've run into a very strange hang with btrfs. I was trying to
restore a directory (postgres database) from a readonly snapshot. To do this
i used the command `cp -ar --reflink=always`. This worked fine for 100s of
files, however when it got to a particular file 16 kworker threads (I have 8
processors in this system) got marked as being in D state (with 0 cpu usage
or disk usage) and I could not access the btrfs file system any more. I
can't see any kernel message or OOPS. Can you please let me know what
additional debug information I can provide to help track this issue down in
the kernel?

System is latest ubuntu 14.04 LTS with a backported wily kernel (package
linux-image-4.2.0-22-generic):

4.2.0-22-generic #27~14.04.1-Ubuntu SMP Fri Dec 18 10:57:53 UTC 2015 x86_64
x86_64 x86_64 GNU/Linux

Thanks

Mark
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux