Hello, I think I've encountered a deadlock between btrfs-transacti and postgres process(es). This is system information (btrfs fi usage obtained after poweroff and boot): # cat /etc/redhat-release CentOS Linux release 7.6.1810 (Core) # uname -a Linux prod-dbsnap-01 5.2.1-1.el7.elrepo.x86_64 #1 SMP Sun Jul 14 08:15:04 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux # btrfs --version btrfs-progs v5.2 # btrfs filesystem usage /data/pg_data Overall: Device size: 2.00TiB Device allocated: 345.03GiB Device unallocated: 1.66TiB Device missing: 0.00B Used: 338.07GiB Free (estimated): 1.67TiB (min: 854.27GiB) Data ratio: 1.00 Metadata ratio: 2.00 Global reserve: 512.00MiB (used: 0.00B) Data,RAID0: Size:332.00GiB, Used:329.22GiB /dev/sdb 83.00GiB /dev/sdc 83.00GiB /dev/sdd 83.00GiB /dev/sde 83.00GiB Metadata,RAID10: Size:6.50GiB, Used:4.42GiB /dev/sdb 3.25GiB /dev/sdc 3.25GiB /dev/sdd 3.25GiB /dev/sde 3.25GiB System,RAID10: Size:16.00MiB, Used:48.00KiB /dev/sdb 8.00MiB /dev/sdc 8.00MiB /dev/sdd 8.00MiB /dev/sde 8.00MiB Unallocated: /dev/sdb 425.74GiB /dev/sdc 425.74GiB /dev/sdd 425.74GiB /dev/sde 425.74GiB There were three btrfs subvolumes and on each one there was a Postgres database slave doing recovery (single threaded writing). But there was a lot of writing. And prior to starting Postgres slaves I was restoring base backup from the backup server, which was being done by a number of parallel rsync processes (12 at most, I think). The file system is mounted with: # grep btrfs /proc/mounts /dev/sdd /data/pg_data btrfs rw,noatime,compress-force=zstd:3,space_cache,subvolid=5,subvol=/ 0 0 After several hours I got this in /var/log/messages: # grep 'INFO: task .*blocked for more than' messages Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds. Full log is in the attachment. It seems to me that there is a deadlock between btrfs-transacti and any process which is trying to write something (not sure about reading). While this was going on the cpu usage (according to top) was non-existant. There are 4 cpus (it's a virtual machine in VmWare) and 3 were 100% idle. The 4th was 100% in waiting. (I didn't find out which process was on that cpu, unfortunately.) I powered off the machine (yesterday), booted this morning and things are working without errors. I stopped postgres clusters, though. I have a few questions: 1. After something like this happens and the machine is rebooted is there a procedure which would lower the probablity of encountering the deadlock again (maybe btrfs scrub or btrfs defragment or something like that)? This happened after a heavy write activity, so maybe fragmentation had something to do with it. 2. Should I run btrfsck (or something else) to verify on-disk integrity after a problem like this? Or it's just an in-memory problem, so I can assume that nothing bad happened to the data on disks. 3. I'd like to write a watchdog program to catch deadlocks and reboot (probably power-cycle) the VM, but I'm not sure what would the appropriate check be. Does it have to write something to the disk or reading would be sufficient? And how to bypass the OS buffer cache (fsync() or O_DIRECT should do it for writing, but I'm not sure about reading)? What would the appropriate timeout be? (If the operation doesn't complete in xx seconds a reboot should be triggered, but I don't know how many seconds to wait when there's a heavy load and things are just slow, but there's no deadlock.) Should I put watchdog process in real-time class (or however the equivalent is called on Linux)? Since this is a mainline kernel, I'm not sure if I could assume that real-time support won't have bugs of its own. And last, but not least, is there additional data that could help with debugging issues like this? (If possible, something that could be programmed into watchdog service.)
Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs-transacti D 0 1361 2 0x80004000 Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? blk_flush_plug_list+0xd1/0x100 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? bit_wait+0x50/0x50 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: io_schedule+0x16/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: bit_wait_io+0x11/0x50 Jul 17 16:47:09 prod-dbsnap-01 kernel: __wait_on_bit+0x6b/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: out_of_line_wait_on_bit+0x91/0xb0 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? init_wait_var_entry+0x50/0x50 Jul 17 16:47:09 prod-dbsnap-01 kernel: lock_extent_buffer_for_io+0x12f/0x320 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btree_write_cache_pages+0x207/0x420 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btree_writepages+0x5d/0x70 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: do_writepages+0x1f/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __filemap_fdatawrite_range+0xc6/0x100 Jul 17 16:47:09 prod-dbsnap-01 kernel: filemap_fdatawrite_range+0x13/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_write_marked_extents+0x107/0x150 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_write_and_wait_transaction.isra.26+0x4d/0xa0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_commit_transaction+0x7b4/0xab0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: transaction_kthread+0x157/0x190 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: kthread+0x105/0x140 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? kthread_bind+0x20/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: ret_from_fork+0x35/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres D 0 62682 20074 0x00004000 Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7fdbacc98310 Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 83 3d 8d 6c 2d 00 00 75 10 b8 4b 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 31 c3 48 83 ec 08 e8 2e 6a 01 00 48 89 04 24 Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff8e7c2908 EFLAGS: 00000246 ORIG_RAX: 000000000000004b Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000018cbc RCX: 00007fdbacc98310 Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 0000000000018cbc RDI: 0000000000000004 Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004 Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000001228189 R14: 00007fdb9caf9280 R15: 0000000000000100 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres D 0 80145 80126 0x00004000 Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x163/0xdd0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de34bdd00 Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d058 EFLAGS: 00000246 ORIG_RAX: 000000000000004a Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1de34bdd00 Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000120 RSI: 00007f1dc4a9e580 RDI: 0000000000000003 Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 00007f1de38c97f0 R08: 00007f1dc4a9e690 R09: 00007f1dc4a9e588 Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 00007f1de1303c8c R11: 0000000000000246 R12: 00000086e3333668 Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000000000001 R14: 00007fff3d26d560 R15: 0000000000000010 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres D 0 87299 80126 0x00004000 Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de1031310 Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d208 EFLAGS: 00000246 ORIG_RAX: 000000000000004b Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 00000000000086e3 RCX: 00007f1de1031310 Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 00000000000086e3 RDI: 0000000000000004 Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004 Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000001650289 R14: 00007f1dd0e92280 R15: 00000000000000e0 Jul 17 16:47:09 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 122 seconds. Jul 17 16:47:09 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:47:09 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:47:09 prod-dbsnap-01 kernel: postgres D 0 93108 93106 0x00004000 Jul 17 16:47:09 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:47:09 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:47:09 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? btrfs_must_commit_transaction+0x60/0x70 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:47:09 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:47:09 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:47:09 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20 Jul 17 16:47:09 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:47:09 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:47:09 prod-dbsnap-01 kernel: RIP: 0033:0x7f489051dd00 Jul 17 16:47:09 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:47:09 prod-dbsnap-01 kernel: RSP: 002b:00007fff39abd838 EFLAGS: 00000246 ORIG_RAX: 000000000000004a Jul 17 16:47:09 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f489051dd00 Jul 17 16:47:09 prod-dbsnap-01 kernel: RDX: 0000000000000180 RSI: 0000000000000002 RDI: 0000000000000003 Jul 17 16:47:09 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 3839303030303030 R09: 00007f488dfe920d Jul 17 16:47:09 prod-dbsnap-01 kernel: R10: 00007f488e363c8c R11: 0000000000000246 R12: 00007fff39abe200 Jul 17 16:47:09 prod-dbsnap-01 kernel: R13: 0000000000000000 R14: 0000000000000014 R15: 00007fff39abe200 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task btrfs-transacti:1361 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs-transacti D 0 1361 2 0x80004000 Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? blk_flush_plug_list+0xd1/0x100 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? bit_wait+0x50/0x50 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: io_schedule+0x16/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: bit_wait_io+0x11/0x50 Jul 17 16:49:12 prod-dbsnap-01 kernel: __wait_on_bit+0x6b/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: out_of_line_wait_on_bit+0x91/0xb0 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? init_wait_var_entry+0x50/0x50 Jul 17 16:49:12 prod-dbsnap-01 kernel: lock_extent_buffer_for_io+0x12f/0x320 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btree_write_cache_pages+0x207/0x420 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btree_writepages+0x5d/0x70 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: do_writepages+0x1f/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __filemap_fdatawrite_range+0xc6/0x100 Jul 17 16:49:12 prod-dbsnap-01 kernel: filemap_fdatawrite_range+0x13/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_write_marked_extents+0x107/0x150 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_write_and_wait_transaction.isra.26+0x4d/0xa0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_commit_transaction+0x7b4/0xab0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: transaction_kthread+0x157/0x190 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: kthread+0x105/0x140 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? btrfs_cleanup_transaction+0x580/0x580 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? kthread_bind+0x20/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: ret_from_fork+0x35/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:62682 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres D 0 62682 20074 0x00004000 Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7fdbacc98310 Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff8e7c2908 EFLAGS: 00000246 ORIG_RAX: 000000000000004b Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000018cbc RCX: 00007fdbacc98310 Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 0000000000018cbc RDI: 0000000000000004 Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004 Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000001228189 R14: 00007fdb9caf9280 R15: 0000000000000100 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:80145 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres D 0 80145 80126 0x00004000 Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x163/0xdd0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de34bdd00 Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d058 EFLAGS: 00000246 ORIG_RAX: 000000000000004a Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f1de34bdd00 Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000120 RSI: 00007f1dc4a9e580 RDI: 0000000000000003 Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 00007f1de38c97f0 R08: 00007f1dc4a9e690 R09: 00007f1dc4a9e588 Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 00007f1de1303c8c R11: 0000000000000246 R12: 00000086e3333668 Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000000000001 R14: 00007fff3d26d560 R15: 0000000000000010 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:87299 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres D 0 87299 80126 0x00004000 Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? __switch_to_asm+0x40/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? prepare_to_wait_event+0x82/0x170 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fdatasync+0x17/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f1de1031310 Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff3d26d208 EFLAGS: 00000246 ORIG_RAX: 000000000000004b Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 00000000000086e3 RCX: 00007f1de1031310 Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000027 RSI: 00000000000086e3 RDI: 0000000000000004 Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000 Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 0000000000004000 R11: 0000000000000246 R12: 0000000000000004 Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000001650289 R14: 00007f1dd0e92280 R15: 00000000000000e0 Jul 17 16:49:12 prod-dbsnap-01 kernel: INFO: task postgres:93108 blocked for more than 245 seconds. Jul 17 16:49:12 prod-dbsnap-01 kernel: Not tainted 5.2.1-1.el7.elrepo.x86_64 #1 Jul 17 16:49:12 prod-dbsnap-01 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message. Jul 17 16:49:12 prod-dbsnap-01 kernel: postgres D 0 93108 93106 0x00004000 Jul 17 16:49:12 prod-dbsnap-01 kernel: Call Trace: Jul 17 16:49:12 prod-dbsnap-01 kernel: __schedule+0x2a9/0x650 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule+0x38/0xa0 Jul 17 16:49:12 prod-dbsnap-01 kernel: schedule_preempt_disabled+0xe/0x10 Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock.isra.11+0x233/0x4e0 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? btrfs_must_commit_transaction+0x60/0x70 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: __mutex_lock_slowpath+0x13/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: mutex_lock+0x31/0x40 Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_inode_parent+0x70c/0xdd0 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? wait_current_trans+0x33/0x100 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: ? _cond_resched+0x19/0x30 Jul 17 16:49:12 prod-dbsnap-01 kernel: ? join_transaction+0x2b/0x460 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_log_dentry_safe+0x59/0x80 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: btrfs_sync_file+0x35b/0x410 [btrfs] Jul 17 16:49:12 prod-dbsnap-01 kernel: vfs_fsync_range+0x3f/0x80 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_fsync+0x3d/0x70 Jul 17 16:49:12 prod-dbsnap-01 kernel: __x64_sys_fsync+0x14/0x20 Jul 17 16:49:12 prod-dbsnap-01 kernel: do_syscall_64+0x60/0x1c0 Jul 17 16:49:12 prod-dbsnap-01 kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9 Jul 17 16:49:12 prod-dbsnap-01 kernel: RIP: 0033:0x7f489051dd00 Jul 17 16:49:12 prod-dbsnap-01 kernel: Code: Bad RIP value. Jul 17 16:49:12 prod-dbsnap-01 kernel: RSP: 002b:00007fff39abd838 EFLAGS: 00000246 ORIG_RAX: 000000000000004a Jul 17 16:49:12 prod-dbsnap-01 kernel: RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007f489051dd00 Jul 17 16:49:12 prod-dbsnap-01 kernel: RDX: 0000000000000180 RSI: 0000000000000002 RDI: 0000000000000003 Jul 17 16:49:12 prod-dbsnap-01 kernel: RBP: 0000000000000000 R08: 3839303030303030 R09: 00007f488dfe920d Jul 17 16:49:12 prod-dbsnap-01 kernel: R10: 00007f488e363c8c R11: 0000000000000246 R12: 00007fff39abe200 Jul 17 16:49:12 prod-dbsnap-01 kernel: R13: 0000000000000000 R14: 0000000000000014 R15: 00007fff39abe200
