On Sat, Mar 15, 2014 at 7:51 AM, Duncan <1i5t5.duncan@xxxxxxx> wrote:
> 1) Does running the snapper cleanup command from that cron job manually
> trigger the problem as well?

As you can imagine I'm not too keen to trigger this often.  But yes, I
just gave it a shot on my SSD, and cleaning a few days of timelines
triggered a panic.

> 2) What about modifying the cron job to run hourly, or perhaps every six
> hours, so it's deleting only 2 or 12 instead of 48 at a time?  Does that
> help?
>
> If so then it's a thundering herd problem.  While definitely still a bug,
> you'll at least have a workaround until it's fixed.

It definitely looks like a thundering herd problem.  I stopped the cron
jobs (including the creation of snapshots, based on your later warning).
However, I am deleting my snapshots one at a time at a rate of one every
5-30 minutes (roughly as in the sketch at the end of this message), and
while that is creating surprisingly high disk loads on my SSD and hard
drives, I don't get any panics.  I figured that having only one deletion
pending per checkpoint would eliminate the locking risk.

I did get some blocked-task messages in dmesg, like:

[105538.121239] INFO: task mysqld:3006 blocked for more than 120 seconds.
[105538.121251]       Not tainted 3.13.6-gentoo #1
[105538.121256] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[105538.121262] mysqld          D ffff880395f63e80  3432  3006      1 0x00000000
[105538.121273]  ffff88028b623d38 0000000000000086 ffff88028b623dc8 ffffffff81c10440
[105538.121283]  0000000000000200 ffff88028b623fd8 ffff880395f63b80 0000000000012c40
[105538.121291]  0000000000012c40 ffff880395f63b80 00000000532b7877 ffff880410e7e578
[105538.121299] Call Trace:
[105538.121316]  [<ffffffff81623d73>] schedule+0x6a/0x6c
[105538.121327]  [<ffffffff81623f52>] schedule_preempt_disabled+0x9/0xb
[105538.121337]  [<ffffffff816251af>] __mutex_lock_slowpath+0x155/0x1af
[105538.121347]  [<ffffffff812b9db0>] ? radix_tree_tag_set+0x71/0xd4
[105538.121356]  [<ffffffff81625225>] mutex_lock+0x1c/0x2e
[105538.121365]  [<ffffffff8123c168>] btrfs_log_inode_parent+0x161/0x308
[105538.121373]  [<ffffffff8162466d>] ? mutex_unlock+0x11/0x13
[105538.121382]  [<ffffffff8123cd37>] btrfs_log_dentry_safe+0x39/0x52
[105538.121390]  [<ffffffff8121a0c9>] btrfs_sync_file+0x1bc/0x280
[105538.121401]  [<ffffffff811339a3>] vfs_fsync_range+0x13/0x1d
[105538.121409]  [<ffffffff811339c4>] vfs_fsync+0x17/0x19
[105538.121416]  [<ffffffff81133c3c>] do_fsync+0x30/0x55
[105538.121423]  [<ffffffff81133e40>] SyS_fsync+0xb/0xf
[105538.121432]  [<ffffffff8162c2e2>] system_call_fastpath+0x16/0x1b

I suspect this may not be terribly helpful; it probably reflects tasks
waiting on the lock rather than whatever is holding it.  It was more of
a problem when I was trying to delete a snapshot per minute on my SSD,
or one every 5 minutes on the hard drive.

Rich
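
For reference, the paced one-at-a-time deletion described above can be
driven by something like the following.  This is a minimal sketch only,
assuming snapper's "delete" subcommand is used to remove the snapshots;
the config name, snapshot numbers, and the 10-minute pause are
placeholders, not the exact values used here.

    #!/usr/bin/env python3
    # Sketch: delete snapper snapshots one at a time, with a long pause
    # between deletions so at most one subvolume removal is pending per
    # btrfs commit.  Config name, numbers, and pause are placeholders.
    import subprocess
    import time

    SNAPPER_CONFIG = "root"          # placeholder snapper config name
    SNAPSHOTS = [1201, 1202, 1203]   # placeholder snapshot numbers
    PAUSE_SECONDS = 600              # one deletion every 10 minutes

    for number in SNAPSHOTS:
        # "snapper -c <config> delete <number>" removes one snapshot;
        # check=True stops the loop if a deletion fails.
        subprocess.run(
            ["snapper", "-c", SNAPPER_CONFIG, "delete", str(number)],
            check=True,
        )
        # Give btrfs time to commit and clean up before the next one.
        time.sleep(PAUSE_SECONDS)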
