extent-same ioctl hangs holding locks (v3.18.8)

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



The extent-same ioctl seems to have a locking bug.  My test machines
will run between 0 and 3 days before something gets locked and stays
locked forever.

In the dumps and logs below, 'btrsame' calls the btrfs extent-same ioctl
on its arguments.  As you can see from the trace, it is stuck in this
very ioctl.

'rsync' is famous already and needs no introduction.  It seems
to be stuck just trying to read the file extent-same is reading.

All of these processes are stuck on the same file.  There are lots of
rsyncs because the sender keeps timing out and starting a new rsync,
and the new rsync promptly locks up as soon as it touches the same file
the old ones did.

The rsync processes (and also 'cat' from the command line just now) lock
up when they try to read the same byte range currently being processed by
extent-same.  They are able to read earlier bytes in the file just fine.

Here is /proc/version:

	Linux version 3.18.8-zb64+ (root@sneezy) (gcc version 4.9.1 (Debian 4.9.1-19) ) #1 SMP PREEMPT Fri Feb 27 01:09:08 EST 2015

Here are the kernel stacks from currently stuck processes:

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root       583  0.5  0.2 193664 38812 ?        DNs  02:36   2:36 rsync --server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded --partial --numeric-ids --fuzzy --
	[<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
	[<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
	[<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
	[<ffffffff811fc394>] vfs_read+0x94/0x180
	[<ffffffff811fcf46>] SyS_read+0x46/0xc0
	[<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root      4241  1.3  0.2 148064 36516 ?        DNs  07:11   2:27 rsync --server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded --partial --numeric-ids --fuzzy --
	[<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
	[<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
	[<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
	[<ffffffff811fc394>] vfs_read+0x94/0x180
	[<ffffffff811fcf46>] SyS_read+0x46/0xc0
	[<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root     10091  0.3  0.2 184208 40416 ?        DNs  Mar02   2:25 rsync --server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded --partial --numeric-ids --fuzzy --
	[<ffffffff81392d9d>] lock_extent_bits+0x1ad/0x200
	[<ffffffff81392e03>] lock_extent+0x13/0x20
	[<ffffffff81395d3c>] __extent_readpages.constprop.38+0x21c/0x2c0
	[<ffffffff81397109>] extent_readpages+0x179/0x1c0
	[<ffffffff813777bf>] btrfs_readpages+0x1f/0x30
	[<ffffffff8118f606>] __do_page_cache_readahead+0x1b6/0x250
	[<ffffffff8118f77f>] ondemand_readahead+0xdf/0x280
	[<ffffffff8118fa4e>] page_cache_sync_readahead+0x2e/0x50
	[<ffffffff81182fdc>] generic_file_read_iter+0x4ec/0x600
	[<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
	[<ffffffff811fc394>] vfs_read+0x94/0x180
	[<ffffffff811fcf46>] SyS_read+0x46/0xc0
	[<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root     19705  0.8  0.2 192260 34888 ?        DNs  05:11   2:29 rsync --server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded --partial --numeric-ids --fuzzy --
	[<ffffffff81181412>] __lock_page_killable+0xa2/0xb0
	[<ffffffff81182f30>] generic_file_read_iter+0x440/0x600
	[<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
	[<ffffffff811fc394>] vfs_read+0x94/0x180
	[<ffffffff811fcf46>] SyS_read+0x46/0xc0
	[<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root     21950  0.1  0.0  14800  1452 pts/43   D<+  01:17   0:44 btrsame [paths redacted]
	[<ffffffff81181362>] __lock_page+0xa2/0xb0
	[<ffffffff81181dc8>] pagecache_get_page+0x168/0x1a0
	[<ffffffff813a805d>] extent_same_get_page+0x2d/0xc0
	[<ffffffff813aef8f>] btrfs_ioctl+0x26bf/0x28c0
	[<ffffffff8120fa10>] do_vfs_ioctl+0x2f0/0x4f0
	[<ffffffff8120fc91>] SyS_ioctl+0x81/0xa0
	[<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	[<ffffffffffffffff>] 0xffffffffffffffff

	USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
	root     22535 13.4  0.1  75392 16716 ?        DNs  10:02   1:41 rsync --server -vlHogDtprSze.iLsf --timeout=3600 --delete-during --delete-excluded --partial --numeric-ids --fuzzy --
	[<ffffffffffffffff>] 0xffffffffffffffff

Here are the kernel messages:

	Mar  3 04:29:29 sneezy kernel: [113471.296017] INFO: task rsync:10091 blocked for more than 1800 seconds.
	Mar  3 04:29:29 sneezy kernel: [113471.296067]       Tainted: G        W      3.18.8-zb64+ #1
	Mar  3 04:29:29 sneezy kernel: [113471.296109] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	Mar  3 04:29:29 sneezy kernel: [113471.296184] rsync           D 0000000000000001     0 10091      1 0x00000004
	Mar  3 04:29:29 sneezy kernel: [113471.296193]  ffff8800088c79a8 0000000000000086 ffff880069696000 000000000001fc00
	Mar  3 04:29:29 sneezy kernel: [113471.296200]  ffff8800088c7fd8 000000000001fc00 ffff88040a648000 ffff880069696000
	Mar  3 04:29:29 sneezy kernel: [113471.296207]  ffffffff81391f2c ffff880131b509d0 ffff8800088c7908 ffffffff8139152a
	Mar  3 04:29:29 sneezy kernel: [113471.296214] Call Trace:
	Mar  3 04:29:29 sneezy kernel: [113471.296230]  [<ffffffff81391f2c>] ? __set_extent_bit+0x44c/0x540
	Mar  3 04:29:29 sneezy kernel: [113471.296238]  [<ffffffff8139152a>] ? free_extent_state+0x4a/0xd0
	Mar  3 04:29:29 sneezy kernel: [113471.296245]  [<ffffffff81391f2c>] ? __set_extent_bit+0x44c/0x540
	Mar  3 04:29:29 sneezy kernel: [113471.296252]  [<ffffffff81871ea8>] ? retint_restore_args+0x13/0x13
	Mar  3 04:29:29 sneezy kernel: [113471.296261]  [<ffffffff8186ca79>] schedule+0x29/0x70
	Mar  3 04:29:29 sneezy kernel: [113471.296266]  [<ffffffff81392d9d>] lock_extent_bits+0x1ad/0x200
	Mar  3 04:29:29 sneezy kernel: [113471.296274]  [<ffffffff810b78c0>] ? __wake_up_sync+0x20/0x20
	Mar  3 04:29:29 sneezy kernel: [113471.296280]  [<ffffffff81392e03>] lock_extent+0x13/0x20
	Mar  3 04:29:29 sneezy kernel: [113471.296286]  [<ffffffff81395d3c>] __extent_readpages.constprop.38+0x21c/0x2c0
	Mar  3 04:29:29 sneezy kernel: [113471.296293]  [<ffffffff81379130>] ? btrfs_direct_IO+0x360/0x360
	Mar  3 04:29:29 sneezy kernel: [113471.296299]  [<ffffffff81397109>] extent_readpages+0x179/0x1c0
	Mar  3 04:29:29 sneezy kernel: [113471.296304]  [<ffffffff81379130>] ? btrfs_direct_IO+0x360/0x360
	Mar  3 04:29:29 sneezy kernel: [113471.296314]  [<ffffffff813777bf>] btrfs_readpages+0x1f/0x30
	Mar  3 04:29:29 sneezy kernel: [113471.296321]  [<ffffffff8118f606>] __do_page_cache_readahead+0x1b6/0x250
	Mar  3 04:29:29 sneezy kernel: [113471.296329]  [<ffffffff8118f77f>] ondemand_readahead+0xdf/0x280
	Mar  3 04:29:29 sneezy kernel: [113471.296336]  [<ffffffff8118fa4e>] page_cache_sync_readahead+0x2e/0x50
	Mar  3 04:29:29 sneezy kernel: [113471.296342]  [<ffffffff81182fdc>] generic_file_read_iter+0x4ec/0x600
	Mar  3 04:29:29 sneezy kernel: [113471.296350]  [<ffffffff810136cb>] ? __switch_to+0x1fb/0x620
	Mar  3 04:29:29 sneezy kernel: [113471.296358]  [<ffffffff811fbbfe>] new_sync_read+0x7e/0xb0
	Mar  3 04:29:29 sneezy kernel: [113471.296367]  [<ffffffff811fc394>] vfs_read+0x94/0x180
	Mar  3 04:29:29 sneezy kernel: [113471.296373]  [<ffffffff811fcf46>] SyS_read+0x46/0xc0
	Mar  3 04:29:29 sneezy kernel: [113471.296379]  [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b
	Mar  3 04:29:29 sneezy kernel: [113471.296386] INFO: task btrsame:21950 blocked for more than 1800 seconds.
	Mar  3 04:29:29 sneezy kernel: [113471.296428]       Tainted: G        W      3.18.8-zb64+ #1
	Mar  3 04:29:29 sneezy kernel: [113471.296469] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
	Mar  3 04:29:29 sneezy kernel: [113471.296542] btrsame         D 0000000000000000     0 21950  30469 0x00000000
	Mar  3 04:29:29 sneezy kernel: [113471.296549]  ffff880003a9bc18 0000000000000086 ffff880003ff6000 000000000001fc00
	Mar  3 04:29:29 sneezy kernel: [113471.296556]  ffff880003a9bfd8 000000000001fc00 ffff88005c172000 ffff880003ff6000
	Mar  3 04:29:29 sneezy kernel: [113471.296563]  ffff8801d11de000 ffff88009cd5d000 ffff880003a9bc68 ffffffff8186c432
	Mar  3 04:29:29 sneezy kernel: [113471.296570] Call Trace:
	Mar  3 04:29:29 sneezy kernel: [113471.296578]  [<ffffffff8186c432>] ? __schedule+0x442/0xa60
	Mar  3 04:29:29 sneezy kernel: [113471.296586]  [<ffffffff8186d470>] ? bit_wait+0x60/0x60
	Mar  3 04:29:29 sneezy kernel: [113471.296592]  [<ffffffff8101e999>] ? read_tsc+0x9/0x10
	Mar  3 04:29:29 sneezy kernel: [113471.296599]  [<ffffffff8186d470>] ? bit_wait+0x60/0x60
	Mar  3 04:29:29 sneezy kernel: [113471.296604]  [<ffffffff8186ca79>] schedule+0x29/0x70
	Mar  3 04:29:29 sneezy kernel: [113471.296610]  [<ffffffff8186cd98>] io_schedule+0x98/0x100
	Mar  3 04:29:29 sneezy kernel: [113471.296616]  [<ffffffff8186d4a4>] bit_wait_io+0x34/0x60
	Mar  3 04:29:29 sneezy kernel: [113471.296622]  [<ffffffff8186d32b>] __wait_on_bit_lock+0x4b/0xb0
	Mar  3 04:29:29 sneezy kernel: [113471.296628]  [<ffffffff811361a0>] ? __delayacct_blkio_end+0x60/0xa0
	Mar  3 04:29:29 sneezy kernel: [113471.296635]  [<ffffffff81181362>] __lock_page+0xa2/0xb0
	Mar  3 04:29:29 sneezy kernel: [113471.296642]  [<ffffffff810b7900>] ? autoremove_wake_function+0x40/0x40
	Mar  3 04:29:29 sneezy kernel: [113471.296648]  [<ffffffff81181dc8>] pagecache_get_page+0x168/0x1a0
	Mar  3 04:29:29 sneezy kernel: [113471.296656]  [<ffffffff813a805d>] extent_same_get_page+0x2d/0xc0
	Mar  3 04:29:29 sneezy kernel: [113471.296663]  [<ffffffff813aef8f>] btrfs_ioctl+0x26bf/0x28c0
	Mar  3 04:29:29 sneezy kernel: [113471.296669]  [<ffffffff8147321f>] ? queue_unplugged+0x3f/0xc0
	Mar  3 04:29:29 sneezy kernel: [113471.296678]  [<ffffffff81476578>] ? blk_finish_plug+0x18/0x60
	Mar  3 04:29:29 sneezy kernel: [113471.296684]  [<ffffffff8118f618>] ? __do_page_cache_readahead+0x1c8/0x250
	Mar  3 04:29:29 sneezy kernel: [113471.296692]  [<ffffffff8120fa10>] do_vfs_ioctl+0x2f0/0x4f0
	Mar  3 04:29:29 sneezy kernel: [113471.296699]  [<ffffffff814b35ba>] ? trace_hardirqs_off_thunk+0x3a/0x3c
	Mar  3 04:29:29 sneezy kernel: [113471.296705]  [<ffffffff8120fc91>] SyS_ioctl+0x81/0xa0
	Mar  3 04:29:29 sneezy kernel: [113471.296711]  [<ffffffff818711ad>] system_call_fastpath+0x16/0x1b

Attachment: signature.asc
Description: Digital signature


[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux