Re: btrfs-tools/linux 4.11: btrfs-cleaner misbehaving

On 05/27/2017 10:29 PM, Ivan P wrote:
> On Sat, May 27, 2017 at 9:33 PM, Hans van Kranenburg
> <hans.van.kranenburg@xxxxxxxxxx> wrote:
>> Hi,
>>
>> On 05/27/2017 08:53 PM, Ivan P wrote:
>>>
>>> For a while now, btrfs-cleaner has been hammering my system's btrfs
>>> partition, as well as my CPU. The behavior is as follows:
>>>
>>> After booting, nothing relevant happens. After about 5-30 minutes,
>>> a btrfs-cleaner process is spawned, which constantly uses one CPU core.
>>> The btrfs-cleaner process never seems to finish (I've let it burn CPU
>>> cycles for 9 hours) and cannot be stopped or killed.
>>>
>>> Rebooting usually resolves the issue for some time, but on the next
>>> boot it usually reappears.
>>>
>>> I'm running Linux 4.11.2, but the issue is also present on the current
>>> LTS kernel, 4.9.29. As far as I can tell, I am using the newest
>>> btrfs-tools (4.11). The system is an Arch Linux x86_64 installation on
>>> a Transcend 120 GB mSATA drive.
>>>
>>> No other disks are present, but the root volume contains several subvolumes
>>> (@arch<date> snapshots, @home, @data).
>>>
>>> The logs don't contain anything related to btrfs, besides the usual
>>> diagnostic output from mounting the root partition.
>>>
>>> I am mounting the btrfs partition with the following options:
>>>
>>> subvol=@arch_current,compress=lzo,ssd,noatime,autodefrag
>>>
>>> What information should I provide so we can debug this?
>>
>> What I usually do first in a similar situation is look at the output of
>>
>>   watch cat /proc/<pid>/stack
>>
>> where <pid> is the pid of the btrfs-cleaner thread.
>>
>> This might already give an idea of what it's doing. When it's cleaning
>> up a removed subvolume, for example, a correspondingly named function
>> will appear somewhere in the stack.
>>
>> --
>> Hans van Kranenburg
> 
> Thank you for the fast reply.
> 
> Most of the time, the stack is just 0xffffffffffffffff, even though
> CPU load is being generated. The following traces repeat constantly,
> and the addresses stay the same:
> 
> [<ffffffffa0444f19>] get_alloc_profile+0xa9/0x1a0 [btrfs]
> [<ffffffffa04450d2>] can_overcommit+0xc2/0x110 [btrfs]
> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
> [<ffffffffa044a21e>] btrfs_free_reserved_data_space_noquota+0x6e/0x100 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffffa04451ae>] block_rsv_release_bytes+0x8e/0x2b0 [btrfs]
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> So far, these have appeared only once or twice:
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffff81316d66>] __radix_tree_lookup+0x76/0xf0
> [<ffffffff81316e3d>] radix_tree_lookup+0xd/0x10
> [<ffffffff8118121f>] __do_page_cache_readahead+0x10f/0x2f0
> [<ffffffff81181593>] ondemand_readahead+0x193/0x2c0
> [<ffffffff8118185e>] page_cache_sync_readahead+0x2e/0x50
> [<ffffffffa04a23ab>] btrfs_defrag_file+0x9fb/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff81003016>] ___preempt_schedule+0x16/0x18
> [<ffffffffa0487556>] __clear_extent_bit+0x2a6/0x3e0 [btrfs]
> [<ffffffffa0487c57>] clear_extent_bit+0x17/0x20 [btrfs]
> [<ffffffffa04a26fa>] btrfs_defrag_file+0xd4a/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> [<ffffffff8162efcf>] retint_kernel+0x1b/0x1d
> [<ffffffff810ce28a>] __rcu_read_unlock+0x4a/0x60
> [<ffffffff8118000b>] __set_page_dirty_nobuffers+0xdb/0x170
> [<ffffffffa0468c1e>] btrfs_set_page_dirty+0xe/0x10 [btrfs]
> [<ffffffff8117dd7b>] set_page_dirty+0x5b/0xb0
> [<ffffffffa04a274e>] btrfs_defrag_file+0xd9e/0xf90 [btrfs]
> [<ffffffffa047b66a>] btrfs_run_defrag_inodes+0x25a/0x350 [btrfs]
> [<ffffffffa045cc67>] cleaner_kthread+0x147/0x180 [btrfs]
> [<ffffffff810a04d8>] kthread+0x108/0x140
> [<ffffffff8162e85c>] ret_from_fork+0x2c/0x40
> [<ffffffffffffffff>] 0xffffffffffffffff
> 
> I forgot to mention that I have already tried running a scrub; it
> neither reported any errors nor resolved the issue.

Those are defrag actions called from cleaner_kthread. That looks like
what Jean-Denis already suggested.

Does the behaviour change when you disable autodefrag? You can also do
this live, with mount -o remount,noautodefrag; see the example below.
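
As a sketch (assuming the filesystem is mounted at /; substitute your
actual mount point):

  mount -o remount,noautodefrag /
  grep btrfs /proc/mounts   # "autodefrag" should drop out of the options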

Apparently your write pattern is some kind of worst case when combined
with autodefrag? I'm not an expert in this area, but someone else
probably knows more.

-- 
Hans van Kranenburg