On Thu, Nov 22, 2018 at 6:07 AM Tomasz Chmielewski <tch@xxxxxxxxxxx> wrote:
>
> On 2018-11-22 21:46, Nikolay Borisov wrote:
>
> >> # echo w > /proc/sysrq-trigger
> >>
> >> # dmesg -c
> >> [ 931.585611] sysrq: SysRq : Show Blocked State
> >> [ 931.585715] task PC stack pid father
> >> [ 931.590168] btrfs-cleaner D 0 1340 2 0x80000000
> >> [ 931.590175] Call Trace:
> >> [ 931.590190] __schedule+0x29e/0x840
> >> [ 931.590195] schedule+0x2c/0x80
> >> [ 931.590199] schedule_timeout+0x258/0x360
> >> [ 931.590204] io_schedule_timeout+0x1e/0x50
> >> [ 931.590208] wait_for_completion_io+0xb7/0x140
> >> [ 931.590214] ? wake_up_q+0x80/0x80
> >> [ 931.590219] submit_bio_wait+0x61/0x90
> >> [ 931.590225] blkdev_issue_discard+0x7a/0xd0
> >> [ 931.590266] btrfs_issue_discard+0x123/0x160 [btrfs]
> >> [ 931.590299] btrfs_discard_extent+0xd8/0x160 [btrfs]
> >> [ 931.590335] btrfs_finish_extent_commit+0xe2/0x240 [btrfs]
> >> [ 931.590382] btrfs_commit_transaction+0x573/0x840 [btrfs]
> >> [ 931.590415] ? btrfs_block_rsv_check+0x25/0x70 [btrfs]
> >> [ 931.590456] __btrfs_end_transaction+0x2be/0x2d0 [btrfs]
> >> [ 931.590493] btrfs_end_transaction_throttle+0x13/0x20 [btrfs]
> >> [ 931.590530] btrfs_drop_snapshot+0x489/0x800 [btrfs]
> >> [ 931.590567] btrfs_clean_one_deleted_snapshot+0xbb/0xf0 [btrfs]
> >> [ 931.590607] cleaner_kthread+0x136/0x160 [btrfs]
> >> [ 931.590612] kthread+0x120/0x140
> >> [ 931.590646] ? btree_submit_bio_start+0x20/0x20 [btrfs]
> >> [ 931.590658] ? kthread_bind+0x40/0x40
> >> [ 931.590661] ret_from_fork+0x22/0x40
> >>
> >
> > It seems your filesystem is mounted with the DISCARD option, meaning
> > every delete will result in a discard; this is highly suboptimal for
> > SSDs. Try remounting the fs without the discard option and see if it
> > helps. Generally for discard you want to submit it in big batches
> > (what fstrim does) so that the FTL on the SSD can apply any
> > optimisations it might have up its sleeve.
>
> Spot on!
>
> Removed "discard" from fstab and added "ssd", rebooted - no more
> btrfs-cleaner running.
>
> Do you know if the issue you described ("discard this is highly
> suboptimal for ssd") affects other filesystems as well to a similar
> extent? I.e. if using ext4 on ssd?
A lot of the write activity on ext4 and XFS is overwrites, so discard
isn't needed as often, and where it is issued it may be subject to
delays. On Btrfs it's almost immediate, to the degree that on a couple
of SSDs I've tested, stale trees referenced exclusively by the most
recent backup tree entries in the superblock are already zeroed. That
effectively means no automatic recovery at mount time if there's a
problem with any of the current trees.
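As a sketch of the remount Nikolay suggested (the mount point "/" here
is just an example; substitute your own):

```shell
# Check whether the filesystem is currently mounted with discard
findmnt -no OPTIONS / | tr ',' '\n' | grep -x discard

# Drop the option without a reboot
mount -o remount,nodiscard /

# Then remove "discard" from the matching fstab line, e.g.:
#   UUID=xxxx  /  btrfs  defaults,ssd  0  1
```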
I was using it for about a year with no ill effect, BUT not a lot of
file deletions either. I wouldn't recommend it; instead I suggest
enabling fstrim.timer, which by default runs fstrim.service once a
week (which in turn issues fstrim, I think on all mounted volumes).
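On a systemd distro, enabling the timer looks roughly like this (exact
unit behaviour varies by distro and util-linux version):

```shell
# Enable the weekly trim timer and start it now
systemctl enable --now fstrim.timer

# Confirm it's scheduled
systemctl list-timers fstrim.timer

# Or run a one-off trim of all supporting mounted filesystems, verbosely
fstrim -av
```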
I'm a bit more concerned about the read errors you had that were being
corrected automatically. The corruption suggests a firmware bug
related to trim. I'd check the affected SSD's firmware revision and
consider updating it (only after a backup; it's plausible the firmware
update is not guaranteed to be data safe). Does the volume use DUP or
raid1 metadata? I'm not sure how it's correcting these problems
otherwise.
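To check both of those (mount point and device name below are just
examples):

```shell
# Show the data/metadata profiles; look for "Metadata, DUP" or
# "Metadata, RAID1" in the output
btrfs filesystem df /

# Show the SSD's model and firmware revision
smartctl -i /dev/sda
```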
--
Chris Murphy