Re: fstrim takes a long time on Btrfs and NVMe


 



On Sun, Dec 22, 2019 at 3:29 PM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>
>
>
> On 23.12.19 г. 0:11 ч., Chris Murphy wrote:
> > On Sun, Dec 22, 2019 at 12:15 PM Roman Mamedov <rm@xxxxxxxxxxx> wrote:
> >>
> >> On Sun, 22 Dec 2019 20:06:57 +0200
> >> Nikolay Borisov <nborisov@xxxxxxxx> wrote:
> >>
> >>> Well, if we rework how fitrim is implemented - e.g. make discards async
> >>> and have some sort of locking to exclude queued extents being allocated
> >>> we can alleviate the problem somewhat.
> >>
> >> Please keep fstrim synchronous, in many cases TRIM is expected to be completed
> >> as it returns, for the next step of making a snapshot of a thin LV for backup,
> >> to shutdown a VM for migration, and so on.
> >
> > XFS already does async discards. What's the effect of FIFREEZE on
> > discards? An LV snapshot freezes the file system on the LV just prior
> > to the snapshot.
>
> Actually, XFS issues synchronous discards for the FITRIM ioctl, i.e.
> xfs_trim_extents calls blkdev_issue_discard, same as Btrfs. And
> Dennis' patches implement async runtime discards (which is what XFS
> uses by default).
>
> >
> >> I don't think many really care about how long fstrim takes, it's not a typical
> >> interactive end-user task.
> >
> > I only care if I notice it affecting user space (excepting my timed
> > use of fstrim for testing).
> >
> > Speculation: If a scheduled fstrim can block startup, that's not OK. I
> > don't have enough data to know if it's possible, let alone likely. But
> > when fstrim takes a minute to discard the unused blocks in only 51GiB
> > of used block groups (likely highly fragmented free space), and only a
> > fraction of a second to discard the unused block *groups*, I'm
> > suspicious startup delays may be possible.
>
> If it takes that long then it's the drive's implementation that's at
> fault. Whatever we do in software will only mask the latency, which
> might be a workable solution for some but not for others.

The point of bringing it up is to drive home that we don't even
understand the scope of the problem, especially if this behavior is
surprising. This is common hardware.

fstrim on this file system results in 53618 discards being issued. 35
of these are + 8388607 in size, which I think translates to ~4G, but
I'm not sure whether the unit is 4K blocks or 512-byte sectors. It
seems more consistent with 512-byte sectors, but it doesn't come out
to exactly 4GiB if I assume that.

Those 35 large discard ranges take only 0.016951402 seconds. The
remaining 53000+ discards take the overwhelming bulk of the time, over
a minute. I have no idea whether this delay is lookup/computation for
Btrfs to figure out what the unused blocks are, or whether it's a
device delay. That comes to about 568 discards per second. Is that
really unreasonable drive performance behavior?
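For what it's worth, the per-second figure follows directly from the counts above; a sketch (the elapsed time here is my assumed figure consistent with "over a minute", not a measured value):

```python
# Rough rate check for the small discards. elapsed_seconds is an
# assumption chosen to match "over a minute", not a measurement.
total_discards = 53618
large_ranges = 35
elapsed_seconds = 94.3               # assumed

small = total_discards - large_ranges
print(small)                         # 53583 small discards
print(round(small / elapsed_seconds))  # ~568 discards per second
```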

Also...

259,0    3   127259    91.202441239  3057  A  DS 177594367 + 1 <- (259,7) 54700031

A single 512-byte discard? That's suspicious. Btrfs doesn't work in
512-byte increments; its minimum unit is the 4K (Btrfs) sector size,
isn't it?



-- 
Chris Murphy



