Re: fstrim takes a long time on Btrfs and NVMe

On Sat, Dec 21, 2019 at 2:27 AM Nikolay Borisov <nborisov@xxxxxxxx> wrote:
>
>
>
> > On 21.12.19 at 8:24, Chris Murphy wrote:
> > Hi,
> >
> > On recent kernels, I think since 5.1 or 5.2, but tested today on 5.3.18,
> > 5.4.5, and 5.5.0-rc2, `fstrim /` takes quite a long time to complete:
> > just over 1 minute.
> >
> > Filesystem      Size  Used Avail Use% Mounted on
> > /dev/nvme0n1p7  178G   16G  161G   9% /
> >
> > fstrim stops on this for pretty much the entire time:
> > ioctl(3, FITRIM, {start=0, len=0xffffffffffffffff, minlen=0}) = 0
> >
> > top shows the fstrim process itself isn't consuming much CPU, about
> > 2-3%. The top five items in perf top aren't much more revealing.
> >
> > Samples: 220K of event 'cycles', 4000 Hz, Event count (approx.): 3463316966 lost: 0/0 drop: 0/0
> > Overhead  Shared Object                    Symbol
> >    1.62%  [kernel]                         [k] find_next_zero_bit
> >    1.59%  perf                             [.] 0x00000000002ae063
> >    1.52%  [kernel]                         [k] psi_task_change
> >    1.41%  [kernel]                         [k] update_blocked_averages
> >    1.33%  [unknown]                        [.] 0000000000000000
> >
> > On a different system, with an older Samsung 840 SATA SSD and a fresh
> > Btrfs, I can't reproduce this; fstrim takes less than 1s. Not sure how
> > to get more information.
>
>
> Trim implementations are a black box and specific to the particular
> hardware. Can you try a different filesystem on the same drive? When
> implementing the fstrim ioctl there isn't much you can do, since discard
> requests are just sent to the disk.
>
> Providing blktrace output might yield more insight into where the
> requests spend most of their time.

Roughly 90% of each CPU's trace file looks like very small block
discards, if I'm reading this correctly at all...

259,0    3   117655    85.094469086  3057  A  DS 233804904 + 688 <- (259,7) 110910568

Quite a lot are + 32 and + 64. Only about 85% of the way through the
parsed file do I start seeing values like:

259,0    3   127448    91.214170783  3057  A   D 473292774 + 8388607 <- (259,7) 350398438
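
A quick way to put numbers on that split would be something like the
sketch below, fed with the blkparse text output. It's rough: it keys on
any line whose RWBS field contains a discard ('D'), doesn't distinguish
event types (blktrace's act-mask filtering could narrow that up front),
and uses an arbitrary 256-sector (128KiB) cutoff.

#include <stdio.h>
#include <string.h>

int main(void)
{
    char line[512];
    unsigned long small = 0, large = 0;   /* split at 256 sectors = 128KiB */
    unsigned long long sectors;
    char *plus;

    while (fgets(line, sizeof(line), stdin)) {
        /* discard requests carry 'D' in the RWBS field, e.g. "A  DS" above */
        if (!strstr(line, " D"))
            continue;
        /* request size in sectors follows the '+', e.g. "233804904 + 688" */
        plus = strchr(line, '+');
        if (!plus || sscanf(plus + 1, "%llu", &sectors) != 1)
            continue;
        if (sectors <= 256)
            small++;
        else
            large++;
    }
    printf("small discards (<= 128KiB): %lu\n", small);
    printf("large discards (>  128KiB): %lu\n", large);
    return 0;
}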

The bulk of the space is unallocated, which I'm guessing accounts for
the large block discards. And as I think about it, back when fstrim was
fast on this same hardware, the amount discarded exactly matched the
unallocated space, as if unused space inside block groups was not being
discarded. So this slowness might be related to finding all of those
free space blocks. Also, I'm using space_cache=v2. And none of the
tests I do on new file systems show this, probably because they aren't
aged like this one is.


    Device size:               178.00GiB
    Device allocated:           52.04GiB
    Device unallocated:        125.96GiB
    Device missing:                0.00B
    Used:                       15.15GiB
    Free (estimated):          160.36GiB    (min: 160.36GiB)
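
If that guess is right, the numbers above fit: the ~126GiB of
unallocated space can go out in a few huge discards, while the free
space still inside allocated block groups (roughly 52.04GiB allocated
minus 15.15GiB used, so on the order of 37GiB) has to be walked and
discarded piecemeal, which would explain all the small requests.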


-- 
Chris Murphy



