---------- Forwarded message ---------
From: Tim Cuthbertson <ratcheer@xxxxxxxxx>
Date: Fri, Jun 26, 2020 at 2:30 PM
Subject: Re: weekly fstrim (still) necessary?
To: Chris Mason <clm@xxxxxx>


Well, I am going back to using a weekly, manual fstrim. I have been doing that for many months with no issues.

I cannot be certain that discard=async caused the problem. However, I had enabled it for the first time less than two days before I discovered the problem. My system was still booting and seemed to run fine, but then Firefox refused to start. While looking for the cause, I found csum errors in the systemd journal. I then ran btrfs scrub and found 12,936 csum errors.

The systemd journals should still be available, if you'd like me to post them.

Tim

On Fri, Jun 26, 2020 at 10:40 AM Chris Mason <clm@xxxxxx> wrote:
>
> On 26 Jun 2020, at 8:08, Tim Cuthbertson wrote:
>
> > ---------- Forwarded message ---------
> > From: Chris Mason <clm@xxxxxx>
> > Date: Mon, Jun 22, 2020 at 10:57 AM
> > Subject: Re: weekly fstrim (still) necessary?
> > To: David Sterba <dsterba@xxxxxxx>
> > Cc: Btrfs BTRFS <linux-btrfs@xxxxxxxxxxxxxxx>
> >
> >
> > On 22 Jun 2020, at 10:23, David Sterba wrote:
> >
> >> On Mon, Jun 22, 2020 at 04:02:34PM +0200, Ulli Horlacher wrote:
> >>> On Sun 2020-06-21 (18:57), Chris Murphy wrote:
> >>>
> >>>>>> You need to check fstrim.timer, which in turn triggers
> >>>>>> fstrim.service.
> >>>>>
> >>>>> root@fex:~# cat /lib/systemd/system/fstrim.timer
> >>>>>
> >>>>> root@fex:~# cat /lib/systemd/system/fstrim.service
> >>>
> >>>> I'm familiar with the contents of the files. Do you have a
> >>>> question?
> >>>
> >>>
> >>> You have deleted my question. I had asked:
> >>>
> >>> This means: an extra fstrim (via btrfsmaintenance script, etc) is
> >>> unnecessary?
> >>
> >> You need only one service, either from the fstrim or from
> >> btrfsmaintenance.
> >
> > Dennis’s async discard features are working much better here than
> > either periodic trims or the traditional mount -o discard. I’d
> > suggest moving to mount -o discard=async instead.
> >
> > -chris
> >
> > Apparently, discard=async is still unsafe on Samsung SSDs, at least
> > older models. I enabled it on my 850 Pro, and within two days I was
> > getting uncorrectable errors (for csums). Scrub showed 12,936
> > uncorrectable errors.
> >
> > While I was trying to recover, a long SMART analysis showed the actual
> > drive to have no errors.
> >
> > Then, the first recovery attempt failed. I had deleted and recreated
> > the partition. When I was copying the backup snapshots back to the
> > SSD, uncorrectable errors showed up again (4,119 of them after
> > copying one snapshot). I then overwrote the partition with all zeros,
> > and when I copied the snapshots back to it, there were no errors.
> > After recovering my filesystem, scrub still showed no errors. So, all's
> > well that ends well, I guess.
>
> We’re using this on a pretty wide variety of hardware, so I’m
> surprised to hear this. Are you able to reproduce the problem? Is a
> periodic fstrim still happening?
>
> -chris
