Re: Rough (re)start with btrfs


On Thu, May 2, 2019 at 5:40 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> > Hello,
> >
> > thanks for your replies. I appreciate it!
> >>>  I am using btrfs-progs v4.20.2 and Debian stretch with
> >>>  4.19.0-0.bpo.2-amd64 (I think this is the latest kernel available in
> >>>  stretch; please correct me if I am wrong).
> >>
> >> What scheduler is being used for the drive?
> >>
> >> # cat /sys/block/<dev>/queue/scheduler
> > [mq-deadline] none
> >
> >> If it's none, then kernel version and scheduler aren't likely related
> >> to what you're seeing.
> >>
> >> It's not immediately urgent, but I would still look for something
> >> newer, just because the 4.19 series already has 37 upstream updates
> >> released, each with dozens of fixes, easily there are over 1000 fixes
> >> available in total. I'm not a Debian user but I think there's
> >> stretch-backports that has newer kernels?
> >> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
> >>
> >
> > Unfortunately, backports provides 4.19 as the latest.
> > I am now manually compiling 5.0. Last time I did that, I was less than
> > half my current age :-)
> >
> >> We need the entire dmesg so we can see if there are any earlier
> >> complaints by the drive or the link. Can you attach the entire dmesg
> >> as a file?
> > Done (also the two smartctl outputs).
> >
> >> Have you tried stopping the workload to see if the timeout disappears?
> >
> > Unfortunately not. I had the impression that the system did not react
> > anymore. I CTRL-Ced and rebooted.
> > I was copying all the stuff from my old drive to the new one. I should
> > say, that the workload was high, but not exceptional. Just one or two
> > copy jobs.
>
> Then it's some kind of deadlock, not a regular high-load timeout.
>
> > Also, the btrfs drive was at an advantage:
> > 1) it had btrfs ;-) (the other ext4)
> > 2) it did not need to search
> > 3) it was connected via SATA (and not USB3 as the source)
> >
> > The drive does not seem to be an SMR drive (WD80EZAZ).
> >
> >> If it just disappears after some time, then the disk is too slow and
> >> the load too heavy, which combined with btrfs' low-concurrency design
> >> leads to the problem.
> >
> > I was tempted to ask, whether this should be fixed. On the other hand, I
> > am not even sure anything bad happened (except, well, the system -at
> > least the copy- seemed to hang).
>
> Definitely needs to be fixed.
>
> With the full dmesg, it's now clear that this is a real deadlock.
> Something is wrong with the free space cache, blocking the whole fs from
> being committed.
>
> If you still want to try btrfs, you could try the "nospace_cache" mount option.
> The btrfs free space cache is just an optimization; you can disable it
> completely with only a minor performance drop.


I should have read this before replying earlier.

You can also do a one-time clean mount with '-o
clear_cache,space_cache=v2', which will remove the v1 (default) space
cache and create a v2 cache. Subsequent mounts will see the flag for
this feature and always use the v2 cache. It's a totally different
implementation and shouldn't have this problem.
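
For reference, a rough sketch of that one-time mount. The device and
mount point below (/dev/sdX1 and /mnt/data) are placeholders I made up,
not anything from this thread, and the helper function is only there to
print the command so it can be reviewed before running it as root:

```shell
#!/bin/sh
# Placeholders (assumptions, not from this thread): set DEV and MNT
# to the real btrfs device and mount point.
DEV="${DEV:-/dev/sdX1}"
MNT="${MNT:-/mnt/data}"

# Hypothetical helper: build the one-time mount command that clears the
# v1 space cache and creates the v2 cache.
one_time_mount_cmd() {
    printf 'mount -o clear_cache,space_cache=v2 %s %s\n' "$1" "$2"
}

# Only prints the command; nothing is mounted by this script.
one_time_mount_cmd "$DEV" "$MNT"
```

I'd do this as a fresh mount (filesystem unmounted first) rather than a
remount. After that, a plain mount without these options should be
enough, since the v2 cache is picked up from the feature flag.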


-- 
Chris Murphy



