On Thu, May 2, 2019 at 5:40 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:
>
>
>
> On 2019/5/3 3:02 AM, Hendrik Friedel wrote:
> > Hello,
> >
> > thanks for your replies. I appreciate it!
> >>> I am using btrfs-progs v4.20.2 and debian stretch with
> >>> 4.19.0-0.bpo.2-amd64 (I think, this is the latest Kernel available in
> >>> stretch. Please correct if I am wrong.)
> >>
> >> What scheduler is being used for the drive?
> >>
> >> # cat /sys/block/<dev>/queue/scheduler
> > [mq-deadline] none
> >
> >> If it's none, then kernel version and scheduler aren't likely related
> >> to what you're seeing.
> >>
> >> It's not immediately urgent, but I would still look for something
> >> newer, just because the 4.19 series already has 37 upstream updates
> >> released, each with dozens of fixes, easily there are over 1000 fixes
> >> available in total. I'm not a Debian user but I think there's
> >> stretch-backports that has newer kernels?
> >> http://jensd.be/818/linux/install-a-newer-kernel-in-debian-9-stretch-stable
> >>
> >
> > Unfortunately, backports provides 4.19 as the latest.
> > I am now manually compiling 5.0. Last time I did that, I was less than
> > half my current age :-)
> >
> >> We need the entire dmesg so we can see if there are any earlier
> >> complaints by the drive or the link. Can you attach the entire dmesg
> >> as a file?
> > Done (also the two smartctl outputs).
> >
> >> Have you tried stopping the workload to see if the timeout disappears?
> >
> > Unfortunately not. I had the impression that the system did not react
> > anymore. I CTRL-Ced and rebooted.
> > I was copying all the stuff from my old drive to the new one. I should
> > say, that the workload was high, but not exceptional. Just one or two
> > copy jobs.
>
> Then it's some deadlock, not a regular high-load timeout.
>
> > Also, the btrfs drive was at an advantage:
> > 1) it had btrfs ;-) (the other ext4)
> > 2) it did not need to search
> > 3) it was connected via SATA (and not USB3 as the source)
> >
> > The drive does not seem to be an SMR drive (WD80EZAZ).
> >
> >> If it just disappears after some time, then it's the disk being too
> >> slow and the load too heavy, combined with btrfs' low concurrency
> >> design leading to the problem.
> >
> > I was tempted to ask, whether this should be fixed. On the other hand, I
> > am not even sure anything bad happened (except, well, the system -at
> > least the copy- seemed to hang).
>
> Definitely needs to be fixed.
>
> With full dmesg, it's now clear that this is a real deadlock.
> Something is wrong with the free space cache, blocking the whole fs
> from being committed.
>
> If you still want to try btrfs, you could try the "nospace_cache" mount
> option. The free space cache of btrfs is just an optimization; you can
> completely ignore it with a minor performance drop.

I should have read this before replying earlier.

You can also do a one-time clean mount with '-o
clear_cache,space_cache=v2', which will remove the v1 (default) space
cache and create a v2 cache. Subsequent mounts will see the flag for
this feature and always use the v2 cache. It's a totally different
implementation and shouldn't have this problem.

-- 
Chris Murphy
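
For reference, the two mount suggestions in this thread can be sketched as a small dry-run shell helper. This is only a sketch under assumptions: the helper name btrfs_mount_cmd, the device /dev/sdX, and the mount point /mnt are placeholders that do not come from the thread, and the helper only prints the command instead of running it. The mount options themselves (nospace_cache, clear_cache, space_cache=v2) are the ones named above.

```shell
#!/bin/sh
# Dry-run sketch: prints a mount command, does not execute it.
# btrfs_mount_cmd, /dev/sdX and /mnt are hypothetical placeholders.

btrfs_mount_cmd() {
    mode="$1"   # "nocache" or "v2"
    dev="$2"    # block device holding the btrfs filesystem
    mnt="$3"    # mount point
    case "$mode" in
        nocache)
            # Ignore the free space cache for this mount.
            opts="nospace_cache"
            ;;
        v2)
            # One-time mount: drop the v1 cache and build the v2 cache.
            opts="clear_cache,space_cache=v2"
            ;;
        *)
            echo "unknown mode: $mode" >&2
            return 1
            ;;
    esac
    printf 'mount -o %s %s %s\n' "$opts" "$dev" "$mnt"
}

# Example: show both variants for the placeholder device.
btrfs_mount_cmd nocache /dev/sdX /mnt
btrfs_mount_cmd v2 /dev/sdX /mnt
```

The printed command would be run as root against an unmounted filesystem. Per the message above, the v2 variant is needed only once; later mounts should pick up the v2 cache without extra options.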
