Don't get me wrong, the single core at 100% CPU shows up only during the balance process. While running "btrfs device delete missing /storage" there is no impact on CPU or RAM. I have 64GB of DDR4 ECC, but RAM usage never goes above about 3GB. I see that @Chris Murphy mentioned that disabling the write cache will impact performance. Have you tried that? On my devices the cache is enabled, and so far this is the only thing I haven't tried :)

# hdparm -W /dev/sdc

/dev/sdc:
 write-caching =  1 (on)

> On May 1, 2020, at 07:48, Phil Karn <karn@xxxxxxxx> wrote:
> 
> On 4/30/20 19:47, Zygo Blaxell wrote:
>> 
>> If it keeps repeating "found 1115 extents" over and over (say 5 or
>> more times) then you're hitting the balance looping bug in kernel 5.1
>> and later. Every N block groups (N seems to vary by user; I've heard
>> reports from 3 to over 6000) the kernel gets stuck in a loop and
>> needs a reboot to recover. Even if you cancel the balance, it will
>> just loop again until rebooted, and there's no cancel for device
>> delete, so if you start looping there you can skip directly to the
>> reboot. For a non-trivial filesystem the probability of successfully
>> deleting or resizing a device is more or less zero.
> 
> This does not seem to be happening. Each message is for a different
> block group with a different number of clusters. The device remove *is*
> making progress, just very, very slowly. I'm almost down to just 2TB
> left. Woot!
> 
> If I ever have to do this again, I'll insert bcache and a big SSD
> between btrfs and my devices. The slowness here has to be due to the
> (spinning) disk I/O being highly fragmented and random. I've checked,
> and none of my drives (despite their large sizes) are shingled, so
> that's not it. The 6 TB units have 128 MB caches and the 16 TB units
> have 256 MB caches.
> 
> I've never understood *exactly* what a hard drive's internal cache
> does. I see little sense in an LRU cache just like the host's own
> buffer cache, since the host has far more RAM. I do know they're used
> to reorder operations to reduce seek latency, though this can be
> limited by the need to fence writes to protect against a crash. I've
> wondered if they're also used on reads to reduce rotational latency by
> prospectively grabbing data as soon as the heads land on a cylinder.
> How big is a "cylinder" anyway? The inner workings of hard drives have
> become steadily more opaque over the years, which makes it difficult
> to optimize their use. Kinda like CPUs, actually. Last time I really
> tuned up some tight code, I found that using vector instructions and
> avoiding branch mispredictions made a big difference, but nothing else
> seemed to matter at all.
> 
>> 
>> There is no fix for that regression yet. Kernel 4.19 doesn't have the
>> regression and does have other relevant bug fixes for balance, so it
>> can be used as a workaround.
> 
> I'm running 4.19.0-8-rt-amd64, the current real-time kernel in Debian
> 'stable'.
> 
> Phil
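
If anyone wants to test the write-cache theory, here is a minimal
sketch of how I'd toggle it (assuming stock hdparm semantics; /dev/sdc
is just the device on my box, substitute your own):

# hdparm -W /dev/sdc     <- show the current write-cache state
# hdparm -W0 /dev/sdc    <- disable the drive's volatile write cache
# hdparm -W1 /dev/sdc    <- re-enable it afterwards

On many drives the setting doesn't survive a power cycle, so it would
need to be reapplied at boot if the test looks promising.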
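
On the looping bug Zygo describes above: a rough way to check whether
you're hitting it is to count consecutive identical "found N extents"
messages in the kernel log. This assumes the relocation messages look
the way they do in my own dmesg:

# dmesg | grep -o 'found [0-9]* extents' | uniq -c | tail -5

grep -o pulls out just the matching text and uniq -c counts consecutive
duplicates, so if the same line shows up with a count of 5 or more,
you're probably looping and, per Zygo, a reboot is the only way out.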
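
And on the bcache idea, for the archives: a sketch of what that setup
would look like with bcache-tools. Device names here are placeholders,
<cset-uuid> comes from the bcache-super-show output, and make-bcache
wipes whatever is on the device, so this is for a fresh rebuild, not
the current array:

# make-bcache -B /dev/sdb           <- spinning disk becomes the backing device
# make-bcache -C /dev/nvme0n1       <- SSD becomes the cache device
# bcache-super-show /dev/nvme0n1    <- note the cset.uuid
# echo <cset-uuid> > /sys/block/bcache0/bcache/attach
# mkfs.btrfs /dev/bcache0           <- btrfs then lives on the cached device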
