On Wed, Nov 20, 2019 at 05:36:04PM +0100, Christian Pernegger wrote:
> Hello,
>
> I've decided to go with a snapshot-based backup solution for our new
> Linux desktops -- thank you for the timely thread --, namely btrbk.
> A couple of subvolumes for different stuff, with hourly snapshots that
> regularly go to another machine. Brilliant in theory, less so in
> practice, because every time btrbk runs, the box'll freeze for a few
> seconds, as in, Firefox and LibreOffice, for instance, become entirely
> unresponsive, games hang and so on. (AFAICT, all it does is snapshot
> each subvolume and delete ones that are out of the retention period.)

Snapshot delete is pretty aggressive with IO and can force a lot of
commits if you are modifying a lot of metadata pages between snapshots.
Generally I get a coffee when my 1TB NVME systems decide it's time to
drop a snapshot, as the system can effectively hang for a few minutes
while btrfs-cleaner runs.  On performance-critical systems we only ever
have one snapshot active on the filesystem at a time, and we only create
it once a day for backups.  I'd love a way to throttle btrfs-cleaner so
it's not so aggressive with IO and CPU.

Snapshot create has unbounded running time on 5.0 kernels.  The creation
process has to flush dirty buffers to the filesystem to get a clean
snapshot state.  Any process that is writing data while the flush is
running gets its data included in the snapshot flush, so in the worst
possible case, the snapshot flush never ends (unless you run out of disk
space, or whatever was writing new data stops, whichever comes first).
Anything that needs to take a sb_writer lock (which is almost everything
that modifies the filesystem) will hang until the snapshot create is
done; however, processes that are reading the filesystem will not be
obstructed.  This can lead to starvation of the writing processes.
cgroups and ionice won't help here--the block layer doesn't detect waits
for sb_writers (there is no associated block device for those, so they're
invisible to the block layer), so it doesn't know that writer processes
are waiting for IO, and all the writers' IO bandwidth gets reallocated
to the reader processes, making for long-lasting priority inversions.
The IO pressure stall subsystem reads _zero_ IO pressure even though
writing processes are continuously blocked for hours.

On small systems, this is all over in a second or less.  On bigger
fileservers, I've had single snapshot creates run for many hours.

As a workaround, I have some scripts that freeze processes that write
to the disk while 'btrfs sub snap' runs, to force the snapshot create
to finish in a timely manner.  I think I saw some patches going into
later 5.x kernels that solve the problem in the kernel, too (writes that
occur after the snapshot creation starts are not included in the snapshot
any more).

> I'm aware that having many snapshots can impact performance of some
> operations, but I didn't think that "many" <= 200, "impact" = stop
> dead and "some operations" = light desktop use. These are decently
> specced, after all (Zen 2 8/12 core, 32 GB RAM, Samsung 970 Evo Plus).
> What I'm asking is, is this to be expected, does it just need tuning,
> is the hardware buggy, the kernel version (Ubuntu 18.04.3 HWE, their
> 5.0 series) a stinker, something else awry ...?
>
> Cheers,
> C.
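
For illustration, the freeze-the-writers workaround mentioned above could
look roughly like the sketch below.  It is not the scripts referred to in
the reply, which are not shown; it only assumes that the writers can be
paused with SIGSTOP and resumed with SIGCONT, and that the caller already
knows which PIDs to pause.  The function names, the `run` and `kill`
hooks, and the read-only (`-r`) snapshot are all assumptions made for the
example.

```python
import os
import signal
import subprocess
from contextlib import contextmanager


@contextmanager
def frozen(pids, kill=os.kill):
    """Pause the given processes with SIGSTOP and always resume them."""
    stopped = []
    try:
        for pid in pids:
            try:
                kill(pid, signal.SIGSTOP)
                stopped.append(pid)
            except ProcessLookupError:
                pass  # process exited before it could be stopped
        yield stopped
    finally:
        # Resume even if the snapshot command failed, so nothing stays hung.
        for pid in stopped:
            try:
                kill(pid, signal.SIGCONT)
            except ProcessLookupError:
                pass  # process was killed while it was stopped


def snapshot_with_writers_frozen(writer_pids, source, dest,
                                 run=subprocess.run, kill=os.kill):
    """Take a read-only snapshot of `source` at `dest` with writers paused.

    How `writer_pids` is chosen is left to the caller (hypothetical here).
    """
    cmd = ["btrfs", "subvolume", "snapshot", "-r", source, dest]
    with frozen(writer_pids, kill=kill):
        return run(cmd, check=True)
```

Stopping a process needs permission to signal it, and data that is
already dirty still has to be flushed; the point of the sketch is only
that no new writes arrive while the snapshot create is running.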
