On Sat, Dec 28, 2019 at 06:04:21PM +0100, Leszek Dubiel wrote:
>
> PROBLEM SOLVED --- btrfs was busy cleaning up after a snapshot deletion a
> few days ago, so it didn't have time for "dev delete"; that's why it was slow.

That checks out.  Snapshot delete and remove-device/resize/balance are not
able to run at the same time.  There is a mutex, so one or the other will
run while the other is blocked.

> =======================
>
> I restarted the server, so the "btrfs delete" job no longer existed.
> But the disks were still working (!!!). I wondered why. What is BTRFS
> doing all the time?
>
> I realized that a few days before starting "btrfs dev delete" I had removed
> many snapshots -- there were about 400 snapshots and I left only 20. I did
> that because I had read that many snapshots could slow down btrfs
> operations severely.
>
> I made an experiment on another testing server:
>
> 1. started the command "watch -n1 'btrfs fi df /'"
> 2. started "iostat -x -m"
>
> The disks were not working at all.
>
> 3. Then I removed many snapshots on that testing server
>
> and I was watching:
>
>     Data, single: total=6.56TiB, used=5.13TiB
>     System, RAID1: total=32.00MiB, used=992.00KiB
>     Metadata, RAID1: total=92.00GiB, used=70.56GiB
>     GlobalReserve, single: total=512.00MiB, used=1.39MiB
>
> The disks started to work hard. So btrfs was probably cleaning up after
> the snapshot deletion.
>
> At the beginning, the "Metadata" line showed "used=70.00GiB":
>
>     Metadata, RAID1: total=92.00GiB, used=70.00GiB
>
> It was changing all the time... getting lower and lower. During that
> process, in the line
>
>     GlobalReserve, single: total=512.00MiB, used=1.39MiB
>
> "used=" was going up until it reached about 100MiB, then it was flushed to
> zero and started to fill again: fill, flush, fill, flush... some
> loop/buffer/cache (?).
> It convinced me that after snapshot deletion btrfs is really working hard
> on cleanup.
> After some time "Metadata...used=" stopped changing, the disks stopped
> working, and the server went idle again.
>
> I got back to the main server and started to watch "Metadata...used=". It
> was going down and down...
> I waited. When "Metadata...used=" stopped changing, "iostat -m" stopped
> showing any disk activity.
>
> I started "btrfs dev delete" again and now the speed is 50 MB/s. Hurray!
> Problem solved.
>
> Sorry for not being clever enough to connect all these facts at the
> beginning.
> Anyway -- maybe in the future someone will have the same problem; then
> btrfs experts could ask him whether he let btrfs do some other hard work at
> the same time (e.g. cleaning up after a massive snapshot deletion).
>
> Maybe it would be useful to have a tool to ask btrfs "what are you doing?
> are you busy?".
> It would respond "I am cleaning up after snapshot deletion... I am
> balancing... I am scrubbing... I am resizing... I am deleting ...".

Usually 'top' or 'iotop' suffices for that.  A busy btrfs-cleaner thread
means snapshots are being deleted; other activities will be tied to their
respective userspace agents.

The balance/delete/resize/drop-snapshot mutex is the only special case
that I know of where one btrfs maintenance thread waits for another.

It might be handy to give users a hint on snapshot delete, e.g. add
"use btrfs sub list -d to monitor deletion progress, or btrfs sub sync to
wait for deletion to finish".
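
For anyone who lands here later, the sequence above could be scripted roughly as follows. This is only a sketch, assuming a POSIX shell and root privileges; the function names metadata_used and delete_device_after_cleanup are made up for illustration and are not part of btrfs-progs. It only chains the commands already mentioned in this thread (sub list -d, sub sync, fi df, dev delete).

```shell
#!/bin/sh

# Hypothetical helper: read `btrfs filesystem df` output on stdin and
# print the "used=" value from the Metadata line (e.g. 70.56GiB).
metadata_used() {
    sed -n 's/^Metadata,.*used=\([^ ,]*\).*$/\1/p'
}

# Hypothetical helper: let pending snapshot cleanup finish first, then
# start the device removal, so the two do not block each other.
#   $1 = mount point of the filesystem
#   $2 = device to remove
delete_device_after_cleanup() {
    mnt=$1
    dev=$2

    # Show subvolumes that were deleted but are not yet cleaned up.
    btrfs subvolume list -d "$mnt"

    # Wait until the current deletion requests are completed.
    btrfs subvolume sync "$mnt" || return 1

    # Print the metadata usage once cleanup is done, for the record.
    btrfs filesystem df "$mnt" | metadata_used

    # Only now remove the device.
    btrfs device delete "$dev" "$mnt"
}
```

Calling something like `delete_device_after_cleanup /mnt/pool /dev/sdX` (both arguments are placeholders) would wait for the cleaner before starting the removal, instead of leaving the two to contend for the same mutex.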
