On Tue, Dec 03, 2019 at 02:42:50PM +0800, Qu Wenruo wrote:
> [PROBLEM]
> There are quite some users reporting that 'btrfs balance cancel' slow to
> cancel current running balance, or even doesn't work for certain dead
> balance loop.
>
> With the following script showing how long it takes to fully stop a
> balance:
> #!/bin/bash
> dev=/dev/test/test
> mnt=/mnt/btrfs
>
> umount $mnt &> /dev/null
> umount $dev &> /dev/null
>
> mkfs.btrfs -f $dev
> mount $dev -o nospace_cache $mnt
>
> dd if=/dev/zero bs=1M of=$mnt/large &
> dd_pid=$!
>
> sleep 3
> kill -KILL $dd_pid
> sync
>
> btrfs balance start --bg --full $mnt &
> sleep 1
>
> echo "cancel request" >> /dev/kmsg
> time btrfs balance cancel $mnt
> umount $mnt
>
> It takes around 7~10s to cancel the running balance in my test
> environment.
>
> [CAUSE]
> Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
> request are queued.
> However that cancelling request is only checked after relocating a block
> group.

Yes that's the reason why it takes so long to cancel. Adding more
cancellation points is fine, but I don't know what exactly happens when
the block group relocation is not finished. There's code to merge the
reloc inode and commit that, but that's only a high-level view of the
thing.
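
To make the latency argument concrete, here is a rough toy model in bash.
It is not btrfs code: simulate(), its arguments and the "per-group" and
"per-extent" modes are hypothetical names made up for illustration, and
an "extent" is just an abstract unit of work. It only counts how much
work still gets done after a cancel request arrives. It says nothing
about the open question above, namely what has to be cleaned up when a
block group relocation stops half way.

```shell
#!/bin/bash
# Toy model only, not btrfs code. All names here are hypothetical.
#
# simulate NGROUPS NEXTENTS CANCEL_AT MODE
#   NGROUPS    number of block groups to "relocate"
#   NEXTENTS   units of work per block group
#   CANCEL_AT  cancel request arrives once this many units are done
#   MODE       per-group  : check for cancel only after a whole block group
#              per-extent : also check before every unit of work
# Prints how many units were processed after the request arrived,
# or "none" if the request never arrived before the work finished.
simulate() {
    local ngroups=$1 nextents=$2 cancel_at=$3 mode=$4
    local done_count=0 cancel_req=0 g e

    for ((g = 0; g < ngroups; g++)); do
        for ((e = 0; e < nextents; e++)); do
            # The cancel request shows up at this point in the work.
            if ((done_count == cancel_at)); then
                cancel_req=1
            fi
            # Extra cancellation point inside the block group.
            if [[ $mode == per-extent ]] && ((cancel_req)); then
                echo $((done_count - cancel_at))
                return
            fi
            done_count=$((done_count + 1))
        done
        # Cancellation point after a whole block group has been processed.
        if ((cancel_req)); then
            echo $((done_count - cancel_at))
            return
        fi
    done
    echo none
}
```

With 4 block groups of 100 units each and a request arriving after 130
units, "simulate 4 100 130 per-group" prints 70 (the rest of the second
block group is still processed), while "simulate 4 100 130 per-extent"
prints 0.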
