[PROBLEM]
Quite a few users have reported that 'btrfs balance cancel' is slow to
cancel a running balance, or does not work at all for certain dead
balance loops.

The following script shows how long it takes to fully stop a balance:

  #!/bin/bash
  dev=/dev/test/test
  mnt=/mnt/btrfs

  umount $mnt &> /dev/null
  umount $dev &> /dev/null

  mkfs.btrfs -f $dev
  mount $dev -o nospace_cache $mnt

  dd if=/dev/zero bs=1M of=$mnt/large &
  dd_pid=$!

  sleep 3
  kill -KILL $dd_pid
  sync

  btrfs balance start --bg --full $mnt &
  sleep 1

  echo "cancel request" >> /dev/kmsg
  time btrfs balance cancel $mnt
  umount $mnt

It takes around 7 to 10 seconds to cancel the running balance in my test
environment.

[CAUSE]
Btrfs uses btrfs_fs_info::balance_cancel_req to record how many cancel
requests are queued.
However, that counter is only checked after a whole block group has been
relocated, so a cancel request has to wait for the current block group
to finish before it takes effect.
That granularity is too coarse for cancelling to be responsive.

[FIX]
This patchset adds more cancel check points to make cancelling faster.
It also introduces new error injection points to cover these newly
introduced check points and any future ones.
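
As a rough illustration of why finer-grained check points help, here is a
small userspace model. It is not the kernel code from this series: the
names cancel_req, cancel_requested() and relocate_model() are hypothetical,
and the "extent" loop only stands in for the work done inside one block
group. The model counts how many work units are processed before a
simulated cancel request is noticed.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for btrfs_fs_info::balance_cancel_req. */
static atomic_int cancel_req;

/* Returns non-zero when at least one cancel request is queued. */
int cancel_requested(void)
{
	return atomic_load(&cancel_req) > 0;
}

/*
 * Model relocating nr_groups block groups of extents_per_group extents.
 * A cancel request is simulated right after cancel_at extents have been
 * processed (0 means no request is ever queued).
 * If fine_grained is non-zero the counter is checked after every extent,
 * otherwise only after each block group.
 * Returns the number of extents processed before the loop stops.
 */
size_t relocate_model(size_t nr_groups, size_t extents_per_group,
		      size_t cancel_at, int fine_grained)
{
	size_t done = 0;

	atomic_store(&cancel_req, 0);
	for (size_t g = 0; g < nr_groups; g++) {
		for (size_t e = 0; e < extents_per_group; e++) {
			done++;
			if (done == cancel_at)
				atomic_fetch_add(&cancel_req, 1);
			/* Extra check point inside the block group. */
			if (fine_grained && cancel_requested())
				return done;
		}
		/* The only check point in the coarse-grained variant. */
		if (cancel_requested())
			return done;
	}
	return done;
}
```

With 4 block groups of 100 extents and a request queued after extent 150,
the coarse variant keeps going to the end of the second block group, while
the fine-grained variant stops at the extent where the request arrived.
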

Qu Wenruo (4):
  btrfs: relocation: Introduce error injection points for cancelling
    balance
  btrfs: relocation: Check cancel request after each data page read
  btrfs: relocation: Check cancel request after each extent found
  btrfs: relocation: Work around dead relocation stage loop

 fs/btrfs/ctree.h      |  1 +
 fs/btrfs/relocation.c | 23 +++++++++++++++++++++++
 fs/btrfs/volumes.c    |  2 +-
 3 files changed, 25 insertions(+), 1 deletion(-)

--
2.24.0