This bug is reported by Hans van Kranenburg <hans@xxxxxxxxxxx>, that
when a running btrfs balance get fatal signals (including SIGINT), some
bad things can happen, mostly forced RO caused by -EINTR.
It turns out that, although we have addressed the btrfs balance cancel
problems, we haven't addressed the signal related problems.
In theory, processes trapped into kernel space won't get interrupted by
signals, as signal callbacks happen in user space, but kernel code can
still check pending signals and change behavior accordingly.
In this case, the culprit is that, wait_reserve_ticket() can return
-EINTR if there is a pending fatal signal.
While for balance, a lot of situations can't handle the -EINTR from it,
especially for critical cleanup phase.
This patchset will address the bug in two directions:
- Catch fatal signal early
Now btrfs_should_cancel_balance() will also check pending signals.
And will exit gracefully and treat it as a canceled balance.
- Don't allow -EINTR for critical cleanup
For btrfs_drop_snapshot() for reloc trees, we shouldn't be interrupted
by signal, thus we use btrfs_join_transaction() instead of
btrfs_start_transaction().
And for other critical call sites, change the flushing level to avoid
signal interruption.
We also enhance the comment for the btrfs_reserve_flush_enum, to make
it easier to grasp.
Changelog:
v1:
- Change the callers of ticketing system
Still allow certain tickets to be interrupted by signals, and change
the call sites to avoid signal interruption.
v2:
- Add comment for why we can reduce the meta rsv for tree swap
v3:
- Add back the not-yet-merged first patch
- Rephrase the commit message of the 2nd patch
- Add comment for the 3rd patch about the canceled balance return value
- Add a new patch explaining the flushing level
Qu Wenruo (4):
btrfs: relocation: allow signal to cancel balance
btrfs: avoid possible signal interruption for btrfs_drop_snapshot() on
relocation tree
btrfs: relocation: review the call sites which can be interrupted by
signal
btrfs: Add comments for btrfs_reserve_flush_enum
fs/btrfs/ctree.h | 34 ++++++++++++++++++++++++++++++++--
fs/btrfs/extent-tree.c | 5 ++++-
fs/btrfs/relocation.c | 16 +++++++++++++---
fs/btrfs/volumes.c | 14 +++++++++++++-
4 files changed, 62 insertions(+), 7 deletions(-)
--
2.27.0