On Mon, Jul 13, 2020 at 09:03:18AM +0800, Qu Wenruo wrote:
> This bug is reported by Hans van Kranenburg <hans@xxxxxxxxxxx>, that
> when a running btrfs balance get fatal signals (including SIGINT), some
> bad things can happen, mostly forced RO caused by -EINTR.
>
> It turns out that, although we have addressed the btrfs balance cancel
> problems, we haven't addressed the signal related problems.
>
> In theory, processes trapped into kernel space won't get interrupted by
> signals, as signal callbacks happen in user space, but kernel code can
> still check pending signals and change behavior accordingly.
>
> In this case, the culprit is that, wait_reserve_ticket() can return
> -EINTR if there is a pending fatal signal.
>
> While for balance, a lot of situations can't handle the -EINTR from it,
> especially for critical cleanup phase.
>
> This patchset will address the bug in two directions:
> - Catch fatal signal early
>   Now btrfs_should_cancel_balance() will also check pending signals.
>   And will exit gracefully and treat it as a canceled balance.

This should be safe as it's checked in known locations.

> - Don't allow -EINTR for critical cleanup
>   For btrfs_drop_snapshot() for reloc trees, we shouldn't be interrupted
>   by signal, thus we use btrfs_join_transaction() instead of
>   btrfs_start_transaction().

This one is a bit more scary, but the interruption has been there
already so we're not changing anything. I haven't spotted anything
obviously wrong so I'll add the patches to misc-next, thanks.
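
For readers who want to see the first direction (catch the fatal signal early, at known locations) in isolation, here is a minimal userspace sketch. It is not the kernel patch: should_cancel_balance(), relocate_chunks(), the two flags and the signal_at parameter are hypothetical stand-ins, with a plain flag set by a SIGINT handler playing the role of a pending-fatal-signal check, and -ECANCELED assumed as the "canceled balance" result.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins; none of these are kernel APIs. */
static volatile sig_atomic_t cancel_requested; /* models an explicit balance cancel */
static volatile sig_atomic_t fatal_pending;    /* models a pending fatal signal */

static void on_sigint(int signo)
{
	(void)signo;
	fatal_pending = 1; /* only record the signal; the work loop decides when to stop */
}

/* Models the extended check: explicit cancel OR a pending fatal signal. */
static bool should_cancel_balance(void)
{
	return cancel_requested || fatal_pending;
}

/*
 * Process nr_chunks "chunks", polling for cancellation only at chunk
 * boundaries, i.e. at known safe locations. signal_at is a test hook:
 * a SIGINT is raised while that chunk is being processed (-1 for none).
 * Returns 0 on completion or -ECANCELED when stopped early; *done
 * receives the number of chunks that were fully processed.
 */
int relocate_chunks(int nr_chunks, int signal_at, int *done)
{
	assert(done != NULL);
	signal(SIGINT, on_sigint);

	for (*done = 0; *done < nr_chunks; (*done)++) {
		if (should_cancel_balance())
			return -ECANCELED; /* graceful exit, treated as a cancel */

		if (*done == signal_at)
			raise(SIGINT); /* simulate Ctrl-C arriving mid-chunk */

		/* ... the rest of this chunk's work runs to completion ... */
	}
	return 0;
}
```

Because the flag is only examined at chunk boundaries, a signal that arrives in the middle of a chunk lets that chunk finish, and the loop then stops cleanly before starting the next one instead of failing halfway through with an error such as -EINTR.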
