Re: [PATCH RFC 2/2] btrfs: space-info: Don't allow signal to interrupt ticket waiting

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 7/6/20 10:05 AM, Qu Wenruo wrote:


On 2020/7/6 下午9:53, Josef Bacik wrote:
On 7/6/20 9:50 AM, Qu Wenruo wrote:


On 2020/7/6 下午9:45, Josef Bacik wrote:
On 7/6/20 3:44 AM, Qu Wenruo wrote:
[BUG]
When balance receive a fatal signal, it can make the fs to read-only
mode if the timing is unlucky enough:

     BTRFS info (device xvdb): balance: start -d -m -s
     BTRFS info (device xvdb): relocating block group 73001861120 flags
metadata
     BTRFS info (device xvdb): found 12236 extents, stage: move data
extents
     BTRFS info (device xvdb): relocating block group 71928119296 flags
data
     BTRFS info (device xvdb): found 3 extents, stage: move data extents
     BTRFS info (device xvdb): found 3 extents, stage: update data
pointers
     BTRFS info (device xvdb): relocating block group 60922265600 flags
metadata
     BTRFS: error (device xvdb) in btrfs_drop_snapshot:5505: errno=-4
unknown
     BTRFS info (device xvdb): forced readonly
     BTRFS info (device xvdb): balance: ended with status: -4

[CAUSE]
This is caused by the fact that btrfs ticketing space system can be
interrupted, and cause all kind of -EINTR returned to various critical
section, where we never thought of -EINTR at all.

Even for things like btrfs_start_transaction() can be affected by
signal:
    btrfs_start_transaction()
    |- start_transaction(flush = FLUSH_ALL)
       |- btrfs_block_rsv_add()
          |- btrfs_reserve_metadata_bytes()
             |- __reserve_metadata_bytes()
                |- handle_reserve_ticket()
                   |- wait_reserve_ticket()
                      |- prepare_to_wait_event(TASK_KILLABLE)
                      |- ticket->error = -EINTR;

And all related callers get -EINTR error.

In fact, there are really very limited call sites can really handle
that
-EINTR properly, above btrfs_drop_snapshot() is one case.

[FIX]
Things like metadata allocation is really a critical section for btrfs,
we don't really want it to be that killable by some impatient users.

In fact, for really long duration calls, it should have their own
checks
on signal, like balance, reflink, generic fiemap calls.

So this patch will make ticket waiting uninterruptible, relying on each
long duration calls to handle their signals more properly.


Nope, everybody that calls start_transaction() should be able to handle
-EINTR, so we need to find those callsites and fix them, not make it so
we hang the box because we don't like fixing error paths.  Thanks,

Then we also need to handle btrfs_delalloc_reserve_metadata(),
btrfs_block_rsv_refill(), btrfs_use_block_rsv(), btrfs_block_rsv_add().


This only needs to be handled for FLUSH_ALL and FLUSH_STEAL.  Anybody
doing btrfs_start_transaction() should be able to fail with -EINTR,
because they should be close to the syscall path.  Balance needs to be
fixed to deal with it, and I assume there might be a few other places,
but by-in-large none of these places should flip read only.  Thanks,

There are already ones existing, for btrfs_drop_snapshot().

This is mostly caused by btrfs_delalloc_reserve_metadata(), which always
use FLUSH_ALL unless there is a running trans or its space cache inode.

But still, for a lot of relocation code, we don't really want to bother
the EINTR and just want uninterruptible at least for now.

Any idea for that?
Or just rework how we handle errors in a lot of places?

It doesn't look correct for such a low level mechanism to return -EINTR
while most of callers doesn't really want to bother.


That's the point, most callers shouldn't have to, because most callers shouldn't be far enough into their operations that -EINTR causes problems.

We should probably just change btrfs_drop_snapshot to use join, and then have it do any space reservation it needs outside of the trans handle. The other option is a FLUSH_ALL_UNKILLABLE. Thanks,

Josef



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux