When one of the device is missing, bbio_error() takes care
of setting the error status. And if its only IO that is
pending in that stripe, it fails to check the status of the
other IO at %bbio_error before setting the error %bi_status
for the %orig_bio. Fix this by checking if %bbio->error is
has crossed the %bbio->max_errors. Thxs.
Reproducer as below fdatasync error is seen intermittently.
mount -o degraded /dev/sdc /btrfs
dd status=none if=/dev/zero of=$(mktemp /btrfs/XXX) bs=4096 count=1 conv=fdatasync
dd: fdatasync failed for ‘/btrfs/LSe’: Input/output error
The reason for the intermittences of the problem is because..
following condition has to be met, which depends on timely
coordination.
In btrfs_map_bio()
. The RAID1 the missing device has to be at %dev_nr = 1
In bbio_error()
. Before bbio_error() is called the bio of the not-missing
device at %dev_nr=0 must be completed so that the below
condition is true
if (atomic_dec_and_test(&bbio->stripes_pending)) {
Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx>
---
fs/btrfs/volumes.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index 9af633dcf015..efd502176915 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -6131,7 +6131,10 @@ static void bbio_error(struct btrfs_bio *bbio, struct bio *bio, u64 logical)
btrfs_io_bio(bio)->mirror_num = bbio->mirror_num;
bio->bi_iter.bi_sector = logical >> 9;
- bio->bi_status = BLK_STS_IOERR;
+ if (atomic_read(&bbio->error) > bbio->max_errors)
+ bio->bi_status = BLK_STS_IOERR;
+ else
+ bio->bi_status = 0;
btrfs_end_bbio(bbio, bio);
}
}
--
2.13.1
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html