- To: NeilBrown <neilb@xxxxxxx>
- Subject: Re: raid1 repair: sync_request() aborts if one of the drives has bad block recorded
- From: Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
- Date: Mon, 16 Jul 2012 11:45:12 +0300
- Cc: linux-raid <linux-raid@xxxxxxxxxxxxxxx>
- In-reply-to: <20120716133753.1a34e149@notabene.brown>
Hi Neil,
thanks for your comments.
> I don't see that MD_RECOVERY_REQUESTED is really an issue is it?
> We the wrong thing happen in any case where there just one device with no bad
> block, and at least one device with a bad block. In that case we want to
> skip forward by the number of bad blocks, not skip to the end of the array.
I think that MD_RECOVERY_REQUESTED has some meaning here. Because
there are only two places where we increase "write_targets":
Here:
} else if (!test_bit(In_sync, &rdev->flags)) {
bio->bi_rw = WRITE;
bio->bi_end_io = end_sync_write;
write_targets ++;
and
if (test_bit(MD_RECOVERY_SYNC, &mddev->recovery) && read_targets > 0)
/* extra read targets are also write targets */
write_targets += read_targets-1;
So if we'll see a device that is not In_sync, we will increase
write_targets and won't fall into that issue. That's why I was
thinking to check for MD_RECOVERY_REQUESTED.
>
> So this?
>
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 240ff31..c443f93 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -2422,7 +2422,10 @@ static sector_t sync_request(struct mddev *mddev, sector_t sector_nr, int *skipp
> /* There is nowhere to write, so all non-sync
> * drives must be failed - so we are finished
> */
> - sector_t rv = max_sector - sector_nr;
> + sector_t rv;
> + if (min_bad > 0)
> + max_sector = sector_nr + min_bad;
> + rv = max_sector - sector_nr;
> *skipped = 1;
> put_buf(r1_bio);
> return rv;
>
I tested it and it works. However, I have a question: shouldn't we try
to clear bad blocks during "repair" in particular or during any
kind-of-sync in general?
Because currently the following is happening:
- In sync_request devices are selected as candidates for READs and for
WRITEs (using various criteria)
- Then we issue the READs and eventually end up in sync_request_write()
- Here we schedule WRITEs (without consulting bad blocks or WriteErrorSeen bit)
- In end_sync_write, we check if we can clear some bad blocks, and if
yes, eventually we end up in handle_sync_write_finished(), where we
clear bad blocks
So getting back to sync_request(), the issue is this: if we consider a
device for READs, we check for badblocks. If we find a badblock, we
skip it and don't consider this device neither for READs nor for
WRITEs. How about:
- We consider a device for READs
- If it has a bad block, we consider it for WRITEs instead (so we have
a 3rd place where we consider the device for WRITE).
- We may consult WriteErrorSeen bit as we do in normal make_request(),
but not sure, because right now on sync path, we don't consult it
So this patch:
--- c:/work/code_backups/psp13/raid1.c Sun Jul 15 16:39:07 2012
+++ ubuntu-precise/source/drivers/md/raid1.c Mon Jul 16 11:35:40 2012
@@ -2385,20 +2385,25 @@
if (wonly < 0)
wonly = i;
} else {
if (disk < 0)
disk = i;
}
bio->bi_rw = READ;
bio->bi_end_io = end_sync_read;
read_targets++;
}
+ else {
+ bio->bi_rw = WRITE;
+ bio->bi_end_io = end_sync_write;
+ write_targets ++;
+ }
}
if (bio->bi_end_io) {
atomic_inc(&rdev->nr_pending);
bio->bi_sector = sector_nr + rdev->data_offset;
bio->bi_bdev = rdev->bdev;
bio->bi_private = r1_bio;
}
}
rcu_read_unlock();
if (disk < 0)
I tested this patch (together with yours) and it cleared the bad
blocks, but not sure what else I might be missing. My Linux kernel
programming skills are still ... questionable.
Thanks,
Alex.
>
> Thanks,
> NeilBrown
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[ATA RAID]
[Linux SCSI Target Infrastructure]
[Managing RAID on Linux]
[Linux IDE]
[Linux SCSI]
[Linux Hams]
[Device-Mapper]
[Kernel]
[Linux Books]
[Linux Admin]
[Linux Net]
[GFS]
[RPM]
[git]
[Photos]
[Yosemite Photos]
[Yosemite News]
[AMD 64]
[Linux Networking]