Hi Neil,
this is yet another issue I encounter, which is indirectly related to
bad-blocks code, but I think it can be hit when bad-blocks logging is
disabled too.
Scenario:
- RAID1 with one device A, one device missing
- mdadm --manage /dev/mdX --add /dev/B (fresh device B added)
- recovery of B starts
Now at some point, end_sync_write() on B returns with error. Now the
following can happen:
In sync_request_write() we do:
1/
/*
* schedule writes
*/
atomic_set(&r1_bio->remaining, 1);
2/ then we schedule WRITEs, so for each WRITE scheduled we do:
atomic_inc(&r1_bio->remaining);
3/ then we do:
if (atomic_dec_and_test(&r1_bio->remaining)) {
/* if we're here, all write(s) have completed, so clean up */
md_done_sync(mddev, r1_bio->sectors, 1);
put_buf(r1_bio);
}
So assume that end_sync_write() completed with error, before we got to
3/. Then in end_sync_write() we set R1BIO_WriteError, and the we
decrement r1_bio->remaining, so it becomes 1, so we bail out and don't
call reschedule_retry().
Then in 3/ we decrement r1_bio->remaining again, see that it is 0 now
and complete the bio....without marking bad block or failing the
device. So we think that this region is in-sync, while it's not,
because we hit IO error on B.
I checked vs 2.6 versions and such behavior makes sense there, because
R1BIO_WriteError or R1BIO_MadeGood cases are not present there (no
bad-blocks functionality). But now, we must call reschedule_retry() at
both places (if needed). Does this make sense?
I tested the following patch, which seems to work ok:
$ diff -U 20 c:/work/code_backups/psp13/bad_sectors/raid1.c
ubuntu-precise/source/drivers/md/raid1.c
--- c:/work/code_backups/psp13/bad_sectors/raid1.c Mon Jul 16 17:10:24 2012
+++ ubuntu-precise/source/drivers/md/raid1.c Mon Jul 16 17:45:13 2012
@@ -1793,42 +1793,47 @@
*/
atomic_set(&r1_bio->remaining, 1);
for (i = 0; i < disks ; i++) {
wbio = r1_bio->bios[i];
if (wbio->bi_end_io == NULL ||
(wbio->bi_end_io == end_sync_read &&
(i == r1_bio->read_disk ||
!test_bit(MD_RECOVERY_SYNC, &mddev->recovery))))
continue;
wbio->bi_rw = WRITE;
wbio->bi_end_io = end_sync_write;
atomic_inc(&r1_bio->remaining);
md_sync_acct(conf->mirrors[i].rdev->bdev, wbio->bi_size >> 9);
generic_make_request(wbio);
}
if (atomic_dec_and_test(&r1_bio->remaining)) {
/* if we're here, all write(s) have completed, so clean up */
- md_done_sync(mddev, r1_bio->sectors, 1);
- put_buf(r1_bio);
+ if (test_bit(R1BIO_MadeGood, &r1_bio->state) ||
+ test_bit(R1BIO_WriteError, &r1_bio->state))
+ reschedule_retry(r1_bio);
+ else {
+ md_done_sync(mddev, r1_bio->sectors, 1);
+ put_buf(r1_bio);
+ }
}
}
/*
* This is a kernel thread which:
*
* 1. Retries failed read operations on working mirrors.
* 2. Updates the raid superblock when problems encounter.
* 3. Performs writes following reads for array synchronising.
*/
static void fix_read_error(struct r1conf *conf, int read_disk,
sector_t sect, int sectors)
{
struct mddev *mddev = conf->mddev;
while(sectors) {
int s = sectors;
int d = read_disk;
int success = 0;
int start;
Thanks,
Alex.
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
[ATA RAID]
[Linux SCSI Target Infrastructure]
[Managing RAID on Linux]
[Linux IDE]
[Linux SCSI]
[Linux Hams]
[Device-Mapper]
[Kernel]
[Linux Books]
[Linux Admin]
[Linux Net]
[GFS]
[RPM]
[git]
[Photos]
[Yosemite Photos]
[Yosemite News]
[AMD 64]
[Linux Networking]