Re: [patch] limit error rate
Hello Dan,
On Wednesday 23 April 2008, Dan Williams wrote:
> On Sat, Apr 12, 2008 at 11:16 AM, Bernd Schubert <bernd-schubert@xxxxxx> wrote:
> > Hello,
> >
> > Last night we had SCSI problems and a hardware RAID
> > unit was offlined during heavy I/O. While this happened we got, for
> > about 3 minutes, a huge number of messages like these:
> >
> > Apr 12 03:36:07 pfs1n14 kernel: [197510.696595] raid5:md7: read error
> > not correctable (sector 2993096568 on sdj2).
> >
> > I guess the high error rate kept other events from being
> > scheduled: during this time the system was not pingable, and in the end
> > other devices also ran into SCSI command timeouts, causing problems on
> > these unrelated devices as well.
> >
> >
> > Signed-off-by: Bernd Schubert <bernd-schubert@xxxxxx>
>
> Hi Bernd,
>
> This patch is whitespace damaged (tabs-->spaces). Can you resend as
> an attachment?
Hmm, I don't know how I managed to do that. I probably copied it from the
shell... I have attached it this time. I also added one more
printk_ratelimit().

Btw, from my point of view the pattern

	if (printk_ratelimit())
		printk("print output");

looks odd. I just don't see why the API isn't simply

	printk_ratelimit("print output");

Oh well, modifying this all over the code would give a huge, almost useless
patch _only_ improving the beauty of the code.

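A combined "check and print" call of the kind wished for above could look roughly like the following. This is a userspace sketch only, not kernel code: struct ratelimit, ratelimit_ok() and log_ratelimited() are made-up names for illustration, and the clock is passed in by the caller so the behaviour is easy to follow.

```c
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical limiter: at most `burst` messages per `interval` ticks. */
struct ratelimit {
	long interval;      /* window length, in caller-defined ticks */
	int burst;          /* messages allowed per window */
	long window_start;  /* tick at which the current window began */
	int printed;        /* messages emitted in the current window */
	int suppressed;     /* messages dropped in the current window */
};

/* Returns true if the caller may print at time `now`. */
static bool ratelimit_ok(struct ratelimit *rl, long now)
{
	if (now - rl->window_start >= rl->interval) {
		/* The old window has expired: start a fresh one. */
		rl->window_start = now;
		rl->printed = 0;
		rl->suppressed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	rl->suppressed++;
	return false;
}

/*
 * Check and print in one call. Returns the number of characters
 * written, or 0 if the message was suppressed.
 */
static int log_ratelimited(struct ratelimit *rl, long now, const char *fmt, ...)
{
	va_list ap;
	int n;

	if (!ratelimit_ok(rl, now))
		return 0;
	va_start(ap, fmt);
	n = vprintf(fmt, ap);
	va_end(ap);
	return n;
}
```

With a wrapper like this the call site shrinks to a single statement, and the limiter can no longer end up inside a condition that also decides something else.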
Thanks,
Bernd
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index b162b83..60d3442 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1141,10 +1141,12 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		set_bit(R5_UPTODATE, &sh->dev[i].flags);
 		if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
 			rdev = conf->disks[i].rdev;
-			printk(KERN_INFO "raid5:%s: read error corrected (%lu sectors at %llu on %s)\n",
-			       mdname(conf->mddev), STRIPE_SECTORS,
-			       (unsigned long long)(sh->sector + rdev->data_offset),
-			       bdevname(rdev->bdev, b));
+			if (printk_ratelimit())
+				printk(KERN_INFO "raid5:%s: read error corrected"
+				       " (%lu sectors at %llu on %s)\n",
+				       mdname(conf->mddev), STRIPE_SECTORS,
+				       (unsigned long long)(sh->sector + rdev->data_offset),
+				       bdevname(rdev->bdev, b));
 			clear_bit(R5_ReadError, &sh->dev[i].flags);
 			clear_bit(R5_ReWrite, &sh->dev[i].flags);
 		}
@@ -1157,19 +1159,20 @@ static void raid5_end_read_request(struct bio * bi, int error)
 		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
 		atomic_inc(&rdev->read_errors);
-		if (conf->mddev->degraded)
+		if (conf->mddev->degraded && printk_ratelimit())
 			printk(KERN_WARNING "raid5:%s: read error not correctable (sector %llu on %s).\n",
 			       mdname(conf->mddev),
 			       (unsigned long long)(sh->sector + rdev->data_offset),
 			       bdn);
-		else if (test_bit(R5_ReWrite, &sh->dev[i].flags))
+		else if (test_bit(R5_ReWrite, &sh->dev[i].flags) &&
+			 printk_ratelimit())
 			/* Oh, no!!! */
 			printk(KERN_WARNING "raid5:%s: read error NOT corrected!! (sector %llu on %s).\n",
 			       mdname(conf->mddev),
 			       (unsigned long long)(sh->sector + rdev->data_offset),
 			       bdn);
 		else if (atomic_read(&rdev->read_errors)
-			 > conf->max_nr_stripes)
+			 > conf->max_nr_stripes && printk_ratelimit())
 			printk(KERN_WARNING
 			       "raid5:%s: Too many read errors, failing device %s.\n",
 			       mdname(conf->mddev), bdn);
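
One thing worth checking in the second hunk: with printk_ratelimit() folded into the if / else-if conditions, a suppressed message also makes that branch not match, so control moves on to the next test in the chain. The sketch below is plain C with made-up names (enum branch, pick_folded(), pick_separate()) and a boolean standing in for the limiter. What follows the chain in raid5.c is not shown in the hunk, so BR_NONE simply stands for "whatever comes next".

```c
#include <stdbool.h>

enum branch { BR_NOT_CORRECTABLE, BR_NOT_CORRECTED, BR_TOO_MANY, BR_NONE };

/* Limiter folded into each condition, mirroring the shape of the hunk. */
static enum branch pick_folded(bool degraded, bool rewrite, bool too_many,
			       bool may_print, int *printed)
{
	if (degraded && may_print) {
		(*printed)++;
		return BR_NOT_CORRECTABLE;
	} else if (rewrite && may_print) {
		(*printed)++;
		return BR_NOT_CORRECTED;
	} else if (too_many && may_print) {
		(*printed)++;
		return BR_TOO_MANY;
	}
	/* A suppressed message ends up here, whatever the error state was. */
	return BR_NONE;
}

/* Branch chosen by error state alone; the limiter only gates the message. */
static enum branch pick_separate(bool degraded, bool rewrite, bool too_many,
				 bool may_print, int *printed)
{
	enum branch br = BR_NONE;

	if (degraded)
		br = BR_NOT_CORRECTABLE;
	else if (rewrite)
		br = BR_NOT_CORRECTED;
	else if (too_many)
		br = BR_TOO_MANY;

	if (br != BR_NONE && may_print)
		(*printed)++;
	return br;
}
```

For a degraded array with the limiter saying "no", pick_folded() returns BR_NONE while pick_separate() still returns BR_NOT_CORRECTABLE. Keeping the limiter out of the conditions, for example by wrapping only the printk() call, avoids that difference.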