Re: Raid5 crashed, need comments on possible repair solution


On Mon, 23 Apr 2012 23:47:12 +0200 Christoph Nelles
<evilazrael@xxxxxxxxxxxxx> wrote:

> Hello Neil,
> first, thanks for the answer. I will happily provide any data or logs if
> it helps you to investigate this problem.
> On 23.04.2012 23:00, NeilBrown wrote:
> > This is really worrying.  It's about the 3rd or 4th report recently which
> > contains:
> > 
> >>      Raid Level : -unknown-
> >>    Raid Devices : 0
> > 
> > and that should not be possible.  There must be some recent bug that causes
> > the array to be "cleared" *before* writing out the metadata - and that should
> > be impossible.
> > What kernel are you running?
> I switched kernel versions during that server rebuild. The last running
> system was on 3.2.5; after the rebuild I switched to 3.3.1, and with that
> it crashed. The kernel is vanilla, self-compiled, x86_64.
> mdadm is 3.1.5, self-compiled, too.

This suggests that it is a very recently introduced bug, and your
earlier observation that the "update time" correlated with the machine being
rebooted was very helpful.
I believe I have found the problem and have reproduced the symptom.
The sequence I used to reproduce it was a bit forced and probably isn't
exactly what happened in your case.  Maybe there is a race condition that can
trigger it as well.

In any case, the following patch should fix the issue, and is strongly
recommended for any kernel to which it applies.

I'll send this upstream shortly.

Of course this doesn't help you with your current problem, though at least it
suggests that it won't happen again.

I recall that you said you would be re-creating the array with a chunk size
of 64K.  The default has been 512K since mdadm-3.1 in late 2009.
Did you explicitly create with "-c 64" when you created the array? If not,
maybe you need to use "-c 512".


diff --git a/drivers/md/md.c b/drivers/md/md.c
index 333190f..4a7002d 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8402,7 +8402,8 @@ static int md_notify_reboot(struct notifier_block *this,
 	for_each_mddev(mddev, tmp) {
 		if (mddev_trylock(mddev)) {
-			__md_stop_writes(mddev);
+			if (mddev->pers)
+				__md_stop_writes(mddev);
 			mddev->safemode = 2;
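
To illustrate the guard in the patch above, here is a minimal userspace
sketch.  It is not kernel code: struct mddev_model, stop_writes_model() and
notify_reboot_model() are hypothetical stand-ins, the mddev_trylock() step is
left out, and the assumption is that a NULL mddev->pers means the array is
not running, so its metadata must be left alone at reboot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct mddev. */
struct mddev_model {
	bool has_pers;        /* models mddev->pers != NULL (array is running) */
	int safemode;         /* models mddev->safemode */
	int metadata_writes;  /* times the stop-writes path touched this array */
};

/* Stand-in for __md_stop_writes(); it only counts the call. */
static void stop_writes_model(struct mddev_model *m)
{
	/* Model-only precondition; the kernel function has no such check. */
	assert(m->has_pers);
	m->metadata_writes++;
}

/*
 * Models the patched loop body of md_notify_reboot().
 * Returns the number of arrays whose writes were stopped.
 */
static int notify_reboot_model(struct mddev_model *arrays, size_t n)
{
	int stopped = 0;

	for (size_t i = 0; i < n; i++) {
		struct mddev_model *m = &arrays[i];

		/* The guard added by the patch: skip arrays that are not running. */
		if (m->has_pers) {
			stop_writes_model(m);
			stopped++;
		}
		m->safemode = 2;
	}
	return stopped;
}
```

Calling notify_reboot_model() on one running and one non-running array
stops writes only on the running one, while safemode is set on both, which
is the behaviour the patched loop is meant to have.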

