Re: mdadm dropped disk, won't re-add

On Wed Feb 15, 2012 at 02:58:42PM +0100, John Paul Adrian Glaubitz wrote:

> Hello,
> 
> I have a rather big problem with my Linux software RAID5.
> 
> It consists of 4 SATA disks, each 1 TB in size, resulting in a 3 TB RAID5
> volume (/dev/md0, assembled from /dev/sd{b,c,d,e}1).
> 
> Today, mdadm kicked disk sde1 from the RAID because the cable seemed to
> be causing problems. I shut down the machine, replaced the cable and
> tried re-adding the disk; however, mdadm refused to add the drive.
> 
> So I re-partitioned sde1 and added it as a new device; mdadm instantly
> started rebuilding the RAID. Unfortunately, during the rebuild, mdadm
> decided to kick sdc1, and I have now ended up with two drives failing.
> 
> I have tried re-adding sdc1 with the --re-add command, but mdadm again
> refuses to re-add the drive.
> 
That's a safety measure. If mdadm can't actually re-add the drive then
it fails, rather than silently falling back to an --add (as older mdadm
versions did), which could lose data.

> I haven't changed anything since, as I don't know what to do next. I
> don't want to do any further damage to the RAID and hope that someone
> knows how to restore it.
> 
> My primary question is whether mdadm actually deletes any important data
> on the remaining disks (sd{b,c,d}1) while rebuilding or whether it just
> writes data to the newly added disk sde1.
> 
It just writes data/parity to the newly added disk. The only writes to
the remaining disks will be if other applications are writing to the
array during the rebuild process.
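
As a rough illustration of why a rebuild only needs to read from the
surviving members: RAID5 parity is the XOR of the data chunks in a
stripe, so a missing chunk can be recomputed from the others. The byte
values below are made up for the example, and real md works on whole
chunks with rotating parity rather than on single bytes.

```shell
#!/bin/sh
# Toy RAID5 stripe: three data bytes plus one parity byte.
# The values are arbitrary and only for illustration.
d0=$((0x3c)); d1=$((0xa5)); d2=$((0x0f))

# Parity is the XOR of all data chunks in the stripe.
parity=$(( d0 ^ d1 ^ d2 ))

# Pretend the disk holding d1 is missing: XOR the survivors with the
# parity to get it back. The other disks are only read, never written.
rebuilt_d1=$(( d0 ^ d2 ^ parity ))

printf 'parity=0x%02x rebuilt_d1=0x%02x\n' "$parity" "$rebuilt_d1"
```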

> mdadm is version 3.2.3, kernel is Linux 3.2.0 on Debian Wheezy.
> 
> Can anyone give further advice?
> 
What errors does dmesg give about why sdc1 was failed? You'll need to
fix that before you try recovering the array. If it's a drive error then
using ddrescue to clone it (or as much of it as possible) to sde1 would
probably be your best bet, then get a replacement drive.
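
A minimal sketch of that diagnosis and clone step, assuming GNU
ddrescue and smartmontools are installed. The script only prints the
commands so they can be reviewed before anything is run as root. The
device names are the ones from this thread; the map file name
rescue.map is just a placeholder.

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
# SRC/DST are the devices named in this thread; MAP is a hypothetical
# file name for the ddrescue map (progress) file.
SRC=/dev/sdc1
DST=/dev/sde1
MAP=rescue.map

clone_plan() {
    # Look for kernel messages about the failed drive or its ATA link.
    echo "dmesg | grep -i -e sdc -e ata"
    # Check the drive's own health attributes and error log.
    echo "smartctl -a /dev/sdc"
    # First pass: copy everything that reads cleanly, skip bad areas.
    # -f is required because the destination is a block device.
    echo "ddrescue -f -n $SRC $DST $MAP"
    # Second pass: go back and retry the bad areas up to three times.
    echo "ddrescue -f -r3 $SRC $DST $MAP"
}

clone_plan
```

The array should be stopped (mdadm -S /dev/md0) before cloning, and
everything currently on the destination partition is overwritten, so
double-check the source and destination order.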

Once you've fixed that issue then you should be able to force assemble
the array (mdadm -S /dev/md0; mdadm -Af /dev/md0) and continue/restart
the recovery process. I'd recommend doing a fsck on the filesystem
afterwards as well, especially if you've replaced sdc.
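
Spelled out with long options, the sequence might look like the sketch
below. It only prints the commands. The --examine step and the
read-only fsck -n are additions for illustration, and the fsck line
assumes the filesystem sits directly on /dev/md0 (no LVM or encryption
layer in between).

```shell
#!/bin/sh
# Sketch only: prints the commands instead of running them.
MD=/dev/md0

assemble_plan() {
    # Compare event counts and device states in the member superblocks.
    echo "mdadm --examine /dev/sd[bcde]1"
    # Stop the partially assembled array, then force assembly.
    echo "mdadm --stop $MD"
    echo "mdadm --assemble --force $MD"
    # Check array state and recovery progress.
    echo "cat /proc/mdstat"
    # Read-only filesystem check: reports problems without fixing them.
    echo "fsck -n $MD"
}

assemble_plan
```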

If the force assembly fails then try it with added verbosity (mdadm -S
/dev/md0; mdadm -Afvvv /dev/md0) and post the output from that (and from
dmesg) and hopefully someone will be able to figure out what's going
wrong.
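
To make the output easy to post, something like the helper below could
be used. It is a sketch: capture is a hypothetical function name and
md0-assemble.log a placeholder file name. stderr is merged into stdout
because mdadm prints its verbose messages on stderr.

```shell
#!/bin/sh
# Run a command, show its output on the terminal, and append a copy
# (stdout and stderr together) to a log file.
capture() {
    log=$1
    shift
    "$@" 2>&1 | tee -a "$log"
}

# Intended use, as root:
#   capture md0-assemble.log mdadm -S /dev/md0
#   capture md0-assemble.log mdadm -Afvvv /dev/md0
#   capture md0-assemble.log dmesg
```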

Cheers,
    Robin
-- 
     ___        
    ( ' }     |       Robin Hill        <robin@xxxxxxxxxxxxxxx> |
   / / )      | Little Jim says ....                            |
  // !!       |      "He fallen in de water !!"                 |


