RE: Mdadm re-add fails

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Neil,

Yes, that worked:

>> [root@typhon ~]# mdadm --detail /dev/md24
/dev/md24:
   Version : 1.2
  Creation Time : Fri May 20 11:42:17 2011
  Raid Level : raid1
  Array Size : 5241844 (5.00 GiB 5.37 GB)
  Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
  Raid Devices : 2
  Total Devices : 2
  Persistence : Superblock is persistent

  Intent Bitmap : Internal

  Update Time : Fri May 20 12:47:09 2011
  State : active
  Active Devices : 2
 Working Devices : 2
 Failed Devices : 0
 Spare Devices : 0

           Name : typhon.mno.stratus.com:24  (local to host typhon.mno.stratus.com)
           UUID : 562323d9:9a7b2979:a734abf0:b3fb8f0b
           Events : 155

    Number   Major   Minor   RaidDevice State
       3      65       22        0      active sync   /dev/sdc6
       2      65       54        1      active sync   /dev/sdk6

>> [root@typhon sbin]# mdadm /dev/md24 -f /dev/sdk6 -r /dev/sdk6
mdadm: set /dev/sdk6 faulty in /dev/md24
mdadm: hot removed /dev/sdk6 from /dev/md24

Without the fix:
---------------------
>> root@typhon sbin]# mdadm /dev/md24 -a /dev/sdk6
mdadm: /dev/sdk6 reports being an active member for /dev/md24, but a --re-add fails.
mdadm: not performing --add as that would convert /dev/sdk6 in to a spare.
mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk6" first.

With the fix:
-----------------
>>  [root@typhon ~]# ./mdadm /dev/md24 -a /dev/sdk6                                 
mdadm: re-added /dev/sdk6

Thanks very much for the assistance.

Regards,
Annemarie


-----Original Message-----
From: NeilBrown [mailto:neilb@xxxxxxx] 
Sent: Thursday, May 19, 2011 7:52 PM
To: Schmidt, Annemarie
Cc: linux-raid@xxxxxxxxxxxxxxx
Subject: Re: Mdadm re-add fails

On Wed, 18 May 2011 10:43:47 -0400 "Schmidt, Annemarie"
<Annemarie.Schmidt@xxxxxxxxxxx> wrote:

> Hi!
> 
> I have a 2 disk raid1 data array. As a result of other testing, the device info
> in the superblock for one of the partners, /dev/sdc2, ended up being in slot 3
> of the device info array: 
> 
> [root@typhon ~]# mdadm --detail /dev/md21
> /dev/md21:
>   Version : 1.2
>   Creation Time : Mon May  9 11:19:43 2011
>   Raid Level : raid1
>   Array Size : 5241844 (5.00 GiB 5.37 GB)
>   Used Dev Size : 5241844 (5.00 GiB 5.37 GB)
>   Raid Devices : 2
>   Total Devices : 2
>   Persistence : Superblock is persistent
> 
>   Intent Bitmap : Internal
> 
>   Update Time : Thu May 12 15:51:50 2011
>   State : active
>   Active Devices : 2
>   Working Devices : 2
>   Failed Devices : 0
>   Spare Devices : 0
> 
>            Name : typhon.mno.stratus.com:21  (local to host typhon.mno.stratus.com)
>            UUID : 996d993f:baac367a:8b154ba9:43e56cff
>           Events : 687
> 
>     Number   Major   Minor   RaidDevice State
> -->    3      65       34        0      active sync   /dev/sdc2
>         2      65       82        1      active sync   /dev/sdk2
> 
> When I remove /dev/sdk2 and then a re-add it back in, the re-add fails:
> 
> >> [root@typhon ~]# mdadm /dev/md21 -f /dev/sdk2 -r /dev/sdk2
> mdadm: set /dev/sdk2 faulty in /dev/md21
> mdadm: hot removed /dev/sdk2 from /dev/md21
> 
> >> [root@typhon ~]# mdadm /dev/md21 -a /dev/sdk2
> mdadm: /dev/sdk2 reports being an active member for /dev/md21, but a --re-add
> fails.
> mdadm: not performing --add as that would convert /dev/sdk2 in to a spare.
> mdadm: To make this a spare, use "mdadm --zero-superblock /dev/sdk2" first.
> 
> I believe the re-add fails because the enough_fd function (util.c) is not searching deep enough into the
> dev_info array with this line of code:
>    for (i=0; i<array.raid_disks + array.nr_disks; i++)
> 
> array.raids_disk = 2 and array/nr_disks = 1, and so for this particular md device, it is only looking at slots 0-2. 
> I believe the code needs to be changed to look at all possible dev_info array slots, taking into account the 
> version of the superblock (like the Detail function does (Detail.c).  
> 
> Do folks agree?
>

I do - largely.  I think there might be a better more general way to control
the loop though.
Could you try this please?

Thanks,
NeilBrown


diff --git a/util.c b/util.c
index 1056ae4..d005e0a 100644
--- a/util.c
+++ b/util.c
@@ -370,10 +370,14 @@ int enough_fd(int fd)
 	    array.raid_disks <= 0)
 		return 0;
 	avail = calloc(array.raid_disks, 1);
-	for (i=0; i<array.raid_disks + array.nr_disks; i++) {
+	for (i=0; i < 1024 && array.raid_disks > 0; i++) {
 		disk.number = i;
 		if (ioctl(fd, GET_DISK_INFO, &disk) != 0)
 			continue;
+		if (disk.major == 0 && disk.minor == 0)
+			continue;
+		array.raid_disks--;
+
 		if (! (disk.state & (1<<MD_DISK_SYNC)))
 			continue;
 		if (disk.raid_disk < 0 || disk.raid_disk >= array.raid_disks)


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux