Re: Hot-replace for RAID5


Just in case you missed it earlier...

Remember to take a backup before you start this!

Also make notes of things like the "mdadm --detail" output, version numbers, the exact commands executed, etc. (and store this information on another computer!) If something does go wrong, that information can make it much easier for Neil or others to advise you.
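
A sketch of what to capture (untested; /dev/md0 and the member names sd[a-g]1 are placeholders for your real devices):

   mdadm --detail /dev/md0 > /root/md0-before.txt
   mdadm --examine /dev/sd[a-g]1 >> /root/md0-before.txt
   mdadm --version >> /root/md0-before.txt
   uname -r >> /root/md0-before.txt
   cat /proc/mdstat >> /root/md0-before.txt

Then copy /root/md0-before.txt to another machine.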



On 11/05/2012 04:44, Patrik Horník wrote:
On Fri, May 11, 2012 at 2:50 AM, NeilBrown <neilb@xxxxxxx> wrote:
On Thu, 10 May 2012 19:16:59 +0200 Patrik Horník <patrik@xxxxxx> wrote:

Neil, can you please comment on whether the separate operations mentioned in
this process behave as we expect and are stable enough? Thanks.

The conversion to and from RAID6 as described should work as expected, though
it requires having an extra device and requires two 'recovery' cycles.
Specifying the number of --raid-devices is not necessary.  When you convert
RAID5 to RAID6, mdadm assumes you are increasing the number of devices by 1
unless you say otherwise.  Similarly with RAID6->RAID5 the assumption is a
decrease by 1.
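
So both conversions can rely on the defaults, roughly like this (an untested sketch; /dev/md0 is a placeholder, and a spare must already be present for the first step):

   mdadm --grow /dev/md0 --level=6 --layout=preserve   # raid-devices assumed to be N+1
   # ... wait for the recovery to complete ...
   mdadm --grow /dev/md0 --level=5                     # raid-devices assumed to be N-1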

Doing an in-place replacement with the new 3.3 code should work, though with a
softer "should" than above.  We will only know that it is "stable" when enough
people (such as yourself) try it and report success.  If anything does go
wrong I would of course help you to put the array back together but I can
never guarantee no data loss.  You wouldn't be the first to test the code on
live data, but you would be the second that I have heard of.

Thanks Neil, this answers my questions. I don't like being second, so
RAID5 - RAID6 - RAID5 it is... :)

In addition, my array has 0.9 metadata, so hot-replace would also
require converting the metadata; altogether it seems much riskier.
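
(You can confirm the metadata version before deciding, e.g. with /dev/md0 as a placeholder:

   mdadm --detail /dev/md0 | grep Version

which should show something like "0.90" for the old format.)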

The in-place replacement is not yet supported by mdadm but it is very easy to
manage directly.  Just

   echo want_replacement > /sys/block/mdXXX/md/dev-YYY/state

and as soon as a spare is available the replacement will happen.
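
In practice that might look like this (untested sketch; md0, the failing member sdg1 and the spare sdh1 are placeholders):

   mdadm /dev/md0 --add /dev/sdh1                            # provide a spare
   echo want_replacement > /sys/block/md0/md/dev-sdg1/state
   cat /proc/mdstat                                          # watch the copy progress

The failing device should only be kicked out of the array once the copy onto the spare has completed.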


On Thu, May 10, 2012 at 8:59 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
(I accidentally sent my first reply directly to the OP, and forgot the
mailing list - I'm adding it back now, because I don't want the OP to follow
my advice until others have confirmed or corrected it!)

On 09/05/2012 21:53, Patrik Horník wrote:
Great suggestion, thanks.

So I guess steps with exact parameters should be:
1, add spare S to RAID5 array
2, mdadm --grow /dev/mdX --level 6 --raid-devices N+1 --layout=preserve
3, remove faulty drive and add replacement, let it synchronize
4, possibly remove added spare S
5, mdadm --grow /dev/mdX --level 5 --raid-devices N

Yes, that's what I was thinking.  You are missing "2b - let it synchronise".

Sure :)
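
Spelled out as commands for a 7-drive array, the sequence might look like this (untested; /dev/md0, the spare S /dev/sdh1, the faulty drive /dev/sdc1 and its replacement /dev/sdi1 are all placeholders):

   mdadm /dev/md0 --add /dev/sdh1                                       # 1
   mdadm --grow /dev/md0 --level=6 --raid-devices=8 --layout=preserve   # 2
   cat /proc/mdstat                                                     # 2b: wait for sync
   mdadm /dev/md0 --fail /dev/sdc1 --remove /dev/sdc1                   # 3
   mdadm /dev/md0 --add /dev/sdi1                                       #    ...and resync
   mdadm /dev/md0 --fail /dev/sdh1 --remove /dev/sdh1                   # 4
   mdadm --grow /dev/md0 --level=5 --raid-devices=7                     # 5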

Of course, another possibility is that if you have the space in the system
for another drive, you may want to convert to a full raid6 for the future.
That way you have the extra safety built in in advance.  But that will
definitely lead to a re-shape.
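
That full conversion would be along these lines (untested; /dev/md0, /dev/sdh1 and the backup-file path are placeholders, and the backup file should live on a device outside the array):

   mdadm /dev/md0 --add /dev/sdh1
   mdadm --grow /dev/md0 --level=6 --raid-devices=8 --backup-file=/root/md0-grow.backup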

Actually I don't have free physical space, the array already has 7 drives.
For the process I'll need to place the additional drive on a table near the
PC and cool it with a fan standing next to it... :)

My questions:
- Are you sure steps 3, 4 and 5 would not cause reshaping?

I /believe/ it will avoid a reshape, but I can't say I'm sure.  This is
stuff that I only know about in theory, and have not tried in practice.

- My array now has a left-symmetric layout, so after migration to RAID6
it should be left-symmetric-6. Does RAID6 work without problems in
degraded mode with this layout, no matter which one or two drives fail?

The layout will not affect the redundancy or the features of the raid - it
will only (slightly) affect the speed of some operations.
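
(If you want to check which layout you actually ended up with, /dev/md0 being a placeholder:

   mdadm --detail /dev/md0 | grep -i layout

should print "left-symmetric-6" after the conversion.)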

I know it should work, but it is probably a configuration that is not
used much by users, so maybe it is not tested as much as the standard
layouts. So the question was aiming more at practical experience than theory.

- What happens in step 5 and how long does it take? (If it is without
reshaping, it should only update the superblocks and that's it.)

That is my understanding.

- What happens if I don't remove spare S before migrating back to
RAID5? Will the array be reshaped, and which drive will it make into a
spare? (If step 5 is instantaneous, there is no reason for that. But
if it takes time, it is probably safer.)

I /think/ that the extra disk will turn into a hot spare.  But I am getting
out of my depth here - it all depends on how the disks get numbered and how
that affects the layout, and I don't know the details here.

So all in all, what do you guys think is more reliable now, the new
hot-replace or these steps?

I too am very curious to hear opinions.  Hot-replace will certainly be much
simpler and faster than these sorts of re-shaping - it's exactly the sort of
situation the feature was designed for.  But I don't know if it is
considered stable and well-tested, or "bleeding edge".





On Wed, May 9, 2012 at 8:09 AM, David Brown <david.brown@xxxxxxxxxxxx> wrote:
On 08/05/12 11:10, Patrik Horník wrote:

Hello guys,

I need to replace a drive in a big production RAID5 array and I am
thinking about using the new hot-replace feature added in kernel 3.3.

Does anyone have experience with it on big RAID5 arrays? Mine is 7 *
1.5 TB. What do you think about its status / stability / reliability?
Do you recommend it for production data?


If you don't want to play with the "bleeding edge" features, you could add
the new disk and extend the array to RAID6, then remove the old drive.  If
you want to do it all without doing any re-shapes, however, then you'd
need a third drive (the extra drive could easily be an external USB disk if
needed - it will only be used for writing, and not for reading unless
there's another disk failure).  Start by adding the extra drive as a hot
spare, then re-shape your raid5 to raid6 in raid5+extra parity layout.
Then fail and remove the old drive.  Put the new drive into the box and add
it as a hot spare.  It should automatically take its place in the raid5,
replacing the old one.  Once it has been rebuilt, you can fail and remove
the extra drive, then re-shape back to raid5.
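
For keeping an eye on each stage, something like this is handy (with /dev/md0 as a placeholder):

   watch cat /proc/mdstat
   mdadm --detail /dev/md0 | grep -E 'Level|Layout|State'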

If things go horribly wrong, the external drive gives you your parity backup.

Of course, don't follow this plan until others here have commented on it,
and either corrected or approved it.

And make sure you have a good backup no matter what you decide to do.


