Re: Hot-replace for RAID5
On 05/18/12 05:45, NeilBrown wrote:
> On Thu, 17 May 2012 01:34:15 +0200 Oliver Martin <oliver@xxxxxxxxxxxxxxxx> wrote:
>> Hi Neil,
>>
>> On 11.05.2012 02:50, NeilBrown wrote:
>>> Doing an in-place reshape with the new 3.3 code should work, though with a
>>> softer "should" than above. We will only know that it is "stable" when enough
>>> people (such as yourself) try it and report success. If anything does go wrong
>>> I would of course help you to put the array back together but I can never
>>> guarantee no data loss. You wouldn't be the first to test the code on live
>>> data, but you would be the second that I have heard of.
>>
>> I guess I'll be taking 2nd place then. I just used it on three live raid6
>> arrays, and it worked perfectly.
>
> 3 arrays - so you are 2nd, 3rd, and 4th :-)

Good to know that when all is good, hot-replace works.

I wonder whether all the "error paths" were considered and implemented (and maybe even tested, though we users could help with testing if we understood the intended behaviour), i.e.
what happens when the disk being hot-replaced shows read errors in locations not previously recorded in the bad-block list? Does it

- immediately fall back to fail+rebuild, or
- first try a recompute + rewrite of the sector, then fall back to fail+rebuild if the rewrite fails, or
- first try a recompute + rewrite of the sector, then add the block to the bad-block list if the rewrite fails, then fall back to fail+rebuild if the list is out of space?

What happens if the destination of the hot-replace has *one* write error? And *lots* of write errors?
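
To make the three read-error alternatives listed above concrete, here is a small Python sketch of the decision each one implies. It is only a model of my question, not md's actual code or behaviour; every name in it (`Outcome`, `on_read_error`, the `policy` numbers 1 to 3) is made up for illustration.

```python
from enum import Enum


class Outcome(Enum):
    REPAIRED = "sector recomputed from the other disks and rewritten"
    MARKED_BAD = "sector recorded in the bad-block list"
    FAIL_AND_REBUILD = "device failed, normal rebuild onto the spare"


def on_read_error(policy, rewrite_ok, bbl_has_space):
    """Model what happens on a read error on the disk being replaced.

    policy        -- 1, 2 or 3, matching the three alternatives in the question
    rewrite_ok    -- True if recompute + rewrite of the sector succeeds
    bbl_has_space -- True if the bad-block list can take another entry
    """
    if policy not in (1, 2, 3):
        raise ValueError("policy must be 1, 2 or 3")
    if policy == 1:
        # Alternative 1: give up on hot-replace immediately.
        return Outcome.FAIL_AND_REBUILD
    if rewrite_ok:
        # Alternatives 2 and 3 both try to repair the sector first.
        return Outcome.REPAIRED
    if policy == 2:
        # Alternative 2: a failed rewrite means fail+rebuild.
        return Outcome.FAIL_AND_REBUILD
    # Alternative 3: record the bad block, fail only if the list is full.
    return Outcome.MARKED_BAD if bbl_has_space else Outcome.FAIL_AND_REBUILD
```

For instance, `on_read_error(3, rewrite_ok=False, bbl_has_space=True)` models the case where the sector ends up in the bad-block list and the hot-replace carries on. Which of the three (if any) matches the real code is exactly what I am asking.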

What happens if a hot-replace hits a sector for which both the disk being replaced and another disk have an entry in the bad-block list, so that there is not enough parity information to recompute the data? Does it proceed anyway, marking the corresponding sector in the bad-block list of the destination device (i.e. an invalid strip), or does it fail the hot-replace, or something else?

(This one is actually more about bad-block lists.) What happens if a *different* disk shows bad sectors due to concurrent reads (simultaneous with, but not caused by, the hot-replace)? Does md first recompute and rewrite, then add the sector to the bad-block list if the rewrite fails, then fail the disk if the list is full? Or can another hot-replace be started while one is already running?

Thank you