Re: [PATCH 00/13 v3] Introduce device state 'failed', Hot spare and Auto replace

On Mon, 4 Apr 2016 04:45:16 +0000 (UTC),
Duncan <1i5t5.duncan@xxxxxxx> wrote:

> Kai Krakow posted on Mon, 04 Apr 2016 02:00:43 +0200 as excerpted:
> 
> > Does this also implement "copy-back" - thus, it returns the
> > hot-spare device to global hot-spares when the failed device has
> > been replaced?  
> 
> I don't believe it does that in this initial implementation, anyway.
> 
> There's a number of issues with the initial implementation, including
> the fact that the hot-spare is global only and can't be specifically
> assigned to a filesystem or set of filesystems, which means, if you
> have multiple filesystems using different sized devices, the
> hot-spares must be sized to match the largest device they could
> replace, and thus would be mostly wasted if they ended up replacing a
> far smaller device.  If the spares could be associated with specific
> filesystems, then specifically sized spares could be associated
> appropriately, avoiding that waste. Additionally, it would then be
> possible to queue up say 20 spares on an important filesystem, with
> no spares on another that you'd rather just go down if a device fails.
> 
> So obviously the initial implementation isn't seriously
> enterprise-ready and is sub-optimal in many ways, but it's better
> than what is currently available (no automated spare handling at
> all), and an implementation must start somewhere, so as long as it's
> designed to be improved and extended with the missing features over
> time, as has been indicated, it's a reasonable first-implementation.

Your argument would carry less weight if it did copy-back, though... ;-)

It's a very welcome and good start; I didn't mean to dismiss it as
useless. By no means.

But to handle it right, that point should be clear. Currently, if the
global spare jumps in, you can always do the copy-back by hand: put a
correctly sized drive back into the filesystem, then remove the spare
again so the data moves over, then make it a global spare again.
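
As a rough sketch of what such a manual sequence might look like, the
small Python helper below only builds the command lines and prints them
for review; it runs nothing. The function names, the mount point and the
device paths are made up for illustration. It assumes the stock
"btrfs device add" and "btrfs device remove" subcommands; the last step
(handing the drive back to the global spare pool) depends on the
interface this patch set ends up with, so it is left as a comment
rather than guessed at.

```python
import shlex


def copy_back_plan(mount_point, spare_dev, new_dev):
    """Return the manual copy-back steps as argument lists, in order.

    Hypothetical helper: nothing is executed, the admin reviews the
    commands and runs them by hand.
    """
    steps = [
        # Step 1: put a correctly sized drive back into the filesystem.
        ["btrfs", "device", "add", new_dev, mount_point],
        # Step 2: remove the spare again; btrfs relocates its data onto
        # the remaining devices, which is the "copy-back" part.
        ["btrfs", "device", "remove", spare_dev, mount_point],
        # Step 3 (not shown): register spare_dev as a global hot spare
        # again, using whatever command the hot-spare patches provide.
    ]
    return steps


def render(steps):
    """Format the steps as shell-quoted command lines, one per line."""
    return "\n".join(shlex.join(step) for step in steps)


if __name__ == "__main__":
    # Example device names only; substitute the real ones.
    print(render(copy_back_plan("/mnt/data", "/dev/sdx", "/dev/sdy")))
```

Printing instead of executing is deliberate here: after a device
failure somebody should look at the situation before anything gets
removed from the filesystem.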

Since such an incident needs manual investigation anyway, it's totally
reasonable to start with this implementation.

This sort of handling could be made into a guide within the docs.

-- 
Regards,
Kai

Replies to list-only preferred.




