On 03/30/2016 08:49 AM, Yauhen Kharuzhy wrote:
On Tue, Mar 29, 2016 at 10:22:29PM +0800, Anand Jain wrote:
Write and Flush errors are considered as critical errors,
upon which the device will be brought offline and marked as
failed. Write and Flush errors are identified using device
error statistics.
Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx>
btrfs: check for failed device and hot replace
This patch creates casualty_kthread to check for the failed
devices, and triggers device replace.
Signed-off-by: Anand Jain <anand.jain@xxxxxxxxxx>
---
fs/btrfs/ctree.h | 2 +
fs/btrfs/disk-io.c | 161 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
fs/btrfs/disk-io.h | 2 +
fs/btrfs/volumes.c | 1 +
fs/btrfs/volumes.h | 4 ++
5 files changed, 169 insertions(+), 1 deletion(-)
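As a rough illustration of the policy in the first commit message above, here is a minimal userspace sketch; it is not the patch's code. The struct dev_stats and the function device_is_failed() are hypothetical names introduced only for this example. Treating read errors as non-critical is an assumption, inferred from the message naming only write and flush errors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-device error counters (illustrative only). */
struct dev_stats {
    uint64_t read_errs;   /* assumed non-critical in this sketch */
    uint64_t write_errs;
    uint64_t flush_errs;
};

/* Returns true when the counters show at least one write or flush error. */
static bool device_is_failed(const struct dev_stats *s)
{
    return s->write_errs > 0 || s->flush_errs > 0;
}
```

A device with only read errors is left online by this check, while a single write or flush error is enough to mark it failed.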
btrfs_check_and_handle_casualty() tries to perform auto-replacement
only once after each failure. If no hot spare was added to the system
before the failure, the only remaining way to replace the drive is to
do it manually. This sounds reasonable, so just a clarification: are
you sure that we shouldn't start auto-replacement if a hot spare is
added after the drive failure?
V1 of the patchset tried to perform auto-replacement endlessly until a
replacement drive was added.
Yeah. I made that change on purpose, but I have reverted it in V3, so
that the code is more flexible and the design is easier to control and
change.
Thanks, Anand
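The two behaviours discussed in this thread (one attempt per failure versus retrying until a spare appears) can be sketched in userspace C as below. This is only an illustration under stated assumptions, not the kernel implementation: struct casualty, enum policy and check_and_handle() are hypothetical names, and the real thread and replace logic are not modelled.

```c
#include <stdbool.h>

/* Hypothetical device state for illustration; not the btrfs structures. */
struct casualty {
    bool failed;            /* device has been marked failed */
    bool replace_attempted; /* an auto-replace attempt was already made */
    bool replaced;          /* a replace has been started on a spare */
};

/* TRY_ONCE: one attempt per failure. RETRY_UNTIL_SPARE: keep trying. */
enum policy { TRY_ONCE, RETRY_UNTIL_SPARE };

/* One pass of the periodic check. Returns true if a replace was started. */
static bool check_and_handle(struct casualty *c, bool spare_available,
                             enum policy p)
{
    if (!c->failed || c->replaced)
        return false;
    if (p == TRY_ONCE && c->replace_attempted)
        return false;       /* only a manual replace remains */
    c->replace_attempted = true;
    if (!spare_available)
        return false;       /* nothing to replace with on this pass */
    c->replaced = true;
    return true;
}
```

Under TRY_ONCE, a spare added after the first failed attempt is never picked up automatically, which is the case the question above is about. Under RETRY_UNTIL_SPARE, a later pass starts the replace as soon as a spare is present.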