Re: btrfs fail behavior when a device vanishes

On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg
<ronniesahlberg@xxxxxxxxx> wrote:
> Here is a kludge I hacked up.
> Someone that cares could clean this up and start building a proper
> test suite or something.
>
> This test script creates a 3 disk raid1 filesystem and very slowly
> writes a large file onto the filesystem while, one by one each disk is
> disconnected then reconnected in a loop.
> It is fairly trivial to trigger dataloss when devices are bounced like this.

Yes, it's quite a torture test. I'd expect this to be a problem for
Btrfs at least until this feature is done:

https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22

And maybe this one too:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation

We already know that Btrfs keeps trying to write to a missing device
indefinitely. If the device reappears, what gets written? Will that
device be consistent? Then another one goes missing and comes back, and
now there are possibly two devices with totally different states for
identical generations. It's a mess. We know that this trivially causes
major corruption with btrfs raid1 if a user mounts e.g. devid1
rw,degraded and modifies it; then mounts devid2 (only) rw,degraded and
modifies it; and then mounts both devids together. Kablewy. Big mess.
And that's with a clean umount between each of those steps, not even
an abrupt disconnect/reconnect.
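
To make the devid1/devid2 sequence concrete, here is a toy model in
Python. It is only a sketch and is not how Btrfs works internally: the
Mirror class, degraded_write() and diverged() are names made up for
illustration, and a "generation" here is just a counter bumped on each
commit. It shows why comparing generation numbers alone can't tell
which copy is good once each mirror has been modified on its own.

```python
# Toy model only: this is NOT Btrfs code. All names here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Mirror:
    devid: int
    generation: int = 0
    blocks: dict = field(default_factory=dict)

    def commit(self, changes):
        """Apply changes and bump the generation, like a transaction commit."""
        self.blocks.update(changes)
        self.generation += 1


def degraded_write(mirror, changes):
    """Model a rw,degraded mount where only this mirror is present."""
    mirror.commit(changes)


def diverged(a, b):
    """True when both mirrors claim the same generation but differ in content."""
    return a.generation == b.generation and a.blocks != b.blocks


# Start with two identical mirrors at the same generation.
dev1 = Mirror(devid=1)
dev2 = Mirror(devid=2)
for dev in (dev1, dev2):
    dev.commit({"file": "original"})

# Step 1: only devid 1 is mounted rw,degraded and modified.
degraded_write(dev1, {"file": "edited on devid 1"})
# Step 2: only devid 2 is mounted rw,degraded and modified.
degraded_write(dev2, {"file": "edited on devid 2"})

# Step 3: both devices are present again. Their generations match,
# but their contents do not.
print(dev1.generation, dev2.generation)
print(diverged(dev1, dev2))
```

In this model both mirrors end up at the same generation with
different contents, so nothing in the generation number says which one
to trust.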


-- 
Chris Murphy