On Thu, Dec 31, 2015 at 6:09 PM, ronnie sahlberg <ronniesahlberg@xxxxxxxxx> wrote:
> Here is a kludge I hacked up.
> Someone that cares could clean this up and start building a proper
> test suite or something.
>
> This test script creates a 3 disk raid1 filesystem and very slowly
> writes a large file onto the filesystem while, one by one each disk is
> disconnected then reconnected in a loop.
> It is fairly trivial to trigger dataloss when devices are bounced like this.

Yes, it's quite a torture test. I'd expect this would be a problem for
Btrfs until this feature is done at least:
https://btrfs.wiki.kernel.org/index.php/Project_ideas#Take_device_with_heavy_IO_errors_offline_or_mark_as_.22unreliable.22

And maybe this one too
https://btrfs.wiki.kernel.org/index.php/Project_ideas#False_alarm_on_bad_disk_-_rebuild_mitigation

Already we know that Btrfs tries to write indefinitely to missing
devices. If it reappears, what gets written? Will that device be
consistent? And then another one goes missing, comes back, now
possibly two devices with totally different states for identical
generations. It's a mess.

We know that trivially causes major corruption with btrfs raid1 if a
user mounts e.g. devid1 rw,degraded modifies that; then mounts devid2
(only) rw,degraded and modifies it; and then mounts both devids
together. Kablewy. Big mess. And that's umounting each one in between
those steps; not even the abrupt disconnect/reconnect.

--
Chris Murphy
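
For illustration, the degraded-mount sequence described above can be sketched as a toy model. This is not Btrfs code: the `Device` class, the way `generation` is bumped, and the `degraded_write` and `diverged` helpers are all hypothetical simplifications, assumed here only to show how two mirrors can end up claiming the same generation while holding different contents.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Hypothetical stand-in for one raid1 member (not a Btrfs structure)."""
    devid: int
    generation: int = 0
    blocks: dict = field(default_factory=dict)


def degraded_write(present, key, value):
    """Write to only the devices that are present and bump their generation."""
    for dev in present:
        dev.blocks[key] = value
        dev.generation += 1


def diverged(a, b):
    """True if both devices claim the same generation but hold different data."""
    return a.generation == b.generation and a.blocks != b.blocks


# Start with two consistent mirrors.
dev1, dev2 = Device(devid=1), Device(devid=2)
degraded_write([dev1, dev2], "file", "original")

# Mount devid 1 alone (rw,degraded), modify it, unmount.
degraded_write([dev1], "file", "changed on devid 1")

# Mount devid 2 alone (rw,degraded), modify it, unmount.
degraded_write([dev2], "file", "changed on devid 2")

# Both devices now report the same generation with different contents, so
# comparing generation numbers alone cannot tell which copy is current.
print(dev1.generation, dev2.generation, diverged(dev1, dev2))
```

In this simplified model, mounting both devices together afterwards leaves no way to pick a winner from the generation counter, which is the situation the message calls a mess.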
