[ ... ]

>> Oh please, please, a bit less silliness would be welcome here.
>> In a previous comment on this tedious thread I had written:

>> > If the block device abstraction layer and lower layers work
>> > correctly, Btrfs does not have problems of that sort when
>> > adding new devices; conversely if the block device layer and
>> > lower layers do not work correctly, no mainline Linux
>> > filesystem I know can cope with that.

>> > Note: "work correctly" does not mean "work error-free".

>> The last line is very important and I added it advisedly.

> Even looking at things that way though, Zoltan's assessment
> that reliability is essentially a measure of error rate is
> correct.

That assessment is instead based on a grave confusion between two
very different kinds of "error rate", a confusion also partly based
on the ridiculous misunderstanding, which I have already pointed
out, that UNIX filesystems run on top of SATA or USB devices:

> Internal SATA devices absolutely can randomly drop off the bus
> just like many USB storage devices do,

Filesystems run on top of *block devices* with a definite interface
and a definite state machine, and filesystems in general assume
that the block device works *correctly*.

> but it almost never happens (it's a statistical impossibility
> if there are no hardware or firmware issues), so they are more
> reliable in that respect.

What the OP was doing was using "unreliable" both for the case
where the device "lies" and for the case where the device does not
"lie" but reports a failure. Both of these are malfunctions in a
wide sense:

* The [block] device "lies" as to its status or what it has done.

* The [block] device reports truthfully that an action has failed.

But they are of very different natures and need completely
different handling. Hint: one is an extensional property and the
other is a modal one; there is a huge difference between "this data
is wrong" and "I know that this data is wrong".

The really important "detail" is that filesystems are, as a rule
with very few exceptions, designed to work only if the block device
layer (and those below it) does not "lie" (see "Byzantine failures"
below), that is, "works correctly": it reports the failure of every
operation that fails and the success of every operation that
succeeds, and it never gets into an unexpected state. In
particular, filesystem designs are nearly always based on the
assumption that there are no undetected errors at the block device
level or below. The expected *frequency* of detected errors then
influences how much redundancy and what kind of recovery are
desirable, but the frequency of "lies" is assumed to be zero.

The one case where Btrfs does not assume that the storage layer
works *correctly* is checksumming: it is quite expensive and makes
sense only if the block device is expected to (sometimes) "lie"
about having written or read the data correctly. The role of the
checksum is to spot when a block device "lies", turning an
undetected read error into a detected one (checksums could also be
used to detect correct writes misreported as having failed).
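To make that concrete, here is a minimal sketch (toy code, not
Btrfs's actual read path; Btrfs defaults to crc32c, not this
Adler-style sum, and every name here is made up) of how a per-block
checksum kept in metadata turns a "lie" into a detected error:

  /* toy_csum.c -- illustrative only, not Btrfs code. */
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  #define BLOCK_SIZE 16                 /* tiny block for the demo */

  static uint32_t toy_csum(const uint8_t *d, size_t len)
  {
      uint32_t a = 1, b = 0;            /* Adler-32-style running sums */
      for (size_t i = 0; i < len; i++) {
          a = (a + d[i]) % 65521;
          b = (b + a) % 65521;
      }
      return (b << 16) | a;
  }

  int main(void)
  {
      uint8_t written[BLOCK_SIZE] = "correct data!!!";
      uint32_t stored = toy_csum(written, BLOCK_SIZE); /* in metadata */

      /* The device claims a successful read but returns corrupted
       * data: a "lie", i.e. an undetected error at the block layer. */
      uint8_t returned[BLOCK_SIZE];
      memcpy(returned, written, BLOCK_SIZE);
      returned[3] ^= 0x40;              /* silent single-bit flip */

      if (toy_csum(returned, BLOCK_SIZE) != stored)
          puts("checksum mismatch: the lie is now a *detected* error");
      else
          puts("data verified");
      return 0;
  }

Note that the device's status code says "success" either way; only
the end-to-end verification against the stored checksum catches the
lie.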
The crucial difference between SATA and USB is not that USB chips
have higher rates of detected failures (even if they often do), but
that in my experience SATA interfaces from reputable suppliers
don't "lie" (more realistically, have negligible "lie" rates),
while USB interfaces (both host bus adapters and IO bus bridges)
"lie" both systematically and statistically at non-negligible
rates; and anyhow the USB mass storage protocol is not very good at
error reporting and handling.

>> The "working incorrectly" general case is the so-called
>> "Byzantine generals problem" [ ... ]

This is compsci for beginners, and someone dealing with storage
issues (and not just those) should be intimately familiar with the
implications:

  https://en.wikipedia.org/wiki/Byzantine_fault_tolerance

  "Byzantine failures are considered the most general and most
  difficult class of failures among the failure modes. The
  so-called fail-stop failure mode occupies the simplest end of the
  spectrum. Whereas the fail-stop failure model simply means that
  the only way to fail is a node crash, detected by other nodes,
  Byzantine failures imply no restrictions, which means that the
  failed node can generate arbitrary data, pretending to be a
  correct one, which makes fault tolerance difficult."
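In storage terms, the two ends of that spectrum look like this (a
toy model, illustrative names only, not any real kernel API): a
fail-stop device can only fail by truthfully reporting failure,
while a Byzantine device can report "success" with arbitrary data,
so status checking alone can never catch it:

  #include <errno.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <string.h>

  static const uint8_t good_data[8] = "GOODDATA";

  /* Fail-stop: the only failure mode is a detected, reported one. */
  static int failstop_read(uint8_t *buf, int dead)
  {
      if (dead)
          return -EIO;                   /* caller always finds out */
      memcpy(buf, good_data, sizeof good_data);
      return 0;
  }

  /* Byzantine: no restrictions; may claim success with wrong data.
   * Only end-to-end verification (e.g. checksums) can catch it. */
  static int byzantine_read(uint8_t *buf)
  {
      memcpy(buf, "BAD!DATA", sizeof good_data);
      return 0;                          /* "success", regardless */
  }

  int main(void)
  {
      uint8_t buf[8];
      if (failstop_read(buf, 1) < 0)
          puts("fail-stop: error reported, caller knows");
      if (byzantine_read(buf) == 0 && memcmp(buf, good_data, 8) != 0)
          puts("byzantine: 'success' reported, data wrong");
      return 0;
  }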
