>>> If the underlying protocol doesn't support retry and there
>>> are some transient errors happening somewhere in our IO
>>> stack, we'd like to give an extra chance for IO.

>> A limited number of retries may make sense, though I saw some
>> long stalls after retries on bad disks.

Indeed! One of the major issues in actual storage
administration is to find ways to reliably disable most retries,
or to shorten them, both at the block device level and the
device level, because in almost all cases where storage
reliability matters what is important is simply swapping out the
failing device immediately and then examining and possibly
refreshing it offline.

To the point that many device manufacturers deliberately cripple
the retry shortening or disabling options in their cheaper
products to force long stalls, so that people who care about
reliability more than price will buy the more expensive version
that can disable or shorten retries.

> Seems preferable to avoid issuing retries when the underlying
> transport layer(s) has already done so, but I am not sure
> there is a way to know that at the fs level.

Indeed, and to use a euphemism, a third layer of retries at the
filesystem level is currently a thoroughly imbecilic idea :-),
as whether retries are worth doing is not a filesystem dependent
issue (but then plugging is done at the block io level when it
is entirely device dependent whether it is worth doing, so there
is famous precedent).

There are excellent reasons why error recovery has in general
not been done at the filesystem level for around 20 years, which
do not need repeating every time. However one of them is that
where it makes sense device firmware does retries, and the block
device layer does retries too, which is often a bad idea, and
where it is not, the block io level should do that, not the
filesystem.

A large part of the above discussion would not be needed if
Linux kernel "developers" exposed a clear notion of hardware
device and block device state machine and related semantics, or
even knew that it were desirable, but that's an idea that is
only 50 years old, so may not have yet reached popularity :-).
