Re: MTBF of Ext3 and Partition Size
Theodore Tso wrote:
> On Thu, Apr 16, 2009 at 07:53:59AM -0400, Kyle Brandt wrote:
>> On several of my servers I seem to have a high rate of server crashes due to file system errors. So I have some questions related to this:
>>
>> Is there any Mean Time Between Failures (MTBF) data for the ext3 file system?
>>
>> Does increased partition size cause a higher risk of the partition being corrupted? If so, is there any data on the ratio between partition size and the likelihood of failure?
>
> The probability of these sorts of filesystem problems is going to be dominated by hardware-induced corruptions --- so it's not going to make a lot of sense to talk about MTBF without having a specific hardware context in mind.
>
> If you have lousy memory, or a lousy disk controller cable, or a cable connector which is loose, then corruptions will happen often. If you are located some place where there is a strong alpha particle source, then you will have a much greater chance of bit flips. If you use ECC memory, and do very careful hardware selection, with enterprise-quality disks that trade off disk capacity for a much stronger level of ECC codes, then of course failures will be much rarer and the MTBF much longer.
>
> (For example, there was the infamous story in the early 1990's when Sun had a spate of bad memory; I think it was ultimately traced to radioactive contamination of the ceramic materials used to make their memory chips. The resulting alpha particles caused "bit flips", which made their customers rather antsy, especially since Sun tried to deny there was even a problem for quite some time.)
>
> So if you are having a high rate of server crashes, the first thing I would do is to make sure you have the latest distribution updates; it's possible it's caused by a kernel bug that has since been fixed, but it's somewhat unlikely.
>
> The next thing I would do is take one of the machines that has been crashing offline, and try running a 36-48 hour memory test.
>
>> Does ext3 on hardware raid (10) increase the possibility of file system corruption?
>
> No, it shouldn't --- unless you have a buggy or otherwise dodgy hardware raid controller.
>
> - Ted
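To illustrate why the weakest piece of hardware tends to dominate, here is a minimal sketch under a textbook reliability assumption that is not from the thread: components fail independently at constant rates, and the system fails as soon as any one of them fails. The component names and MTBF figures are purely hypothetical placeholders, not measurements.

```python
import math


def series_mtbf(component_mtbfs_hours):
    """MTBF of a system that fails when any one component fails.

    Assumes independent components with constant failure rates, so the
    failure rates (1 / MTBF) simply add up.
    """
    total_rate = sum(1.0 / mtbf for mtbf in component_mtbfs_hours)
    return 1.0 / total_rate


def failure_probability(mtbf_hours, interval_hours):
    """Probability of at least one failure within the interval,
    under the same constant-rate (exponential) assumption."""
    return 1.0 - math.exp(-interval_hours / mtbf_hours)


# Hypothetical MTBF values in hours, chosen only for illustration.
components = {
    "filesystem code": 1_000_000.0,
    "disk and controller": 100_000.0,
    "marginal memory": 2_000.0,
}

system_mtbf = series_mtbf(components.values())
print(f"system MTBF: {system_mtbf:.0f} hours")
print(f"chance of a failure in 30 days: {failure_probability(system_mtbf, 30 * 24):.0%}")
```

With numbers like these, the combined MTBF sits just below that of the worst component, which is why a figure for ext3 in isolation would say little about a particular server.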
One note is that the file system will often be the first notification that your hardware RAID has done something wrong - you should have a careful look at any logs/errors/etc that your storage maintains for you.
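As a rough aid for that log review, the sketch below filters lines of saved log text for a couple of substrings. The helper name and the default patterns are assumptions for illustration only; the messages your kernel and RAID controller actually emit may differ, so adjust the patterns to what your own logs contain.

```python
import re

# Assumed patterns; replace with the messages your own storage stack logs.
DEFAULT_PATTERNS = (r"EXT3-fs error", r"I/O error")


def find_storage_errors(log_lines, patterns=DEFAULT_PATTERNS):
    """Return (line_number, text) pairs for lines matching any pattern.

    log_lines is any iterable of strings, for example an open file
    holding saved kernel log output.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits = []
    for number, line in enumerate(log_lines, start=1):
        if any(c.search(line) for c in compiled):
            hits.append((number, line.rstrip("\n")))
    return hits
```

Passing an open file object works because files iterate line by line; the matching line numbers make it easier to line up filesystem errors with whatever the RAID controller reported around the same time.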
Can you share specifics of your system: what is the storage, which kernel, etc.?

Regards,

Ric

_______________________________________________
Ext3-users mailing list
Ext3-users@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/ext3-users