Re: The FAQ on fsync/O_SYNC

On Mon, Apr 20, 2015 at 06:07:09AM +0000, Duncan wrote:
> 4.0 is out.  There's reason people may want to stick one version back by 
> default, to 3.19 currently, since it can take a few weeks for early 
> reports to develop into a coherent problem, and sticking one stable 
> series back allows for that, and deciding exactly when one is comfortable 
> upgrading.  But in btrfs context anyway, with 4.0 out, if you're not on 
> at least 3.19 yet, you should be able to point to the bug explaining
> /why/.  If you can't, arguably, you should be either upgrading yesterday 
> if not sooner, or you really should choose some other filesystem, as 
> btrfs simply isn't at the stability required for your use-case yet, and 
> you unnecessarily risk data loss to already found and fixed bugs as a 
> result.

I'm not sure that "run the latest kernel" or even "run the latest kernel
minus N weeks or months" is good advice for user data integrity at
present.  It's certainly unsupported by any test data I'm seeing.

If the intention is to discover and report or fix btrfs bugs, or confirm
that known bugs have been corrected, then the latest kernel (or a -next
integration branch) is the only one to run.  If the intention is to
use btrfs for data storage, then the kernel selection process is much
different.

In the stable kernels (the v3.xx.y Git tags with no other patches)
released in the last year, there have been a number of btrfs regressions,
ranging from memory leaks to deadlocks to filesystem-crashing corruption
issues:

	2 severe corruption (i.e. destroy the filesystem) or memory leak
	issues (i.e. leak all the RAM and crash slowly and messily)
	I've encountered in my own testing,

	2 kernel panic or memory leak issues that I avoided by accident
	because the fix came out before I could pull the regression into
	a build,

	3 failure modes in new code leading to deadlock or temporary
	inability to retrieve stored data that first appeared in v3.13
	or v3.15, and as of today are not yet resolved.

My testing process runs like this (slightly simplified):

	1.  Build stable and/or Linus tagged kernels + integration-queue
	patches + locally-generated patches if any.

	2.  Run these kernels on various machines with workloads,
	observe and analyze failures.

	3.  When a machine fails to do its work due to a kernel issue,
	restart at step 1 with a different version or more patches.
	Note this includes more issues than btrfs; e.g. sometimes
	a kernel is not usable because of ACPI or WiFi issues that make
	btrfs test results irrelevant.

	4.  If a kernel build runs without failure for N or more days,
	expand the set of test machines to get more test coverage, and
	go back to step 2.

	5.  If N >= 60 with no (severe) problems, consider that kernel
	stable and bless it for production.
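
For illustration only, here is a rough Python sketch of the loop in
steps 2-5.  It is not my actual tooling.  The Candidate class, the
run_trial function, and the expand_every parameter are all hypothetical
names; expand_every stands in for the unspecified N in step 4, and
only the 60-day threshold comes from the steps above.

```python
from dataclasses import dataclass

BLESS_AFTER_DAYS = 60  # step 5 threshold taken from the text


@dataclass
class Candidate:
    """A kernel build under test (hypothetical structure)."""
    version: str
    machines: int = 1    # how many test machines currently run it
    clean_days: int = 0  # consecutive days without a kernel failure


def run_trial(candidate, daily_ok, expand_every=10):
    """Feed one candidate a sequence of daily results.

    daily_ok: iterable of booleans, True when no machine failed that day.
    expand_every: assumed value standing in for N in step 4.
    Returns "blessed", "rejected", or "testing".
    """
    for ok in daily_ok:
        if not ok:
            # Step 3: a kernel issue stops the work, so this build is
            # dropped and the process restarts with another build.
            return "rejected"
        candidate.clean_days += 1
        if candidate.clean_days >= BLESS_AFTER_DAYS:
            # Step 5: enough clean days to bless for production.
            return "blessed"
        if candidate.clean_days % expand_every == 0:
            # Step 4: widen test coverage, then keep observing.
            candidate.machines += 1
    return "testing"
```

A single bad day rejects the build outright, which is why so few
candidates survive: run_trial(Candidate("some-build"), [True] * 59 +
[False]) comes back "rejected" after 59 clean days.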

Linux kernels getting to step 5 are rare and precious things even when
not testing btrfs.  The last kernel to reach step 5 for me was v3.12.x.
Before that was 835 days of searching for a successor to the kernel I
was running in production at the time.  :-/


