Blake Lewis posted on Fri, 02 Dec 2016 12:36:29 -0800 as excerpted:

> Well, 3.10 is what you get with the RHEL7.x distributions, so that's why
> people are running it.
> Apparently, it is "good enough" for many purposes.
>
> My real goal here is to understand the scope of the bug and whether any
> mitigation is possible. Of course, I don't expect anyone else to make a
> patch for me (or even to accept my patch), but if I knew what the bug is
> and what was done to fix it,
> I'd be in a much better position to decide what to do. If anyone can
> shed any light on this,
> I'd be very grateful.

I'm going to try to make several points with this post, including a (perhaps too simplistic, but it works for me and apparently enough others to have a dedicated tool) suggestion for finding the problem, along with others basically agreeing with LB's reply but with a bit more background. YMMV but it's posted in the hope that it is of help.

1) I am a list regular and btrfs using admin myself, but not a dev, so don't look here for the code-level stuff.

2) This is the btrfs development list, kernel level, not a distro list, and the viewpoints generally held here may or may not correspond to that of various distros and their support teams.

3) In particular, while btrfs has been officially out of experimental for some time now, on this list btrfs is held (by both btrfs devs and list-regular users) to be still stabilizing, not fully stable and mature, as I normally describe it. Both on this list and on the btrfs wiki (at https://btrfs.wiki.kernel.org ), the strong recommendation is to keep current on the kernel in particular, because bugs are still being found and fixed, and naturally, this list being development-focused, the view tends to be rather more forward leaning than particularly "enterprise-stale^H^Hble" distros.
Also strongly recommended is keeping backups, tested and ready-to-use, because there /are/ still serious bugs being fixed, and sometimes the problems they trigger are, for non-devs at least, simply easiest to fix by blowing away the existing filesystem and starting over with a clean one and backups to recover from.

4) OTOH, some distros and other product vendors (including your company) obviously consider btrfs, if not yet /entirely/ stable, stable /enough/ to build and ship product on. While this is accepted as a questionable but already-in-the-wild-so-get-used-to-it position on the list, for kernels outside our normal support range (next point), the standard answer to users asking here for support is that while we recognize some distros support it on older kernels and with older btrfs userspace, we on this list tend to be forward looking and don't track which patches distros and vendors have backported and which they haven't.

Thus, there's a 3-way choice they need to make: (a) upgrade to something within our recommended support range so we can best help, or (b) take the distro/vendor up on the support they offer and that the user may in fact be paying for, since they're best positioned to provide that support for older kernels they've done their own patching to, or (c) stay where they are and muddle thru the best they can with the limited support we /can/ offer -- we'll still do what we can, but honestly, the "impedance mismatch" with code that old is going to make it difficult to apply what limited support we can offer.

5) The versions we support best on the list, keeping in mind the above "current kernel" recommendation, are the two latest kernel series on either the current or the LTS track.
On the current side, kernel 4.9 is very close to out now, so we like to see 4.8, tho 4.7 is still current enough that people on-list are likely to be able to reply sanely in terms of whether we recognize a bug and whether it's still current or has already been fixed.

On the LTS side, contradicting LB slightly, btrfs does try to backport bugfixes when we know they're needed in LTS series -- no effort is made to backport to non-LTS beyond the relatively short mainline non-LTS current kernel series support period, and as I said, they basically go out of support two kernels back.

The two most recent LTS kernel series are 4.4 and 4.1, with 3.18 before that. Based on the two-most-recent policy, 4.4 is definitely still in focus, and we do still try for 4.1 as well, tho it's getting long enough in the tooth now that if the bug isn't recognized, an early question/recommendation is going to be to try with a newer kernel.

The 3.18 LTS series actually ended up reasonably stable for btrfs, while 4.4 may have taken a bit longer than normal to mature (in non-btrfs areas as well), so while 3.18 is too far back for much active support, it was working and is likely still working quite well for many.

3.16 was the LTS series before that, but that was a rough period for btrfs, and honestly, btrfs LTS support was new enough back then that we were only really trying to support the latest one, so 3.18 is really the practical horizon in terms of list support. Before that there were some pretty bad bugs that nobody wants to even think about any more.

6) Meanwhile, what a lot of people don't realize is that until 3.12 (IIRC) stripped off the experimental label, btrfs remained officially experimental, with a pretty strong warning on both the kernel option and mkfs.btrfs on the user side.
In list support terms that's seriously ancient history now (think electricity-in-the-18th-century maturity: it was a toy people could see was going to do great things one day and people were doing stuff with it, but it really /was/ experimental in anything even close to modern-day terms) and you're pretty much on your own.

Given that you're asking about 3.10, well... like I said, that's still btrfs experimental era, and I don't think I'm alone when I seriously question the sanity of anyone still attempting to support a product running btrfs on /that/!

If you're doing it, and people are willing to pay for it, well, I can't argue with that, but in all honesty, it's your customers' data you're putting at risk, and were I to see a product still running btrfs on a 3.10 kernel (or really, earlier than 3.18, for the reasons explained above, but 3.10 really /was/ still labeled experimental!), I'd immediately mark everything that company sold as highly experimental and thus questionable, as well.

<shrug> Just being honest.

So you can see why you're getting told to upgrade to /something/ semi-recent, 3.18 LTS at absolute minimum. Some of those bugs in the early 3.15-3.16 era tie my stomach in knots thinking about people still running those versions, they were that bad, even if in theory those bugs are long since patched by now. Like I said, we were glad to see 3.18 LTS and it really /did/ end up surprisingly stable, up thru early 4.4 at least, when we stopped tracking it.

And FWIW, I /could/ be wrong here, but I /believe/ I saw that even Red Hat now actively recommends that people move off of btrfs on 3.10 era RHEL-7.x. (At least there was a post on this list that I interpreted to that effect, tho if it actually affected me I'd be double-checking.) I don't know what their actual support status is for those that don't. You might want to look that up, because if it's true, it /would/ give your company a bit of an out in terms of supporting then-experimental btrfs, as well.
All that said, if you insist: again, keep in mind that I'm not a coder, so it's unsurprising that I don't have much in the way of specific commits to point you at, but at least here's a way to help you find them yourself. =:^)

7) Incremental problem/fix bisect. Recursively break the problem space roughly in half and test to see which half the problem/fix is in, then recurse by breaking that half in half and testing again. You can use git bisect, which has been popularized as a way for even non-coders like me to nail down bugs (or in your case fixes) to specific commits. Or perhaps first, to narrow down the range you need to git bisect, do a manual bisect of the release series between 3.10 and 4.8 to find where the problem goes away, then look at that commit or the commits around that area to see what might have changed that broke, or in your case fixed, the problem.

Bisect may be dumb and brute force, but it works surprisingly well, especially since git bisect automates most of the process, and as I said, it has allowed many non-coders to help in tracing and ultimately fixing their bugs. I know it has worked that way for me. A modern git bisect also does a better job of staying with release tags, then rc tags, then big merge points, as long as possible before diving down into individual commits, making the incremental-bisect process a bit nicer and less "black-box" than it used to be, too.

Given what I said about 3.18 being a really good LTS in btrfs terms, I might suggest you start by testing it. If it fixes the problem for you, then you can decide whether to try to push it as an upgrade, or try to bisect the problem further in order to properly backport the fix. If it doesn't, of course the 4.1 and 4.4 LTS kernels are other major test points you can try.

So indeed, some of that was rehash, but with hopefully helpful additional detail now, and the bisect suggestion may be too simplistic. But I hope /some/ of it was helpful, anyway.
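For what it's worth, the manual release-series bisect from point 7 could be sketched roughly like this in Python. It's only an illustration, not anything from the kernel tree or from git itself: first_fixed_release() and pretend_test() are made-up names, and the "fix landed in 3.17" in the demo is an arbitrary placeholder, not a claim about where the actual fix went in. In real use the test step is a human booting that kernel and running the reproducer.

```python
from typing import Callable, Sequence


def first_fixed_release(releases: Sequence[str],
                        is_fixed: Callable[[str], bool]) -> str:
    """Return the earliest release for which is_fixed() is true.

    Assumes releases are ordered oldest to newest, the oldest one still
    shows the problem, the newest one does not, and the fix, once in,
    stays in.
    """
    lo, hi = 0, len(releases) - 1  # lo: known broken, hi: known fixed
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_fixed(releases[mid]):
            hi = mid  # fix is at mid or earlier
        else:
            lo = mid  # fix is later than mid
    return releases[hi]


# Mainline release series from 3.10 thru 4.8, oldest first.
RELEASES = [f"3.{n}" for n in range(10, 20)] + [f"4.{n}" for n in range(0, 9)]

# Stand-in for "boot this kernel and run the reproducer".
# The 3.17 cutoff is a placeholder for the demo only.
tested = []


def pretend_test(release: str) -> bool:
    tested.append(release)
    return RELEASES.index(release) >= RELEASES.index("3.17")


print(first_fixed_release(RELEASES, pretend_test))  # prints 3.17
```

With the 19 mainline releases from 3.10 thru 4.8, that's at most five kernels to build and test before you know which release first contains the fix, at which point git bisect between that release tag and the one before it can take over.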
=:^)

--
Duncan - List replies preferred. No HTML msgs.
"Every nonfree program has a lord, a master --
and if you use the program, he is your master." Richard Stallman
