On Tue, Sep 24, 2019 at 07:07:41AM -0400, James Harvey wrote:
> On Tue, Sep 24, 2019 at 5:58 AM Filipe Manana <fdmanana@xxxxxxxxx> wrote:
> >
> > On Sun, Sep 15, 2019 at 2:55 PM Filipe Manana <fdmanana@xxxxxxxxx> wrote:
> > >
> > > On Sun, Sep 15, 2019 at 1:46 PM James Harvey <jamespharvey20@xxxxxxxxx> wrote:
> > > > ...
> > > > You'll see they're different-looking backtraces than without the
> > > > patch, so I don't actually know if it's related to the original
> > > > regression that several others reported or not.
> > >
> > > It's a different problem.
> >
> > So the good news is that on the upcoming 5.4 the problem can't happen,
> > due to a large patch series from Josef regarding space reservation
> > handling which, as a side effect, solves that problem and doesn't
> > introduce new ones with concurrent fsyncs.
> >
> > However, that's a large patch set which depends on a lot of previous
> > cleanups, some of which landed in the 5.3 merge window. Backporting
> > all those patches is against the backport policies for stable
> > releases [1], since many of the dependencies are cleanup patches and
> > many are large (well over the 100 lines limit).
> >
> > On the other hand, it's not possible to send a fix for stable releases
> > that doesn't land in Linus' tree first, as there's nothing to fix in
> > the current merge window (5.4) since that deadlock can't happen there.
> >
> > So it seems like a dead end to me.
> >
> > Fortunately, as you told me privately, you only hit this once and it's
> > not a frequent issue for you (unlike the 5.2 regression, which caused
> > you hangs very often). You can work around it by mounting the fs with
> > "-o notreelog", which makes fsyncs more expensive, so you'll likely
> > see some performance degradation for your applications (higher
> > latency, less throughput).
> >
> > [1] https://www.kernel.org/doc/html/v4.15/process/stable-kernel-rules.html
>
> All understood, thanks for letting me know. Not a problem. I have
> still only run into this crash once, about 9 days ago. I haven't had
> another btrfs problem since then, unlike the hourly hangs on 5.2 with
> heavy I/O.

We are seeing this crash internally on our testing tier; we're still
running it down, but it's pretty elusive. I'll CC you when we find it
and fix it.

Thanks,

Josef
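
For reference, a minimal sketch of the "-o notreelog" workaround Filipe
describes above; the device and mount point below are placeholders, not
taken from this thread:

    # One-off mount with the btrfs tree log disabled. fsync then forces
    # a full transaction commit instead of writing the log tree, which
    # is why fsyncs get more expensive.
    mount -t btrfs -o notreelog /dev/sdX /mnt/data

    # Or persistently, via an /etc/fstab entry:
    # /dev/sdX  /mnt/data  btrfs  defaults,notreelog  0  0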
