On 12.08.19 17:36, Vladimir Panteleev wrote:
> Hi Nikolay,
>
> Thank you for looking at my patch!
>
> You are completely correct in that this papers over a bug I do not
> understand. And, I would very much like to understand and fix the
> underlying bug instead of settling for a workaround.
>
> Unfortunately, after three days of looking at BTRFS code (and getting
> to where I am now), I have realized that, as a developer with no
> experience in filesystems or kernel development, it would take me a
> lot more, possibly several weeks, to reach a level of understanding of
> BTRFS to the point where I could contribute a meaningful fix. This is
> not something I would be opposed to, as I have the time and I've
> personally invested into BTRFS, but it certainly would be a lot easier
> if I could at least get occasional confirmation that my findings and
> understanding so far are correct and that I am on the right track.
> Unfortunately the people in a position to do this seem to be too busy
> with far more important issues than helping debug my particular edge
> case, and the previous thread has not received any replies since my
> last few posts there, so this patch is the least I could contribute so
> far.
>
> FWIW #1: My current best guess at why the problem occurs, using my
> current level of understanding of BTRFS, is that the filesystem in
> question (16TB of historical snapshots) has so many subvolumes and
> fragmentation that balance or device delete operations allocate so
> much metadata space while processing the chunk (by allocating new
> blocks for splitting filled metadata tree nodes) that the global
> reserve is overrun. Corrections or advice on how to verify this theory
> would be appreciated! (Or perhaps I should just use my patch to fix my
> filesystem and move on with my life. Would be good to know when I can
> wipe the disks containing the test case FS which reproduces the bug
> and use them for something else.)

The thing is, the global rsv should be a last-resort allocation pool. E.g. if
you have 16TB of snapshots but also have plenty of metadata space, then you
shouldn't be hitting the global rsv. Have you tried a recent kernel that
includes the patches from the following series?

https://patchwork.kernel.org/project/linux-btrfs/list/?series=17715

>
> FWIW #2: I noticed that Josef Bacik proposed a change back in 2013 to
> increase the global reserve size to 1G. The comments on the patch were
> the reason I proposed to make it configurable rather than raising the
> size again: https://patchwork.kernel.org/patch/2517071/

And that change hasn't really landed because it caused other problems. The
current global rsv code is also capped at 512MB.

>
> Thanks!
>

<snip>
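
To make the 512MB cap concrete, here is a rough sketch of how a capped
reserve size behaves. Only the 512MB upper bound comes from the discussion
above. The 16MB lower bound, the choice of inputs (bytes used by a few
metadata trees), and the function name global_rsv_size are assumptions made
for illustration, not a copy of the kernel code.

```python
SZ_16M = 16 * 1024 * 1024     # assumed lower bound, for illustration only
SZ_512M = 512 * 1024 * 1024   # upper bound mentioned in the thread


def global_rsv_size(extent_root_used, csum_root_used, tree_root_used):
    """Hypothetical helper: size a reserve from tree usage, then clamp it.

    All arguments are byte counts. The sum is raised to the assumed floor
    and then limited to the 512MB cap.
    """
    num_bytes = extent_root_used + csum_root_used + tree_root_used
    return min(max(num_bytes, SZ_16M), SZ_512M)


if __name__ == "__main__":
    # A filesystem with several GiB of metadata still ends up at the cap.
    big = global_rsv_size(4 * 1024**3, 2 * 1024**3, 64 * 1024**2)
    print(big // (1024 * 1024), "MiB")
```

The point of the sketch is that, once metadata usage is large enough, the
reserve stops growing with the filesystem, so a very large filesystem gets
the same reserve as a moderately large one.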
