On Wed, Feb 12, 2020 at 08:11:56PM +0800, Qu Wenruo wrote:
> >>>
> >>> This looks like an existing bug, IIRC Zygo reported it before.
> >>>
> >>> Btrfs balance just randomly failed at data reloc tree.
> >>>
> >>> Thus I don't believe it's related to Ethan's patches.
> >>
> >> Ok, than the patches make it more likely to happen, which could mean
> >> that faster backref processing hits some race window. As there could be
> >> more we should first fix the bug you say Zygo reported.
> >
> > I added a log to check if find_parent_nodes is ever called under
> > test btrfs/125. It turns out that btrfs/125 doesn't pass through the
> > function. What my patches do is all under find_parent_nodes.
>
> Balance goes through its own backref cache, thus it doesn't utilize the
> path you're modifying.
>
> So don't worry your patches look pretty good.
>
> Furthermore, this csum mismatch is not related to backref walk, but the
> data csum and the data in data reloc tree, which are all created by balance.
>
> So there is really no reason to block such good optimization.

I don't mean to block the patchset, but when I test patchsets from 5
people and tests start to fail, I need to know what the cause is and
whether there's a fix in sight. So far the test has failed 2 out of 2
times (once with the branch itself and then with for-next). I can do
more rounds, but at this point it reproduces too reliably for there to
be no connection.

Sometimes it looks like I'm blaming the messenger and complaining under
patches that don't cause the bugs, but often I don't have anything
better to go on than new warnings between 2 test rounds. Once we have
more eyes on the problem, we'll narrow it down and find the root cause.
