On 26.03.19 г. 17:09 ч., Zygo Blaxell wrote:
> On Tue, Mar 26, 2019 at 10:42:31AM +0200, Nikolay Borisov wrote:
>>
>>
>> On 26.03.19 г. 6:30 ч., Zygo Blaxell wrote:
>>> On Mon, Mar 25, 2019 at 10:50:28PM -0400, Zygo Blaxell wrote:
>>>> Running balance, rsync, and dedupe, I get kernel warnings every few
>>>> minutes on 5.0.4. No warnings on 5.0.3 under similar conditions.
>>>>
>>>> Mount options are: flushoncommit,space_cache=v2,compress=zstd.
>>>>
>>>> There are two different stacks on the warnings. This one comes from
>>>> btrfs balance:
>>>
>>> [snip]
>>>
>>> Possibly unrelated, but I'm also repeatably getting this in 5.0.4 and
>>> not 5.0.3, after about 5 hours of uptime. Different processes, same
>>> kernel stack:
>>>
>>>   [Mon Mar 25 23:35:17 2019] kworker/u8:4: page allocation failure: order:0, mode:0x404000(GFP_NOWAIT|__GFP_COMP), nodemask=(null),cpuset=/,mems_allowed=0
>>>   [Mon Mar 25 23:35:17 2019] CPU: 2 PID: 29518 Comm: kworker/u8:4 Tainted: G W 5.0.4-zb64-303ce93b05c9+ #1
>>
>> What commits does this kernel include because it doesn't seem to be a
>> pristine upstream 5.0.4 ? Also what you are seeing below is definitely a
>> bug in MM. The question is whether it's due to your doing faulty
>> backports in the kernel or it's due to something that got automatically
>> backported to 5.0.4
>
> That was the first thing I thought of, so I reverted to vanilla 5.0.4,
> repeated the test, and obtained the same result.
>
> You may have a point about non-btrfs patches in 5.0.4, though.
> I previously tested 5.0.3 with most of the 5.0.4 fs/btrfs commits
> already included by cherry-pick:
>
>   1098803b8cb7 Btrfs: fix deadlock between clone/dedupe and rename
>   3486142a68e3 Btrfs: fix corruption reading shared and compressed extents after hole punching
>   fb9c36acfab1 btrfs: scrub: fix circular locking dependency warning
>   9d7b327affb8 Btrfs: setup a nofs context for memory allocation at __btrfs_set_acl
>   80dcd07c27df Btrfs: setup a nofs context for memory allocation at btrfs_create_tree()
>
> The commits that are in 5.0.4 but not in my last 5.0.3 test run are:
>
>   ebbb48419e8a btrfs: init csum_list before possible free
>   88e610ae4c3a btrfs: ensure that a DUP or RAID1 block group has exactly two stripes
>   9c58f2ada4fa btrfs: drop the lock on error in btrfs_dev_replace_cancel
>
> and I don't see how those commits could lead to the observed changes
> in behavior. I didn't include them for 5.0.3 because my test scenario
> doesn't execute the code they touch. So the problem might be outside
> of btrfs completely.

I think it might very well be outside of btrfs because you are seeing an
order 0 failure when you have plenty of order 0 free pages. That's
definitely something you might want to report to mm.

>
