Re: while (1) in btrfs_relocate_block_group didn't end

On Sun, 2019-09-29 at 07:37 +0800, Qu Wenruo wrote:
> 
> > On 2019/9/29 2:36 AM, Cebtenzzre wrote:
> > On Mon, 2019-09-16 at 17:20 -0400, Cebtenzzre wrote:
> > > On Sat, 2019-09-14 at 17:36 -0400, Cebtenzzre wrote:
> > > > Hi,
> > > > 
> > > > I started a balance of one block group, and I saw this in dmesg:
> > > > 
> > > > BTRFS info (device sdi1): balance: start -dvrange=2236714319872..2236714319873
> > > > BTRFS info (device sdi1): relocating block group 2236714319872 flags data|raid0
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > BTRFS info (device sdi1): found 1 extents
> > > > 
> > > > [...]
> > > > 
> > > > I am using Arch Linux with kernel version 5.2.14-arch2, and I specified
> > > > "slub_debug=P,kmalloc-2k" in the kernel cmdline to detect and protect
> > > > against a use-after-free that I found when I had KASAN enabled. Would
> > > > that kernel parameter result in a silent retry if it hit the use-after-
> > > > free?
> > > 
> > > Please disregard the quoted message. This behavior does appear to be a
> > > result of using the slub_debug option instead of KASAN. It is not
> > > directly caused by BTRFS.
> > 
> > Actually, I just reproduced this behavior without slub_debug in the
> > cmdline, on Linux 5.3.0 with "[PATCH] btrfs: relocation: Fix KASAN
> > report about use-after-free due to dead reloc tree cleanup race" (
> > https://patchwork.kernel.org/patch/11153729/) applied.
> > 
> > So, this issue is still relevant and possible to trigger, though under
> > different conditions (different volume, kernel version, and cmdline).
> > 
> 
> That patch is not meant to solve the while loop problem, so we still need
> some extra info about this problem.
> 
> Is the problem always reproducible on that fs, or does it still happen with
> some randomness?
> 
> And, can you still reproduce it with v5.1/v5.2?
> 
> Thanks,
> Qu
> 

I mentioned that patch because it was the only patch I had applied to my
kernel at the time. The "issue" I was referring to was the looping issue
that I reported in the first email.

I have only come across this behavior without slub_debug once or twice,
so I don't have a large enough sample size to say whether it can happen on
older kernels. It's caused by running a balance with *just* the right
amount of free space, such that the correct behavior would probably be to
fail with ENOSPC rather than loop.
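
For context, the loop in question has roughly this shape (a simplified
paraphrase of btrfs_relocate_block_group() in fs/btrfs/relocation.c around
these kernel versions; locking, setup, and most error handling are omitted,
and names may not match the source exactly):

	while (1) {
		ret = relocate_block_group(rc);
		if (ret < 0)
			err = ret;

		/* after copying data extents, switch to fixing up the
		 * references that still point at the old location */
		if (rc->stage == MOVE_DATA_EXTENTS && rc->found_file_extent)
			rc->stage = UPDATE_DATA_PTRS;

		if (err < 0)
			goto out;

		/* the only normal exit: a pass that found nothing left
		 * to relocate */
		if (rc->extents_found == 0)
			break;

		btrfs_info(fs_info, "found %llu extents", rc->extents_found);
	}

If a pass keeps finding the same extent but never actually gets it moved,
and no error is propagated out of relocate_block_group(), then extents_found
stays nonzero and the loop never terminates, which would match the repeated
"found 1 extents" messages quoted above.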

I might eventually dedicate a volume to reproducing this issue, and
bisect the kernel. But I need all of my disks to be usable right now.
-- 
Cebtenzzre <cebtenzzre@xxxxxxxxx>



