Re: btrfs balance start -musage=0 / eats drive space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Oops, forgot to mention that a full -mconvert,soft doesn't fix the
issue, either:

[255980.528000] BTRFS info (device dm-2): balance: start
-mconvert=raid1,soft -sconvert=raid1,soft
.....
[263341.880115] BTRFS info (device dm-2): 151 enospc errors during balance
[263341.880119] BTRFS info (device dm-2): balance: ended with status: -28

On 2/3/20 8:42 PM, Matt Corallo wrote:
> After giving up on my previous array (see "BUG_ON in btrfs check &
> fs/btrfs/extent-tree.c:3071"), I copied 11TB of crap (including tons of
> small files, and a few files up to eg 2TB in size) to a fresh array
> built on 5.4.16, with a bit of duperemove in between, partially on
> 5.4.16 and partially on 5.4.16 + "fs: allow deduplication of eof block
> into the end of the destination file" + "Btrfs: make deduplication with
> range including the last block work".
> 
> I then added a second drive to the array (to finish copying), kicked it
> back on again, and ran btrfs balance -mconvert=raid1 /path, waited for
> the first one or two block groups, then btrfs balance cancel /path (so
> that, I though, new metadata would be written with raid1, but old wont
> convert until I had more IOPS available). This seems to have completely
> borked my array's ability to balance. I trued running a few more balance
> -mconvert=raid1,softs but it hits ENOSPC after allocating metadata block
> groups and copying no data into them. After letting it run for a while,
> I now have a ton of metadata blocks that are unused and I cant seem to
> free them. bigraid2 started with around 500G of available unallocated
> space, and now is rather limited:
> 
> Overall:
>     Device size:		  18.19TiB
>     Device allocated:		  13.25TiB
>     Device unallocated:		   4.95TiB
>     Device missing:		     0.00B
>     Used:			  12.23TiB
>     Free (estimated):		   4.96TiB	(min: 2.49TiB)
>     Data ratio:			      1.00
>     Metadata ratio:		      1.80
>     Global reserve:		 512.00MiB	(used: 0.00B)
> 
> Data,single: Size:11.93TiB, Used:11.92TiB
>    /dev/mapper/bigraid2_crypt	   8.29TiB
>    /dev/mapper/bigraid42_crypt	   3.64TiB
> 
> Metadata,single: Size:151.00GiB, Used:149.04GiB
>    /dev/mapper/bigraid2_crypt	 151.00GiB
> 
> Metadata,RAID1: Size:596.00GiB, Used:86.13GiB
>    /dev/mapper/bigraid2_crypt	 596.00GiB
>    /dev/mapper/bigraid42_crypt	 596.00GiB
> 
> System,RAID1: Size:32.00MiB, Used:1.38MiB
>    /dev/mapper/bigraid2_crypt	  32.00MiB
>    /dev/mapper/bigraid42_crypt	  32.00MiB
> 
> Unallocated:
>    /dev/mapper/bigraid2_crypt	  72.97GiB
>    /dev/mapper/bigraid42_crypt	   4.87TiB
> 
> 
> 
> Running btrfs balance start -musage=0 /path very quickly eats drive
> space by allocating more metadata,RAID1 blocks and then hitting enospc:
> 
> [264158.311333] BTRFS info (device dm-2): balance: start -musage=0 -susage=0
> [264158.443053] BTRFS info (device dm-2): relocating block group
> 13821436493824 flags metadata|raid1
> [264159.103628] BTRFS info (device dm-2): relocating block group
> 13820362752000 flags metadata|raid1
> [264160.513106] BTRFS info (device dm-2): relocating block group
> 13818215268352 flags metadata|raid1
> [264161.475568] BTRFS info (device dm-2): relocating block group
> 13816067784704 flags metadata|raid1
> [264163.174316] BTRFS info (device dm-2): relocating block group
> 13814994042880 flags metadata|raid1
> [264164.326509] BTRFS info (device dm-2): relocating block group
> 13813920301056 flags metadata|raid1
> [264165.234746] BTRFS info (device dm-2): relocating block group
> 13812846559232 flags metadata|raid1
> [264166.237149] BTRFS info (device dm-2): relocating block group
> 13811772817408 flags metadata|raid1
> [264167.282200] BTRFS info (device dm-2): relocating block group
> 13810699075584 flags metadata|raid1
> [264168.207370] BTRFS info (device dm-2): relocating block group
> 13808551591936 flags metadata|raid1
> [264169.167734] BTRFS info (device dm-2): relocating block group
> 13807477850112 flags metadata|raid1
> [264170.166969] BTRFS info (device dm-2): relocating block group
> 13806404108288 flags metadata|raid1
> [264171.114659] BTRFS info (device dm-2): relocating block group
> 13805330366464 flags metadata|raid1
> [264172.985116] BTRFS info (device dm-2): relocating block group
> 13803182882816 flags metadata|raid1
> [264174.784420] BTRFS info (device dm-2): relocating block group
> 13801035399168 flags metadata|raid1
> [264175.967226] BTRFS info (device dm-2): relocating block group
> 13799961657344 flags metadata|raid1
> [264177.252213] BTRFS info (device dm-2): relocating block group
> 13798887915520 flags metadata|raid1
> [264178.499955] BTRFS info (device dm-2): relocating block group
> 13796740431872 flags metadata|raid1
> [264179.780731] BTRFS info (device dm-2): relocating block group
> 13795666690048 flags metadata|raid1
> [264180.885075] BTRFS info (device dm-2): relocating block group
> 13793519206400 flags metadata|raid1
> [264182.089885] BTRFS info (device dm-2): relocating block group
> 13790297980928 flags metadata|raid1
> [264183.358970] BTRFS info (device dm-2): relocating block group
> 13786003013632 flags metadata|raid1
> [264184.462920] BTRFS info (device dm-2): 137 enospc errors during balance
> [264184.462922] BTRFS info (device dm-2): balance: canceled
> 
> (the above allocated some 100G or so in Metadata,RAID1).
> 
> 
> Additionally, and I dont know if this was about this fs or the old,
> broken, mounted-ro,degraded fs, but I saw this while writing:
> 
> [245116.074467] ------------[ cut here ]------------
> [245116.074497] WARNING: CPU: 20 PID: 474 at fs/btrfs/inode.c:9378
> btrfs_destroy_inode+0x1c/0x288 [btrfs]
> [245116.074498] Modules linked in: xt_tcpudp(E) binfmt_misc(E) veth(E)
> xt_nat(E) wireguard(OE) essiv(E) authenc(E) ip6_udp_tunnel(E)
> udp_tunnel(E) nft_counter(E) nft_chain_nat(E) xt_MASQUERADE(E) nf_nat(E)
> nf_conntrack(E) nf_defrag_ipv6(E) nf_defrag_ipv4(E) nft_compat(E)
> nf_tables(E) nfnetlink(E) btrfs(E) zstd_compress(E) zstd_decompress(E)
> amdgpu(E) gpu_sched(E) snd_hda_codec_hdmi(E) ast(E) drm_vram_helper(E)
> snd_hda_intel(E) ttm(E) ofpart(E) snd_hda_codec(E) drm_kms_helper(E)
> powernv_flash(E) sg(E) mtd(E) snd_hda_core(E) uas(E) drm(E) snd_hwdep(E)
> snd_pcm(E) tg3(E) mpt3sas(E) snd_timer(E) ipmi_powernv(E)
> ipmi_devintf(E) drm_panel_orientation_quirks(E) libphy(E) snd(E)
> syscopyarea(E) ipmi_msghandler(E) sysfillrect(E) sysimgblt(E)
> fb_sys_fops(E) i2c_algo_bit(E) ptp(E) opal_prd(E) raid_class(E)
> pps_core(E) soundcore(E) scsi_transport_sas(E) at24(E) ip_tables(E)
> x_tables(E) autofs4(E) ext4(E) crc16(E) mbcache(E) jbd2(E) sd_mod(E)
> raid10(E) raid456(E) crc32c_generic(E) libcrc32c(E)
> [245116.074545]  async_raid6_recov(E) async_memcpy(E) async_pq(E)
> evdev(E) hid_generic(E) usbhid(E) hid(E) raid6_pq(E) async_xor(E) xor(E)
> async_tx(E) raid1(E) raid0(E) multipath(E) linear(E) md_mod(E)
> usb_storage(E) dm_crypt(E) dm_mod(E) algif_skcipher(E) af_alg(E)
> xhci_pci(E) xhci_hcd(E) vmx_crypto(E) nvme(E) usbcore(E) nvme_core(E)
> usb_common(E)
> [245116.074566] CPU: 20 PID: 474 Comm: kswapd0 Tainted: G        W  OE
>   5.4.0-3-powerpc64le #1 Debian 5.4.16-1
> [245116.074568] NIP:  c0080000078aec94 LR: c000000000447bb8 CTR:
> c0080000078aec78
> [245116.074569] REGS: c000000ff760f580 TRAP: 0700   Tainted: G        W
> OE      (5.4.0-3-powerpc64le Debian 5.4.16-1)
> [245116.074570] MSR:  9000000000029033 <SF,HV,EE,ME,IR,DR,RI,LE>  CR:
> 84048224  XER: 20040000
> [245116.074575] CFAR: c000000000447bb4 IRQMASK: 0
>                 GPR00: c000000000447bb8 c000000ff760f810
> c0080000079e6800 c000000e405ce530
>                 GPR04: 0000000000000018 0000000000000003
> 0000000000000009 c00c000025ae0400
>                 GPR08: 0000000ffed1d000 0000000000000001
> c000000c6292e830 c00800000796cb30
>                 GPR12: c0080000078aec78 c000000fffff2800
> 0000000000000001 0000000000000001
>                 GPR16: c000000001143be0 c000000001143bf0
> 0000000000000000 0000000000000260
>                 GPR20: 0000000000000000 0000000000000502
> 0000000000000000 0000000000015bdb
>                 GPR24: 0000000000000000 000000000004c20a
> 0000000000000000 c000000007fff800
>                 GPR28: c000000007fffc50 c000000e405ce530
> c0080000079f2b00 c000000e405ce530
> [245116.074600] NIP [c0080000078aec94] btrfs_destroy_inode+0x1c/0x288
> [btrfs]
> [245116.074605] LR [c000000000447bb8] destroy_inode+0x68/0xc0
> [245116.074606] Call Trace:
> [245116.074608] [c000000ff760f810] [c000000000447ba0]
> destroy_inode+0x50/0xc0 (unreliable)
> [245116.074612] [c000000ff760f840] [c000000000448290] dispose_list+0x80/0xb0
> [245116.074614] [c000000ff760f880] [c00000000044a3a0]
> prune_icache_sb+0x70/0xb0
> [245116.074618] [c000000ff760f8d0] [c00000000041e6e8]
> super_cache_scan+0x148/0x210
> [245116.074621] [c000000ff760f940] [c00000000032f7d8]
> do_shrink_slab+0x178/0x3b0
> [245116.074623] [c000000ff760fa10] [c00000000032fd0c]
> shrink_slab+0x2fc/0x4a0
> [245116.074626] [c000000ff760faf0] [c0000000003370bc]
> shrink_node+0x12c/0x600
> [245116.074629] [c000000ff760fbb0] [c000000000338a84]
> balance_pgdat+0x344/0x6b0
> [245116.074631] [c000000ff760fce0] [c000000000339070] kswapd+0x280/0x5c0
> [245116.074633] [c000000ff760fdb0] [c000000000155668] kthread+0x148/0x1a0
> [245116.074638] [c000000ff760fe20] [c00000000000bd54]
> ret_from_kernel_thread+0x5c/0x68
> [245116.074639] Instruction dump:
> [245116.074641] 7c0803a6 4e800020 60000000 00137b88 00000000 3c4c0013
> 38427b88 7c0802a6
> [245116.074646] 60000000 e9430138 312affff 7d295110 <0b090000> e94301c8
> 312affff 7d295110
> [245116.074651] ---[ end trace 39e70c580b2c71e0 ]---
> 



[Index of Archives]     [Linux Filesystem Development]     [Linux NFS]     [Linux NILFS]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux