Hi,

For your information, I have updated my kernel to 5.4.8-1~bpo10+1 and
btrfs-progs to 5.2.1-1~bpo10+1 (from buster backports).

From there I could mount rw with skip_balance, then cancel the balance.

After that, btrfs scrub and btrfs check reported no errors, except for an
inconsistency in the space cache, which I corrected with clear_cache.

By the way, do you advise using space_cache on an encrypted device?

Best regards,

Pepie 34

On 28/01/2020 at 18:32, Pepie 34 wrote:
> On 28/01/2020 at 02:23, Qu Wenruo wrote:
>> On 2020/1/28 5:20 AM, Pepie 34 wrote:
>>> Dear BTRFS community,
>>>
>>> I've had a raid 1 setup on two luks encrypted drives for 4 years that serves
>>> me as btrbk backup target from another computer.
>>> There are a lot of ro snapshots on it.
>>>
>>> I mistakenly launched a balance on it, which was extremely slow, and
>>> tried to cancel it.
>>> After two days of cancelling without results, I decided to power off the
>>> computer.
>>>
>>> After the reboot, even with the skip_balance mount option, the mounting
>>> is endless, no error in the kernel message and it never mounts.
>> Is there anything like "relocating block group XXXX flags XXXX" ?
> No but other messages see below
>
>
>>> What I have done so far:
>>> - mount the volume with the ro option (fast to mount, data OK).
>>> - scrub in ro mode, no error found
>> So data are all OK.
>> Just need a way to cancel the balance.
>>
>>> - btrfs check
>>> In the extent check there are plenty of errors like this :
>>> =>
>>> ref mismatch on [9404816285696 32768] extent item 6, found 5
>>>
>>> incorrect local backref count on 9404816285696 parent 5712684302336
>>> owner 0 offset 0 found 0 wanted 1 back 0x55f371ee1ad0
>>> backref disk bytenr does not match extent record, bytenr=9404816285696,
>>> ref bytenr=0
>>> backpointer mismatch on [9404816285696 32768]
>>> <=
>> It could be caused by half-balanced fs.
>> Need to re-check after we cancel the balance.
>>
>>> No errors in other checks, though checking "quota groups" is very slow.
>> That's caused by the nature of qgroup.
>>
>>> What should I do ? btrfs check --repair ?
>>> btrfs check --init-extent-tree ?
>>> btrfs --clear-space-cache ?
>> None of the options should affect data, but none of them are recommended.
>>
>> Since the problem is about the balance.
>>
>> Have you tried to mount the fs with RO,skip_balance, then remount it rw?
> I have mounted it ro,skip_balance then rw.
>
> It has now been trying to mount rw for 12h.
>
> The kernel log has messages about tasks blocked for more than 120 seconds.
>
> Some samples:
>
> [43621.876315] INFO: task btrfs-transacti:21846 blocked for more than 120 seconds.
> [43621.876325] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
> [43621.876327] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [43621.876331] btrfs-transacti D 0 21846 2 0x80000000
> [43621.876334] Call Trace:
> [43621.876345] ? __schedule+0x2a2/0x870
> [43621.876347] schedule+0x28/0x80
> [43621.876394] btrfs_commit_transaction+0x75f/0x880 [btrfs]
> [43621.876399] ? finish_wait+0x80/0x80
> [43621.876419] transaction_kthread+0x147/0x180 [btrfs]
> [43621.876440] ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
> [43621.876443] kthread+0x112/0x130
> [43621.876445] ? kthread_bind+0x30/0x30
> [43621.876447] ret_from_fork+0x22/0x40
>
> [44346.867777] INFO: task mount:21595 blocked for more than 120 seconds.
> [44346.867788] Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
> [44346.867791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [44346.867795] mount D 0 21595 21594 0x00000000
> [44346.867797] Call Trace:
> [44346.867809] ? __schedule+0x2a2/0x870
> [44346.867812] ? __wake_up_common+0x7a/0x190
> [44346.867814] schedule+0x28/0x80
> [44346.867859] wait_current_trans+0xc3/0xf0 [btrfs]
> [44346.867863] ? finish_wait+0x80/0x80
> [44346.867884] start_transaction+0x317/0x3e0 [btrfs]
> [44346.867908] merge_reloc_root+0xf5/0x560 [btrfs]
> [44346.867933] merge_reloc_roots+0xda/0x1f0 [btrfs]
> [44346.867957] btrfs_recover_relocation+0x42d/0x490 [btrfs]
> [44346.867978] open_ctree+0x1860/0x1bf0 [btrfs]
> [44346.867995] btrfs_mount_root+0x682/0x740 [btrfs]
> [44346.867999] ? cpumask_next+0x16/0x20
> [44346.868002] ? pcpu_alloc+0x321/0x640
> [44346.868005] mount_fs+0x3e/0x145
> [44346.868008] vfs_kern_mount.part.36+0x54/0x120
> [44346.868024] btrfs_mount+0x16f/0x860 [btrfs]
> [44346.868027] ? path_lookupat.isra.48+0xa3/0x220
> [44346.868028] ? legitimize_path.isra.41+0x2d/0x60
> [44346.868030] ? cpumask_next+0x16/0x20
> [44346.868031] ? pcpu_alloc+0x321/0x640
> [44346.868032] ? mount_fs+0x3e/0x145
> [44346.868034] mount_fs+0x3e/0x145
> [44346.868035] vfs_kern_mount.part.36+0x54/0x120
> [44346.868037] do_mount+0x20e/0xcc0
> [44346.868039] ? _cond_resched+0x15/0x30
> [44346.868041] ? kmem_cache_alloc_trace+0x155/0x1d0
> [44346.868043] ksys_mount+0xb6/0xd0
> [44346.868044] __x64_sys_mount+0x21/0x30
> [44346.868047] do_syscall_64+0x53/0x110
> [44346.868050] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [44346.868052] RIP: 0033:0x7ff50cb41fea
> [44346.868060] Code: Bad RIP value.
> [44346.868061] RSP: 002b:00007ffd2257b2e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [44346.868063] RAX: ffffffffffffffda RBX: 000055cc47409a40 RCX: 00007ff50cb41fea
> [44346.868064] RDX: 000055cc4740be00 RSI: 000055cc47409c50 RDI: 000055cc4740aa50
> [44346.868065] RBP: 00007ff50ce961c4 R08: 000055cc47409c70 R09: 000055cc474119e0
> [44346.868065] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [44346.868066] R13: 0000000000000000 R14: 000055cc4740aa50 R15: 000055cc4740be00
>
> Besides shutting down the computer, is there a proper way to stop the
> mounting ?
>
> Best regards,
>
> Pepie 34
>
>
>> Thanks,
>> Qu
>>
>>> Will the "init extent tree" option break btrfs receive with old snapshot
>>> parents ?
>>>
>>> Best regards,
>>>
>>> Pepie34
>>>
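
For anyone landing on this thread with the same problem, the sequence reported at the top of this message (mount rw with skip_balance, cancel the balance, scrub, check, then clear_cache) might be sketched roughly as below. This is only a dry-run sketch: it prints the commands instead of running them, the device and mount point names are hypothetical placeholders, and the exact options used in the original recovery were not given, so the flags shown (-B for a foreground scrub, --readonly for the check) are assumptions.

```shell
#!/bin/sh
# Dry-run sketch of the recovery order described in this thread.
# DEV and MNT are hypothetical placeholders, not the poster's real names.
DEV="/dev/mapper/backup1"
MNT="/mnt/backup"

recovery_steps() {
    # 1. Mount read-write without resuming the interrupted balance.
    echo "mount -o skip_balance $DEV $MNT"
    # 2. Cancel the paused balance so it is not resumed on later mounts.
    echo "btrfs balance cancel $MNT"
    # 3. Verify checksums on the mounted filesystem (-B: stay in foreground).
    echo "btrfs scrub start -B $MNT"
    # 4. btrfs check wants an unmounted filesystem; --readonly changes nothing.
    echo "umount $MNT"
    echo "btrfs check --readonly $DEV"
    # 5. Mount once with clear_cache to rebuild the free space cache.
    echo "mount -o clear_cache $DEV $MNT"
}

# Print the commands for review; nothing is executed against any device.
recovery_steps
```

Printing rather than executing is deliberate: each step should be reviewed and run by hand, and as reported above it only worked after upgrading to kernel 5.4.8 and btrfs-progs 5.2.1.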
