Re: Endless mount and backpointer mismatch

Hi,

For your information I have updated my kernel to 5.4.8-1~bpo10+1 and
btrfs-progs to 5.2.1-1~bpo10+1 (from buster backports).

From there I could mount read-write with skip_balance and then cancel the
balance. After that, btrfs scrub and btrfs check reported no errors, apart
from an inconsistency in the space cache, which I fixed with clear_cache.
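
The sequence above can be sketched as the following shell commands. This is
only an illustration: the device path and mount point are placeholders, and
both the order of scrub versus check and the use of the clear_cache mount
option (rather than btrfs check --clear-space-cache) are assumptions.

```shell
#!/bin/sh
# Sketch of the recovery sequence: skip the paused balance at mount time,
# cancel it, then verify the filesystem and rebuild the space cache.
# DEV and MNT are placeholders (assumptions), not the real names.
DEV="${DEV:-/dev/mapper/backup1}"
MNT="${MNT:-/mnt/backup}"

# With DRY_RUN=1 (the default) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

recover() {
    run mount -o skip_balance "$DEV" "$MNT"   # mount rw without resuming the balance
    run btrfs balance cancel "$MNT"           # drop the paused balance
    run btrfs scrub start -B "$MNT"           # -B: stay in the foreground until done
    run umount "$MNT"                         # btrfs check wants an unmounted fs
    run btrfs check "$DEV"                    # read-only check, no --repair
    run mount -o clear_cache "$DEV" "$MNT"    # rebuild the space cache on mount
}

recover
```

By default the script only prints the commands; running it as root with
DRY_RUN=0 would execute them.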

By the way, do you recommend using space_cache on an encrypted device?

Best regards,

Pepie 34

On 28/01/2020 at 18:32, Pepie 34 wrote:
> On 28/01/2020 at 02:23, Qu Wenruo wrote:
>> On 2020/1/28 5:20 AM, Pepie 34 wrote:
>>> Dear BTRFS community,
>>>
>>> I have had a RAID 1 setup on two LUKS-encrypted drives for 4 years; it
>>> serves as a btrbk backup target for another computer.
>>> There are a lot of read-only snapshots on it.
>>>
>>> I mistakenly launched a balance on it, which was extremely slow, and
>>> tried to cancel it.
>>> After two days of cancelling without result, I decided to power off the
>>> computer.
>>>
>>> After the reboot, even with the skip_balance mount option, the mount
>>> never completes, and there is no error in the kernel log.
>> Is there anything like "relocating block group XXXX flags XXXX" ?
> No, but there are other messages; see below.
>
>
>>> What I have done so far:
>>> - mount the volume with the ro option (fast to mount, data OK).
>>> - scrub in ro mode, no error found
>> So data are all OK.
>> Just need a way to cancel the balance.
>>
>>> - btrfs check
>>> In the extent check there are plenty of errors like this:
>>> =>
>>> ref mismatch on [9404816285696 32768] extent item 6, found 5
>>>
>>> incorrect local backref count on 9404816285696 parent 5712684302336
>>> owner 0 offset 0 found 0 wanted 1 back 0x55f371ee1ad0
>>> backref disk bytenr does not match extent record, bytenr=9404816285696,
>>> ref bytenr=0
>>> backpointer mismatch on [9404816285696 32768]
>>> <=
>> It could be caused by the half-balanced fs.
>> We need to re-check after the balance is cancelled.
>>
>>> No errors in other checks, though checking "quota groups" is very slow.
>> That's caused by the nature of qgroup.
>>
>>> What should I do? btrfs check --repair?
>>> btrfs check --init-extent-tree?
>>> btrfs check --clear-space-cache?
>> None of the options should affect data, but none of them are recommended.
>>
>> Since the problem is about the balance.
>>
>> Have you tried to mount the fs with RO,skip_balance, then remount it rw?
> I have mounted it ro,skip_balance and then remounted it rw.
>
> It has now been trying to mount rw for 12 hours.
>
> The kernel log has messages about tasks blocked for more than 120
> seconds.
>
> Some samples:
>
> [43621.876315] INFO: task btrfs-transacti:21846 blocked for more than 120 seconds.
> [43621.876325]       Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
> [43621.876327] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [43621.876331] btrfs-transacti D    0 21846      2 0x80000000
> [43621.876334] Call Trace:
> [43621.876345]  ? __schedule+0x2a2/0x870
> [43621.876347]  schedule+0x28/0x80
> [43621.876394]  btrfs_commit_transaction+0x75f/0x880 [btrfs]
> [43621.876399]  ? finish_wait+0x80/0x80
> [43621.876419]  transaction_kthread+0x147/0x180 [btrfs]
> [43621.876440]  ? btrfs_cleanup_transaction+0x530/0x530 [btrfs]
> [43621.876443]  kthread+0x112/0x130
> [43621.876445]  ? kthread_bind+0x30/0x30
> [43621.876447]  ret_from_fork+0x22/0x40
>
>
>
> [44346.867777] INFO: task mount:21595 blocked for more than 120 seconds.
> [44346.867788]       Not tainted 4.19.0-6-amd64 #1 Debian 4.19.67-2+deb10u2
> [44346.867791] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [44346.867795] mount           D    0 21595  21594 0x00000000
> [44346.867797] Call Trace:
> [44346.867809]  ? __schedule+0x2a2/0x870
> [44346.867812]  ? __wake_up_common+0x7a/0x190
> [44346.867814]  schedule+0x28/0x80
> [44346.867859]  wait_current_trans+0xc3/0xf0 [btrfs]
> [44346.867863]  ? finish_wait+0x80/0x80
> [44346.867884]  start_transaction+0x317/0x3e0 [btrfs]
> [44346.867908]  merge_reloc_root+0xf5/0x560 [btrfs]
> [44346.867933]  merge_reloc_roots+0xda/0x1f0 [btrfs]
> [44346.867957]  btrfs_recover_relocation+0x42d/0x490 [btrfs]
> [44346.867978]  open_ctree+0x1860/0x1bf0 [btrfs]
> [44346.867995]  btrfs_mount_root+0x682/0x740 [btrfs]
> [44346.867999]  ? cpumask_next+0x16/0x20
> [44346.868002]  ? pcpu_alloc+0x321/0x640
> [44346.868005]  mount_fs+0x3e/0x145
> [44346.868008]  vfs_kern_mount.part.36+0x54/0x120
> [44346.868024]  btrfs_mount+0x16f/0x860 [btrfs]
> [44346.868027]  ? path_lookupat.isra.48+0xa3/0x220
> [44346.868028]  ? legitimize_path.isra.41+0x2d/0x60
> [44346.868030]  ? cpumask_next+0x16/0x20
> [44346.868031]  ? pcpu_alloc+0x321/0x640
> [44346.868032]  ? mount_fs+0x3e/0x145
> [44346.868034]  mount_fs+0x3e/0x145
> [44346.868035]  vfs_kern_mount.part.36+0x54/0x120
> [44346.868037]  do_mount+0x20e/0xcc0
> [44346.868039]  ? _cond_resched+0x15/0x30
> [44346.868041]  ? kmem_cache_alloc_trace+0x155/0x1d0
> [44346.868043]  ksys_mount+0xb6/0xd0
> [44346.868044]  __x64_sys_mount+0x21/0x30
> [44346.868047]  do_syscall_64+0x53/0x110
> [44346.868050]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> [44346.868052] RIP: 0033:0x7ff50cb41fea
> [44346.868060] Code: Bad RIP value.
> [44346.868061] RSP: 002b:00007ffd2257b2e8 EFLAGS: 00000246 ORIG_RAX: 00000000000000a5
> [44346.868063] RAX: ffffffffffffffda RBX: 000055cc47409a40 RCX: 00007ff50cb41fea
> [44346.868064] RDX: 000055cc4740be00 RSI: 000055cc47409c50 RDI: 000055cc4740aa50
> [44346.868065] RBP: 00007ff50ce961c4 R08: 000055cc47409c70 R09: 000055cc474119e0
> [44346.868065] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
> [44346.868066] R13: 0000000000000000 R14: 000055cc4740aa50 R15: 000055cc4740be00
>
> Apart from shutting down the computer, is there a proper way to abort the
> mount?
>
> Best regards,
>
> Pepie 34
>
>
>> Thanks,
>> Qu
>>
>>> Will the --init-extent-tree option break btrfs receive with old snapshot
>>> parents?
>>>
>>> Best regards,
>>>
>>> Pepie34
>>>



