Re: Uncorrectable errors on RAID6

Thanks, Qu, sad news... :-(
No, I didn't defrag with older kernels either. Maybe I did a while ago
with 3.19.x, but a scrub afterwards showed no errors, so that shouldn't
be the problem. The things described above were all done with
4.0.3/4.0.4.

Balances and scrubs all stop at ~1.5 TiB of ~13.3 TiB. The balance
aborts with an error in the log; the scrub just stops doing anything
according to dstat, shows no error, and still reports "running".

The errors/problems started during the first balance, but maybe the
balance only exposed them and is not the cause.

Here are detailed debug infos to (maybe?) recreate the problem. This is
exactly what happened here over some time. As I can only tell when it
definitively was clean (scrub at the beginning of May) and when it
definitively was broken (now, end of May), some more steps may be
necessary to reproduce, because several things happened in the
meantime:
- filesystem was created with "mkfs.btrfs -f -m raid6 -d raid6 -L
t-raid -O extref,raid56,skinny-metadata,no-holes" with 6
LUKS-encrypted HDDs on kernel 3.19
- mounted with options "defaults,compress-force=zlib,space_cache,autodefrag"
- copied all data onto it
- all data on the devices is now compressed with zlib
-> until now the filesystem is ok, scrub shows no errors
- now mount it with "defaults,compress-force=lzo,space_cache" instead
- use kernel 4.0.3/4.0.4
- create a r/o-snapshot
- defrag some data with "-clzo"
- have some (not much) I/O during the process
- this should approx. double the size of the defragged data because
your snapshot contains your data compressed with zlib and your volume
contains your data compressed with lzo
- delete the snapshot
- wait some time until the cleaning is complete, still some other I/O
during this
- this doesn't free as much space as the snapshot contained (?)
-> is this ok? Maybe here the problem already existed/started
- defrag the rest of all data on the devices with "-clzo", still some
other I/O during this
- now start a balance of the whole array
-> errors will spam the log and it's broken.

I hope it is possible to reproduce the errors and find out exactly when
this happens. I'll do the same steps again, too, but maybe someone else
could try it as well? With some small loop devices just for testing,
this shouldn't take too long, even if it sounds like it ;-)
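For reference, a loop-device version of the steps above could look
roughly like this (sizes, loop numbers and the test-data path are just
examples, and I left out the LUKS layer for simplicity):

```shell
#!/bin/sh -e
# Rough sketch of the reproduction steps on small loop devices.
# Sizes, loop numbers and data paths are examples; the LUKS layer
# from the real setup is omitted here.

DEVS=""
for i in 0 1 2 3 4 5; do
    truncate -s 3G "/tmp/btrfs-dev$i"        # sparse backing file
    losetup "/dev/loop$i" "/tmp/btrfs-dev$i"
    DEVS="$DEVS /dev/loop$i"
done

mkfs.btrfs -f -m raid6 -d raid6 -L t-raid \
    -O extref,raid56,skinny-metadata,no-holes $DEVS

mkdir -p /mnt/t-raid

# Phase 1: fill with zlib-compressed data (clean state, scrub OK)
mount -o defaults,compress-force=zlib,space_cache,autodefrag \
    /dev/loop0 /mnt/t-raid
cp -a /path/to/test/data /mnt/t-raid/
umount /mnt/t-raid

# Phase 2: remount with lzo, snapshot, defrag, delete, balance
mount -o defaults,compress-force=lzo,space_cache /dev/loop0 /mnt/t-raid
btrfs subvolume snapshot -r /mnt/t-raid /mnt/t-raid/snap
btrfs filesystem defragment -r -clzo /mnt/t-raid/data
btrfs subvolume delete /mnt/t-raid/snap
# ...wait for the cleaner, keep some background I/O going, then:
btrfs balance start /mnt/t-raid
```

The mkfs feature flags and mount options are the exact ones from my
setup above; everything else needs adjusting to the test machine.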

Back to my actual data: are there any tips on how to recover? Mount
with "recovery", copy everything over and check the log to see which
files seem to be broken? Or some (dangerous) tricks to repair this
broken filesystem?
I do have a full backup, but restoring is very slow and may take weeks
(months?) if I have to recover everything.
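In case it helps anyone in the same situation, the salvage route I'd
try first looks roughly like this (device, mount point and target
paths are examples; on 4.0 the backup-root mount option is still
called "recovery"):

```shell
#!/bin/sh
# Read-only salvage attempt; paths are examples. "recovery" is the
# 4.0-era mount option (later kernels renamed it "usebackuproot").
mkdir -p /mnt/salvage
mount -o ro,recovery /dev/mapper/t-raid-0 /mnt/salvage

# Copy everything off; unreadable files show up in the rsync log
# (and in dmesg), which tells you which files are damaged.
rsync -aHAX --ignore-errors --log-file=/root/salvage.log \
    /mnt/salvage/ /backup/target/
grep -i 'error' /root/salvage.log || true
```

A read-only mount at least can't make the metadata corruption worse
while copying.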

Regards,
Tobias



2015-05-29 2:36 GMT+02:00 Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>:
>
>
> -------- Original Message  --------
> Subject: Re: Uncorrectable errors on RAID6
> From: Tobias Holst <tobby@xxxxxxxx>
> To: Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>
> Date: 2015年05月28日 21:13
>
>> Ah it's already done. You can find the error-log over here:
>> https://paste.ee/p/sxCKF
>>
>> In short there are several of these:
>> bytenr mismatch, want=6318462353408, have=56676169344768
>> checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
>> checksum verify failed on 8955306033152 found 14EED112 wanted 6F1EB890
>> checksum verify failed on 8955306033152 found 5B5F717A wanted C44CA54E
>> checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
>> checksum verify failed on 8955306033152 found CF62F201 wanted E3B7021A
>>
>> and these:
>> ref mismatch on [13431504896 16384] extent item 1, found 0
>> Backref 13431504896 root 7 not referenced back 0x1202acc0
>> Incorrect global backref count on 13431504896 found 1 wanted 0
>> backpointer mismatch on [13431504896 16384]
>> owner ref check failed [13431504896 16384]
>>
>> and these:
>> ref mismatch on [1951739412480 524288] extent item 0, found 1
>> Backref 1951739412480 root 5 owner 27852 offset 644349952 num_refs 0
>> not found in extent tree
>> Incorrect local backref count on 1951739412480 root 5 owner 27852
>> offset 644349952 found 1 wanted 0 back 0x1a92aa20
>> backpointer mismatch on [1951739412480 524288]
>>
>> Any ideas? :)
>>
> The metadata is really corrupted...
>
> I'd recommend to salvage your data as soon as possible.
>
> For that reason, as you didn't run replace, it should at least not be
> the bug spotted by Zhao Lei.
>
> BTW, did you run defrag on older kernels?
> IIRC, old kernels had a bug with snapshot-aware defrag, so it was later
> disabled in newer kernels.
> Not sure if it's related.
>
> Balance may be related but I'm not familiar with balance with RAID5/6.
> So hard to say.
>
> Sorry for being unable to provide much help.
>
> But if you have enough time to find a stable way to reproduce the bug,
> best to try it on a loop device; it would definitely help us to debug.
>
> Thanks,
> Qu
>
>
>> Regards
>> Tobias
>>
>>
>> 2015-05-28 14:57 GMT+02:00 Tobias Holst <tobby@xxxxxxxx>:
>>>
>>> Hi Qu,
>>>
>>> no, I didn't run a replace. But I ran a defrag with "-clzo" on all
>>> files while there was slight I/O on the devices. I don't know if
>>> this could cause corruption, too?
>>>
>>> Later on I deleted a r/o-snapshot which should free a big amount of
>>> storage space. It didn't free as much as it should so after a few days
>>> I started a balance to free the space. During the balance the first
>>> checksum errors happened and the whole balance process crashed:
>>>
>>> [19174.342882] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.365473] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.365651] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.366168] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.366250] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.366392] BTRFS: dm-5 checksum verify failed on 6318462353408
>>> wanted 25D94CD6 found 8BA427D4 level 1
>>> [19174.367313] ------------[ cut here ]------------
>>> [19174.367340] kernel BUG at
>>> /home/kernel/COD/linux/fs/btrfs/relocation.c:242!
>>> [19174.367384] invalid opcode: 0000 [#1] SMP
>>> [19174.367418] Modules linked in: iosf_mbi kvm_intel kvm
>>> crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
>>> aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
>>> cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
>>> parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
>>> ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
>>> psmouse pata_acpi
>>> [19174.367656] CPU: 1 PID: 4960 Comm: btrfs Not tainted
>>> 4.0.4-040004-generic #201505171336
>>> [19174.367703] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>> BIOS Bochs 01/01/2011
>>> [19174.367752] task: ffff8804274e8000 ti: ffff880367b50000 task.ti:
>>> ffff880367b50000
>>> [19174.367797] RIP: 0010:[<ffffffffc05ec4ba>]  [<ffffffffc05ec4ba>]
>>> backref_cache_cleanup+0xea/0x100 [btrfs]
>>> [19174.367867] RSP: 0018:ffff880367b53bd8  EFLAGS: 00010202
>>> [19174.367905] RAX: ffff88008250d8f8 RBX: ffff88008250d820 RCX:
>>> 0000000180200001
>>> [19174.367948] RDX: ffff88008250d8d8 RSI: ffff88008250d8e8 RDI:
>>> 0000000040000000
>>> [19174.367992] RBP: ffff880367b53bf8 R08: ffff880418b77780 R09:
>>> 0000000180200001
>>> [19174.368037] R10: ffffffffc05ec1d9 R11: 0000000000018bf8 R12:
>>> 0000000000000001
>>> [19174.368081] R13: ffff88008250d8e8 R14: 00000000fffffffb R15:
>>> ffff880367b53c28
>>> [19174.368125] FS:  00007f7fd6831c80(0000) GS:ffff88043fc40000(0000)
>>> knlGS:0000000000000000
>>> [19174.368172] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [19174.368210] CR2: 00007f65f7564770 CR3: 00000003ac92f000 CR4:
>>> 00000000001407e0
>>> [19174.368257] Stack:
>>> [19174.368279]  00000000fffffffb ffff88008250d800 ffff88042b3d46e0
>>> ffff88006845f990
>>> [19174.368327]  ffff880367b53c78 ffffffffc05f25eb ffff880367b53c78
>>> 0000000000000002
>>> [19174.368376]  00ff880429e4c670 a9000010d8fb7e00 0000000000000000
>>> 0000000000000000
>>> [19174.368424] Call Trace:
>>> [19174.368459]  [<ffffffffc05f25eb>] relocate_block_group+0x2cb/0x510
>>> [btrfs]
>>> [19174.368509]  [<ffffffffc05f29e0>]
>>> btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
>>> [19174.368562]  [<ffffffffc05c6eab>]
>>> btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
>>> [19174.368615]  [<ffffffffc05c82e8>] __btrfs_balance+0x348/0x460 [btrfs]
>>> [19174.368663]  [<ffffffffc05c87b5>] btrfs_balance+0x3b5/0x5d0 [btrfs]
>>> [19174.368710]  [<ffffffffc05d5cac>] btrfs_ioctl_balance+0x1cc/0x530
>>> [btrfs]
>>> [19174.368756]  [<ffffffff811b52e0>] ? handle_mm_fault+0xb0/0x160
>>> [19174.368802]  [<ffffffffc05d7c7e>] btrfs_ioctl+0x69e/0xb20 [btrfs]
>>> [19174.368845]  [<ffffffff8120f5b5>] do_vfs_ioctl+0x75/0x320
>>> [19174.368882]  [<ffffffff8120f8f1>] SyS_ioctl+0x91/0xb0
>>> [19174.368923]  [<ffffffff817f098d>] system_call_fastpath+0x16/0x1b
>>> [19174.368962] Code: 3b 00 75 29 44 8b a3 00 01 00 00 45 85 e4 75 1b
>>> 44 8b 9b 04 01 00 00 45 85 db 75 0d 48 83 c4 08 5b 41 5c 41 5d 5d c3
>>> 0f 0b 0f 0b <0f> 0b 0f 0b 0f 0b 0f 0b 66 66 66 66 66 2e 0f 1f 84 00 00
>>> 00 00
>>> [19174.369133] RIP  [<ffffffffc05ec4ba>]
>>> backref_cache_cleanup+0xea/0x100 [btrfs]
>>> [19174.369186]  RSP <ffff880367b53bd8>
>>> [19174.369827] ------------[ cut here ]------------
>>> [19174.369827] kernel BUG at
>>> /home/kernel/COD/linux/arch/x86/mm/pageattr.c:216!
>>> [19174.369827] invalid opcode: 0000 [#2] SMP
>>> [19174.369827] Modules linked in: iosf_mbi kvm_intel kvm
>>> crct10dif_pclmul ppdev dm_crypt crc32_pclmul ghash_clmulni_intel
>>> aesni_intel aes_x86_64 lrw gf128mul glue_helper parport_pc ablk_helper
>>> cryptd mac_hid 8250_fintek virtio_rng serio_raw i2c_piix4 pvpanic lp
>>> parport btrfs xor raid6_pq cirrus syscopyarea sysfillrect sysimgblt
>>> ttm mpt2sas drm_kms_helper raid_class scsi_transport_sas drm floppy
>>> psmouse pata_acpi
>>> [19174.369827] CPU: 1 PID: 4960 Comm: btrfs Not tainted
>>> 4.0.4-040004-generic #201505171336
>>> [19174.369827] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>>> BIOS Bochs 01/01/2011
>>> [19174.369827] task: ffff8804274e8000 ti: ffff880367b50000 task.ti:
>>> ffff880367b50000
>>> [19174.369827] RIP: 0010:[<ffffffff8106875f>]  [<ffffffff8106875f>]
>>> cpa_flush_array+0x10f/0x120
>>> [19174.369827] RSP: 0018:ffff880367b52cf8  EFLAGS: 00010046
>>> [19174.369827] RAX: 0000000000000092 RBX: 0000000000000000 RCX:
>>> 0000000000000005
>>> [19174.369827] RDX: 0000000000000001 RSI: 0000000000000200 RDI:
>>> 0000000000000000
>>> [19174.369827] RBP: ffff880367b52d48 R08: ffff880411ef2000 R09:
>>> 0000000000000001
>>> [19174.369827] R10: 0000000000000004 R11: ffffffff81adb6be R12:
>>> 0000000000000200
>>> [19174.369827] R13: 0000000000000001 R14: 0000000000000005 R15:
>>> 0000000000000000
>>> [19174.369827] FS:  00007f7fd6831c80(0000) GS:ffff88043fc40000(0000)
>>> knlGS:0000000000000000
>>> [19174.369827] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>>> [19174.369827] CR2: 00007f65f7564770 CR3: 00000003ac92f000 CR4:
>>> 00000000001407e0
>>> [19174.369827] Stack:
>>> [19174.369827]  0000000000000001 ffff880411ef2000 0000000000000001
>>> 0000000000000001
>>> [19174.369827]  ffff880367b52d48 0000000000000000 0000000000000200
>>> 0000000000000000
>>> [19174.369827]  0000000000000004 0000000000000000 ffff880367b52de8
>>> ffffffff8106979c
>>> [19174.369827] Call Trace:
>>> [19174.369827]  [<ffffffff8106979c>] change_page_attr_set_clr+0x23c/0x2c0
>>> [19174.369827]  [<ffffffff810699b0>] _set_pages_array+0xf0/0x140
>>> [19174.369827]  [<ffffffff81069a13>] set_pages_array_wc+0x13/0x20
>>> [19174.369827]  [<ffffffffc052d926>] ttm_set_pages_caching+0x46/0x80
>>> [ttm]
>>> [19174.369827]  [<ffffffffc052da24>]
>>> ttm_alloc_new_pages.isra.6+0xc4/0x1a0 [ttm]
>>> [19174.369827]  [<ffffffffc052dc76>]
>>> ttm_page_pool_fill_locked.isra.7.constprop.12+0x96/0x140 [ttm]
>>> [19174.369827]  [<ffffffffc052dd5a>]
>>> ttm_page_pool_get_pages.isra.8.constprop.10+0x3a/0xe0 [ttm]
>>> [19174.369827]  [<ffffffffc052dea0>]
>>> ttm_get_pages.constprop.11+0xa0/0x1f0 [ttm]
>>> [19174.369827]  [<ffffffffc052e07c>] ttm_pool_populate+0x8c/0xf0 [ttm]
>>> [19174.369827]  [<ffffffffc052a0f3>] ? ttm_mem_reg_ioremap+0x63/0xf0
>>> [ttm]
>>> [19174.369827]  [<ffffffffc056146e>] cirrus_ttm_tt_populate+0xe/0x10
>>> [cirrus]
>>> [19174.369827]  [<ffffffffc052a7ea>] ttm_bo_move_memcpy+0x5ea/0x650 [ttm]
>>> [19174.369827]  [<ffffffffc05266ac>] ? ttm_tt_init+0x8c/0xb0 [ttm]
>>> [19174.369827]  [<ffffffff811c3aee>] ? __vmalloc_node+0x3e/0x40
>>> [19174.369827]  [<ffffffffc0561418>] cirrus_bo_move+0x18/0x20 [cirrus]
>>> [19174.369827]  [<ffffffffc0527f5f>] ttm_bo_handle_move_mem+0x27f/0x6f0
>>> [ttm]
>>> [19174.369827]  [<ffffffffc0528f7c>] ttm_bo_move_buffer+0xdc/0xf0 [ttm]
>>> [19174.369827]  [<ffffffffc0529023>] ttm_bo_validate+0x93/0xb0 [ttm]
>>> [19174.369827]  [<ffffffffc0561c3f>] cirrus_bo_push_sysram+0x8f/0xe0
>>> [cirrus]
>>> [19174.369827]  [<ffffffffc055feb3>]
>>> cirrus_crtc_do_set_base.isra.9.constprop.10+0x83/0x2b0 [cirrus]
>>> [19174.369827]  [<ffffffff811df534>] ? kmem_cache_alloc_trace+0x1c4/0x210
>>> [19174.369827]  [<ffffffffc056056f>] cirrus_crtc_mode_set+0x48f/0x4f0
>>> [cirrus]
>>> [19174.369827]  [<ffffffffc04c29de>]
>>> drm_crtc_helper_set_mode+0x35e/0x5c0 [drm_kms_helper]
>>> [19174.369827]  [<ffffffffc04c35f2>]
>>> drm_crtc_helper_set_config+0x6d2/0xad0 [drm_kms_helper]
>>> [19174.369827]  [<ffffffffc0560f9a>] ? cirrus_dirty_update+0xca/0x320
>>> [cirrus]
>>> [19174.369827]  [<ffffffff811df534>] ? kmem_cache_alloc_trace+0x1c4/0x210
>>> [19174.369827]  [<ffffffffc0406026>]
>>> drm_mode_set_config_internal+0x66/0x110 [drm]
>>> [19174.369827]  [<ffffffffc04ceee2>]
>>> drm_fb_helper_pan_display+0xa2/0xf0 [drm_kms_helper]
>>> [19174.369827]  [<ffffffff814382cd>] fb_pan_display+0xbd/0x170
>>> [19174.369827]  [<ffffffff81432629>] bit_update_start+0x29/0x60
>>> [19174.369827]  [<ffffffff81431ee2>] fbcon_switch+0x3b2/0x560
>>> [19174.369827]  [<ffffffff814c22f9>] redraw_screen+0x179/0x220
>>> [19174.369827]  [<ffffffff8143024a>] fbcon_blank+0x21a/0x2d0
>>> [19174.369827]  [<ffffffff810d0aa2>] ? wake_up_klogd+0x32/0x40
>>> [19174.369827]  [<ffffffff810d0cd8>] ? console_unlock.part.19+0x228/0x2a0
>>> [19174.369827]  [<ffffffff810e343c>] ? internal_add_timer+0x6c/0x90
>>> [19174.369827]  [<ffffffff810e58d9>] ? mod_timer+0xf9/0x200
>>> [19174.369827]  [<ffffffff814c2de0>] do_unblank_screen.part.22+0xa0/0x180
>>> [19174.369827]  [<ffffffff814c2f0c>] do_unblank_screen+0x4c/0x80
>>> [19174.369827]  [<ffffffffc05ec4ba>] ? backref_cache_cleanup+0xea/0x100
>>> [btrfs]
>>> [19174.369827]  [<ffffffff814c2f50>] unblank_screen+0x10/0x20
>>> [19174.369827]  [<ffffffff813c3ccd>] bust_spinlocks+0x1d/0x40
>>> [19174.369827]  [<ffffffff81019bd3>] oops_end+0x43/0x120
>>> [19174.369827]  [<ffffffff8101a2f8>] die+0x58/0x90
>>> [19174.369827]  [<ffffffff8101642d>] do_trap+0xcd/0x160
>>> [19174.369827]  [<ffffffff810167e6>] do_error_trap+0xe6/0x170
>>> [19174.369827]  [<ffffffffc05ec4ba>] ? backref_cache_cleanup+0xea/0x100
>>> [btrfs]
>>> [19174.369827]  [<ffffffff817dce0f>] ? __slab_free+0xee/0x234
>>> [19174.369827]  [<ffffffff817dce0f>] ? __slab_free+0xee/0x234
>>> [19174.369827]  [<ffffffffc05baf0e>] ? clear_state_bit+0xae/0x170 [btrfs]
>>> [19174.369827]  [<ffffffffc05ba67a>] ? free_extent_state+0x6a/0xd0
>>> [btrfs]
>>> [19174.369827]  [<ffffffff810172e0>] do_invalid_op+0x20/0x30
>>> [19174.369827]  [<ffffffff817f24ee>] invalid_op+0x1e/0x30
>>> [19174.369827]  [<ffffffffc05ec1d9>] ?
>>> free_backref_node.isra.36+0x19/0x20 [btrfs]
>>> [19174.369827]  [<ffffffffc05ec4ba>] ? backref_cache_cleanup+0xea/0x100
>>> [btrfs]
>>> [19174.369827]  [<ffffffffc05ec43c>] ? backref_cache_cleanup+0x6c/0x100
>>> [btrfs]
>>> [19174.369827]  [<ffffffffc05f25eb>] relocate_block_group+0x2cb/0x510
>>> [btrfs]
>>> [19174.369827]  [<ffffffffc05f29e0>]
>>> btrfs_relocate_block_group+0x1b0/0x2d0 [btrfs]
>>> [19174.369827]  [<ffffffffc05c6eab>]
>>> btrfs_relocate_chunk.isra.75+0x4b/0xd0 [btrfs]
>>> [19174.369827]  [<ffffffffc05c82e8>] __btrfs_balance+0x348/0x460 [btrfs]
>>> [19174.369827]  [<ffffffffc05c87b5>] btrfs_balance+0x3b5/0x5d0 [btrfs]
>>> [19174.369827]  [<ffffffffc05d5cac>] btrfs_ioctl_balance+0x1cc/0x530
>>> [btrfs]
>>> [19174.369827]  [<ffffffff811b52e0>] ? handle_mm_fault+0xb0/0x160
>>> [19174.369827]  [<ffffffffc05d7c7e>] btrfs_ioctl+0x69e/0xb20 [btrfs]
>>> [19174.369827]  [<ffffffff8120f5b5>] do_vfs_ioctl+0x75/0x320
>>> [19174.369827]  [<ffffffff8120f8f1>] SyS_ioctl+0x91/0xb0
>>> [19174.369827]  [<ffffffff817f098d>] system_call_fastpath+0x16/0x1b
>>> [19174.369827] Code: 4e 8b 2c 23 eb cd 66 0f 1f 44 00 00 48 83 c4 28
>>> 5b 41 5c 41 5d 41 5e 41 5f 5d c3 90 be 00 10 00 00 4c 89 ef e8 a3 ee
>>> ff ff eb c7 <0f> 0b 66 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f
>>> 44 00
>>> [19174.369827] RIP  [<ffffffff8106875f>] cpa_flush_array+0x10f/0x120
>>> [19174.369827]  RSP <ffff880367b52cf8>
>>> [19174.369827] ---[ end trace 60adc437bd944044 ]---
>>>
>>> After a reboot and a remount it always tried to resume the balance and
>>> and then crashed again, so I had to be quick to do a "btrfs balance
>>> cancel". Then I started the scrub and got these uncorrectable errors I
>>> mentioned in the first mail.
>>>
>>> I just unmounted it and started a btrfsck. Will post the output when it's
>>> done.
>>> It's already showing me several of these:
>>>
>>> checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
>>> checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
>>> checksum verify failed on 18523667709952 found 5EAB6BFE wanted BA48D648
>>> checksum verify failed on 18523667709952 found 8E19F60E wanted E3A34D18
>>> checksum verify failed on 18523667709952 found C240FB11 wanted 1ED6A587
>>> bytenr mismatch, want=18523667709952, have=10838194617263884761
>>>
>>>
>>> Thanks,
>>> Tobias
>>>
>>>
>>>
>>> 2015-05-28 4:49 GMT+02:00 Qu Wenruo <quwenruo@xxxxxxxxxxxxxx>:
>>>>
>>>>
>>>>
>>>> -------- Original Message  --------
>>>> Subject: Uncorrectable errors on RAID6
>>>> From: Tobias Holst <tobby@xxxxxxxx>
>>>> To: linux-btrfs@xxxxxxxxxxxxxxx <linux-btrfs@xxxxxxxxxxxxxxx>
>>>> Date: 2015年05月28日 10:18
>>>>
>>>>> Hi
>>>>>
>>>>> I am doing a scrub on my 6-drive btrfs RAID6. Last time it found zero
>>>>> errors, but now I am getting this in my log:
>>>>>
>>>>> [ 6610.888020] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6610.888025] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6610.888029] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt
>>>>> 1,
>>>>> gen 0
>>>>> [ 6611.271334] BTRFS: unable to fixup (regular) error at logical
>>>>> 478232346624 on dev /dev/dm-2
>>>>> [ 6611.831370] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6611.831373] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6611.831375] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt
>>>>> 2,
>>>>> gen 0
>>>>> [ 6612.396402] BTRFS: unable to fixup (regular) error at logical
>>>>> 478232346624 on dev /dev/dm-2
>>>>> [ 6904.027456] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6904.027460] BTRFS: checksum error at logical 478232346624 on dev
>>>>> /dev/dm-2, sector 231373760: metadata leaf (level 0) in tree 2
>>>>> [ 6904.027463] BTRFS: bdev /dev/dm-2 errs: wr 0, rd 0, flush 0, corrupt
>>>>> 3,
>>>>> gen 0
>>>>>
>>>>> Looks like it is always the same sector.
>>>>>
>>>>> "btrfs scrub status" shows me:
>>>>> scrub status for a34ce68b-bb9f-49f0-91fe-21a924ef11ae
>>>>>           scrub started at Thu May 28 02:25:31 2015, running for 6759
>>>>> seconds
>>>>>           total bytes scrubbed: 448.87GiB with 14 errors
>>>>>           error details: read=8 csum=6
>>>>>           corrected errors: 3, uncorrectable errors: 11, unverified
>>>>> errors:
>>>>> 0
>>>>>
>>>>> What does it mean, and why are these errors uncorrectable even on a
>>>>> RAID6?
>>>>> Can I find out which files are affected?
>>>>
>>>>
>>>> If it's OK for you to put the fs offline,
>>>> btrfsck is the best method to check what happens, although it may take a
>>>> long time.
>>>>
>>>> There is a known bug that replace can cause checksum error, found by
>>>> Zhao
>>>> Lei.
>>>> So did you run replace while there was still some other disk I/O happening?
>>>>
>>>> Thanks,
>>>> Qu
>>>>>
>>>>>
>>>>>
>>>>> system: Ubuntu 14.04.2
>>>>> kernel version 4.0.4
>>>>> btrfs-tools version: 4.0
>>>>>
>>>>> Regards
>>>>> Tobias
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs"
>>>>> in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>
>



