Hi.
I have discovered a case where replacing missing devices causes
metadata corruption. Does anybody know anything about this?
I am using the 4.4.5 kernel with the latest global spare patches.
If we have a RAID6 filesystem (possibly reproducible on RAID5 too),
replace one missing drive with another, and then remove a second
drive and replace it as well, plenty of errors appear in the log:
[ 748.641766] BTRFS error (device sdf): failed to rebuild valid
logical 7366459392 for dev /dev/sde
[ 748.678069] BTRFS error (device sdf): failed to rebuild valid
logical 7381139456 for dev /dev/sde
[ 748.693559] BTRFS error (device sdf): failed to rebuild valid
logical 7290974208 for dev /dev/sde
[ 752.039100] BTRFS error (device sdf): bad tree block start
13048831955636601734 6919258112
[ 752.647869] BTRFS error (device sdf): bad tree block start
12819300352 6919290880
[ 752.658520] BTRFS error (device sdf): bad tree block start
31618367488 6919290880
[ 752.712633] BTRFS error (device sdf): bad tree block start
31618367488 6919290880
After the device replacement finishes, scrub reports uncorrectable errors.
btrfs check complains about errors too:
root@test:~/# btrfs check -p /dev/sdc
Checking filesystem on /dev/sdc
UUID: 833fef31-5536-411c-8f58-53b527569fa5
checksum verify failed on 9359163392 found E4E3BDB6 wanted 00000000
checksum verify failed on 9359163392 found E4E3BDB6 wanted 00000000
checksum verify failed on 9359163392 found 4D1F4197 wanted DE0E50EC
bytenr mismatch, want=9359163392, have=9359228928
Errors found in extent allocation tree or chunk allocation
checking free space cache [.]
checking fs roots [.]
checking csums
checking root refs
found 1049788420 bytes used err is 0
total csum bytes: 1024000
total tree bytes: 1179648
total fs tree bytes: 16384
total extent tree bytes: 16384
btree space waste bytes: 124962
file data blocks allocated: 1049755648
referenced 1049755648
After the first replacement, metadata does not seem to be spread across all devices:
Label: none  uuid: 3db39446-6810-47bf-8732-d5a8793500f3
        Total devices 4 FS bytes used 1002.00MiB
        devid    1 size 8.00GiB used 1.28GiB path /dev/sdc
        devid    2 size 8.00GiB used 1.28GiB path /dev/sdd
        devid    3 size 8.00GiB used 1.28GiB path /dev/sdf
        devid    4 size 8.00GiB used 1.25GiB path /dev/sdg
# btrfs device usage /mnt/
/dev/sdc, ID: 1
   Device size:           8.00GiB
   Data,RAID6:            1.00GiB
   Metadata,RAID6:      256.00MiB
   System,RAID6:         32.00MiB
   Unallocated:           6.72GiB
/dev/sdd, ID: 2
   Device size:           8.00GiB
   Data,RAID6:            1.00GiB
   Metadata,RAID6:      256.00MiB
   System,RAID6:         32.00MiB
   Unallocated:           6.72GiB
/dev/sdf, ID: 3
   Device size:           8.00GiB
   Data,RAID6:            1.00GiB
   Metadata,RAID6:      256.00MiB
   System,RAID6:         32.00MiB
   Unallocated:           6.72GiB
/dev/sdg, ID: 4
   Device size:           8.00GiB
   Data,RAID6:            1.00GiB
   Metadata,RAID6:      256.00MiB
   Unallocated:           6.75GiB
Steps to reproduce:
1) Create and mount a RAID6 filesystem.
2) Remove a drive belonging to the RAID, attempt a write, and let the
kernel code close the device.
3) Replace the missing device with the 'btrfs replace start' command.
4) Remove a drive in another slot, attempt a write, and wait for it to
be closed.
5) Start replacing the missing drive -> ERRORS.
If a full balance was done after step 3), no errors appeared.
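For reference, the steps above can be sketched roughly as the script
below. This is an untested sketch, not a verbatim copy of my setup:
it substitutes loopback files for real drives, and the device paths,
sizes, file names, and devids are illustrative assumptions. It
requires root and destroys data on the loop files.

```sh
# Back six loop devices with sparse files (stand-ins for real drives).
for i in 0 1 2 3 4 5; do
    truncate -s 8G /tmp/disk$i.img
    losetup /dev/loop$i /tmp/disk$i.img
done

# 1) Create and mount a RAID6 filesystem on four devices.
mkfs.btrfs -d raid6 -m raid6 /dev/loop0 /dev/loop1 /dev/loop2 /dev/loop3
mount /dev/loop0 /mnt

# 2) Detach one member, then write so the kernel notices and closes it.
losetup -d /dev/loop3                       # simulates pulling the drive
dd if=/dev/zero of=/mnt/f1 bs=1M count=100
sync

# 3) Replace the missing device (assumed devid 4) with a spare.
btrfs replace start -B 4 /dev/loop4 /mnt

# 4) Detach a second member and trigger a write again.
losetup -d /dev/loop2
dd if=/dev/zero of=/mnt/f2 bs=1M count=100
sync

# 5) Replace the second missing device (assumed devid 3) -> errors in dmesg.
btrfs replace start -B 3 /dev/loop5 /mnt

# Workaround observed: a full balance between steps 3) and 4) avoids
# the errors:
#   btrfs balance start --full-balance /mnt
```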