Re: very slow "btrfs dev delete" 3x6Tb, 7Tb of data


On Mon, Jan 6, 2020 at 4:14 AM Leszek Dubiel <leszek@xxxxxxxxx> wrote:
>
>
>
> On 02.01.2020 at 22:57, Chris Murphy wrote:
>
>  > but I would say that in retrospect it would have been better to NOT
>  > delete the device with a few bad sectors, and instead use `btrfs
>  > replace` to do a 1:1 replacement of that particular drive.
>
>
> Tested "replace" on another server:
>
>      root@zefir:~# btrfs replace start /dev/sde1 /dev/sdb3 /
>
> and speed was quite normal:
>
>      1.49 TiB * ( 1024 * 1024 MiB/TiB ) / ( 4.5 hours * 3600 sec/hour )
>          =     1.49 * ( 1024 * 1024 ) / ( 4.5 * 3600 )   =  96.44 MiB / sec
>
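
That arithmetic checks out. A minimal Python sketch of the same
calculation, in case you want to redo it for other runs; the function
name is only an illustration, not something from btrfs-progs:

```python
MIB_PER_TIB = 1024 * 1024
SECONDS_PER_HOUR = 3600


def throughput_mib_per_sec(tib_copied: float, hours: float) -> float:
    """Average throughput in MiB/s for tib_copied TiB moved in `hours` hours."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return tib_copied * MIB_PER_TIB / (hours * SECONDS_PER_HOUR)


# Figures from the replace run quoted above: 1.49 TiB in 4.5 hours.
print(f"{throughput_mib_per_sec(1.49, 4.5):.2f} MiB/s")  # 96.44 MiB/s
```
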
>
> Questions:
>
> 1. It is a little bit confusing that the kernel reports sdc1 and sde1
> on the same line: "BTRFS warning (device sdc1): i/o error ... on dev
> /dev/sde1"....

Can you provide the entire line? It's probably already confusing, but
the ellipses make it more confusing.


>
> # reduce slack
> root@zefir:~# btrfs fi resize 4:max /
> Resize '/' of '4:max'
> root@zefir:~# btrfs dev usage /
> ...
> /dev/sdb3, ID: 4
>     Device size:             1.80TiB
>     Device slack:            3.50KiB <<<<<<<<<<<<<<<<<<<<

Maybe the partition isn't aligned on a 4KiB boundary? *shrug*

But yeah, one gotcha with 'btrfs replace' is that the replacement must
be equal to or bigger than the drive being replaced; and once the
replace completes, the file system is not resized to fully utilize the
replacement drive. That's intentional, because by default you may want
allocations to keep the same balance they had with the original
device. If you resize to max, Btrfs will favor allocations to the
drive with the most free space.
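
A rough sketch of that sequence in Python, assuming you already know
both device sizes and the devid. plan_replace is a hypothetical helper
of mine; it only builds the command lines (the same 'btrfs replace
start' and 'btrfs filesystem resize <devid>:max' forms quoted above)
and runs nothing. Check the devid with 'btrfs dev usage' before using
the resize step.

```python
def plan_replace(old_dev, new_dev, mountpoint, old_bytes, new_bytes,
                 devid=None, grow=False):
    """Return the btrfs command lines for a 1:1 device replacement.

    Refuses the plan if the target is smaller than the source, since
    the replacement must be equal to or bigger than the original.
    The resize step is only added when grow=True, because growing
    changes where Btrfs prefers to allocate afterwards.
    """
    if new_bytes < old_bytes:
        raise ValueError("replacement device is smaller than the original")
    commands = [["btrfs", "replace", "start", old_dev, new_dev, mountpoint]]
    if grow:
        if devid is None:
            raise ValueError("devid is required to resize")
        commands.append(
            ["btrfs", "filesystem", "resize", f"{devid}:max", mountpoint]
        )
    return commands


# Device names and devid from this thread; the sizes are made-up
# placeholders purely for illustration.
for cmd in plan_replace("/dev/sde1", "/dev/sdb3", "/",
                        old_bytes=10**12, new_bytes=2 * 10**12,
                        devid=4, grow=True):
    print(" ".join(cmd))
```

Leaving grow=False gives you only the replace step, which matches the
default behavior described above.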


> Jan  5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning (device
> sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, physical
> 867246145536: metadata leaf (level 0) in tree 9109477097472


Ahh yeah, I see what you mean. I think that's confusing too. The error
is on sde1. I guess sdc1 is reported first because that's the device
the kernel considers mounted; there's not really a good facility (or
maybe there is one in the newer VFS mount code, not sure) for showing
multiple devices mounted on a single mount point.
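
If you end up grepping logs for these, here is a small Python sketch
that separates the two device names. The regex is my own guess built
from the one message quoted above, so treat it as an assumption: it
may not match other kernel versions or other BTRFS message types, and
it expects the log line unwrapped onto a single line.

```python
import re

# "device X" is the device the file system is reported under;
# "on dev Y" is the device where the I/O error actually occurred.
BTRFS_IO_ERROR = re.compile(
    r"BTRFS warning \(device (?P<fs_device>[^)]+)\): "
    r"i/o error at logical (?P<logical>\d+) "
    r"on dev (?P<error_device>[^,\s]+), "
    r"physical (?P<physical>\d+)"
)


def parse_io_error(line):
    """Return the fields of a BTRFS i/o error line, or None if no match."""
    match = BTRFS_IO_ERROR.search(line)
    if match is None:
        return None
    return {
        "fs_device": match["fs_device"],
        "error_device": match["error_device"],
        "logical": int(match["logical"]),
        "physical": int(match["physical"]),
    }


line = (
    "Jan  5 13:52:09 zefir kernel: [1291932.446093] BTRFS warning "
    "(device sdc1): i/o error at logical 11658111352832 on dev /dev/sde1, "
    "physical 867246145536: metadata leaf (level 0) in tree 9109477097472"
)
print(parse_io_error(line))
```
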



> [/dev/sda1].write_io_errs    10418
> [/dev/sda1].read_io_errs     227
> [/dev/sda1].flush_io_errs    117
> [/dev/sda1].corruption_errs  77
> [/dev/sda1].generation_errs  47

This isn't good either. I'd keep an eye on that. Read errors can be
fixed up if there's a good copy: Btrfs will use the good copy and
overwrite the bad sector, *if* the drive's SCT ERC timeout is shorter
than the SCSI command timer. But write and flush errors are bad. You
need to find out what that's about.
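
To make the SCT ERC vs. command timer comparison concrete, a minimal
Python sketch. Assumptions on my part: the ERC value is the number
'smartctl -l scterc' prints (units of 100 ms, e.g. 70 for 7.0
seconds), the command timer is read from
/sys/block/<disk>/device/timeout (in seconds), and a disabled ERC is
treated as not safe because the drive's internal recovery time is then
unknown. The function names are hypothetical.

```python
from pathlib import Path


def scsi_timeout_seconds(disk):
    """Read the SCSI command timer for a whole disk, e.g. disk='sda'."""
    return int(Path(f"/sys/block/{disk}/device/timeout").read_text())


def erc_shorter_than_timer(scterc_deciseconds, scsi_timeout_s):
    """True if the drive gives up on a bad sector before the kernel does.

    scterc_deciseconds: SCT ERC value in units of 100 ms, or None/0 if
    ERC is disabled or unsupported.
    """
    if not scterc_deciseconds:
        return False  # disabled or unknown: cannot assume it is short enough
    return scterc_deciseconds / 10 < scsi_timeout_s


# 7.0 s ERC against a 30 s command timer: the drive reports the read
# error in time for the good copy to be used.
print(erc_shorter_than_timer(70, 30))
# ERC disabled: treated as not safe.
print(erc_shorter_than_timer(None, 30))
```
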



-- 
Chris Murphy


