Re: raid10 array lost with single disk failure?

On 2017-07-09 22:13, Adam Bahe wrote:
> I have finished all of the above suggestions, ran a scrub, remounted,
> rebooted, made sure the system didn't hang, and then kicked off
> another balance on the entire pool. It completed rather quickly, but
> something still does not seem right.
>
> Label: 'btrfs_pool1'  uuid: 04a7fa70-1572-47a2-a55c-7c99aef12603
>          Total devices 18 FS bytes used 23.64TiB
>          devid    1 size 1.82TiB used 1.82TiB path /dev/sdd
>          devid    2 size 1.82TiB used 1.82TiB path /dev/sdf
>          devid    3 size 3.64TiB used 3.07TiB path /dev/sdg
>          devid    4 size 3.64TiB used 3.06TiB path /dev/sdk
>          devid    5 size 1.82TiB used 1.82TiB path /dev/sdn
>          devid    6 size 3.64TiB used 3.06TiB path /dev/sdo
>          devid    7 size 1.82TiB used 1.82TiB path /dev/sds
>          devid    8 size 1.82TiB used 1.82TiB path /dev/sdj
>          devid    9 size 1.82TiB used 1.82TiB path /dev/sdi
>          devid   10 size 1.82TiB used 1.82TiB path /dev/sdq
>          devid   11 size 1.82TiB used 1.82TiB path /dev/sdr
>          devid   12 size 1.82TiB used 1.82TiB path /dev/sde
>          devid   13 size 1.82TiB used 1.82TiB path /dev/sdm
>          devid   14 size 7.28TiB used 4.78TiB path /dev/sdh
>          devid   15 size 7.28TiB used 4.99TiB path /dev/sdl
>          devid   16 size 7.28TiB used 4.97TiB path /dev/sdp
>          devid   17 size 7.28TiB used 4.99TiB path /dev/sdc
>          devid   18 size 5.46TiB used 210.12GiB path /dev/sdb
>
> /dev/sdb is the new disk, but btrfs only moved 210.12GiB over to it.
> Most disks in the array are >50% utilized. Is this normal?

Was this from a full balance, or just running a scrub to repair chunks?

You have three ways to repair a BTRFS volume that's lost a device:

* The first, quickest, and most reliable option is to use `btrfs replace` to swap the failing/missing device for the new one. It reads only the data that actually has to land on the new device, so it finishes sooner than the other two methods. The trade-offs: if the new device is larger than the old one you'll need to resize the filesystem on it afterwards, and you can't replace the missing device with a smaller one. (Rough commands below the list.)

* The second is to add the new device to the array and then run a scrub on the whole array. The scrub will spit out a bunch of errors from the chunks that need to be rebuilt, but it will make sure everything ends up consistent. This isn't as fast as `btrfs replace`, but it's still quicker than a full balance most of the time. In this particular case, I would expect behavior like what you're seeing above at least some of the time. (Commands below.)

* The third, and slowest, method is to add the new device and then run a full balance. This makes sure data is distributed proportionally to device size and rebuilds all the partial chunks, but it takes the longest and puts significantly more stress on the array than the other two options, since it rewrites the entire array. If this is what you used, then you probably found a bug, because it should never result in what you're seeing. (Commands below.)
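
For the `btrfs replace` route, the commands look roughly like this. The mount point (/mnt/pool) and the devid of the failed disk (5 here, purely as an example) are placeholders for your setup; /dev/sdb is your new disk:

    # check which devid belongs to the missing/failing device
    btrfs filesystem show /mnt/pool

    # replace it with the new disk and watch progress
    btrfs replace start 5 /dev/sdb /mnt/pool
    btrfs replace status /mnt/pool

    # if the new disk is larger than the old one, grow the FS on it afterwards
    # (the new disk takes over the old devid)
    btrfs filesystem resize 5:max /mnt/pool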
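
For the add-plus-scrub route, something like this (same placeholder mount point):

    btrfs device add /dev/sdb /mnt/pool

    # scrub the whole array; expect a pile of correctable errors while it rebuilds
    btrfs scrub start /mnt/pool
    btrfs scrub status /mnt/pool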
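
And for the full-balance route, again with the placeholder mount point; `btrfs device usage` at the end is just a quick way to see how evenly the data ended up spread:

    btrfs device add /dev/sdb /mnt/pool
    btrfs balance start /mnt/pool

    # from another terminal while the balance runs
    btrfs balance status /mnt/pool

    # afterwards, check how the data is spread across devices
    btrfs device usage /mnt/pool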