Re: Extremely slow device removals

On Sat, May 2, 2020 at 12:42 AM Zygo Blaxell
<ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:

> If you use btrfs replace to move data between drives then you get all
> the advantages you describe.  Don't do 'device remove' if you can possibly
> avoid it.

But replace alone couldn't do what I originally wanted: swapping four
6TB drives for two 16TB drives. I could replace two of them, but I'd
still have to remove the other two. I may give up on that latter part for
now, but my original hope was to move everything to a smaller and
especially quieter box than the 10-year-old 4U server I have now
that's banished to the garage because of the noise. (Working on its
console in single-user is much less pleasant than retiring to the
house and using my laptop.) I also wanted to retire all four 6TB
drives because they have over 35K hours (four years) of continuous run
time. They keep passing their SMART checks but I didn't want to keep
pushing my luck.

> If there's data corruption on one disk, btrfs can detect it and replace
> the lost data from the good copy.

That's a very good point I should have remembered. FS-agnostic RAID
depends on drive-level error detection, and being an early TCP/IP guy
I have always been a fan of end-to-end checks. That said, I can't
remember EVER having one of my drives silently corrupt data. When one
failed, I knew it. (Boy, did I know it.)  I can detect silent
corruption even in my ext4 or xfs file systems because I've been
experimenting for years with stashing SHA file hashes in an extended
attribute and periodically verifying them. This originated as a simple
deduplication tool with the attributes used only as a cache. But I
became intrigued by other uses for file-level hashes, like looking for
a file on a heterogeneous collection of machines by multicasting its
hash, and the aforementioned check for silent corruption. (Yes, I know
btrfs checks automatically, but I won't represent what I'm doing as
anything but purely experimental.)
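The scheme is simple enough to sketch in a few lines of Python. To be
clear, this is just an illustration of the idea, not my actual tool: it
assumes Linux and a filesystem that supports user.* extended
attributes, and the attribute name "user.sha256" is made up for the
example.

```python
import hashlib
import os

# Illustrative attribute name; any user.* name would do.
ATTR = "user.sha256"

def file_sha256(path):
    """Hash the file contents in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def stash(path):
    """Compute the hash and cache it in an extended attribute."""
    os.setxattr(path, ATTR, file_sha256(path).encode())

def verify(path):
    """Recompute and compare against the cached hash.

    Returns True on a match, False on a mismatch, and None when
    there is no cached hash to compare against.
    """
    try:
        cached = os.getxattr(path, ATTR).decode()
    except OSError:
        return None  # no stashed hash (or xattrs unsupported here)
    return cached == file_sha256(path)
```

A real tool also has to record something like the file's mtime
alongside the hash, so that a later mismatch can be attributed to
silent corruption rather than a legitimate rewrite of the file.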

I've never seen a btrfs scrub produce errors either, except very
quickly on one system with faulty RAM, so I was never going to trust
that machine with real data anyway. (BTW, I believe strongly in ECC
RAM. I can't understand why it isn't universal given that it costs
little more.)

I'm beginning to think I should look at some of the less tightly
coupled ways to provide redundant storage, such as gluster.


