On Sat, May 2, 2020 at 12:42 AM Zygo Blaxell <ce3g8jdj@xxxxxxxxxxxxxxxxxxxxx> wrote:

> If you use btrfs replace to move data between drives then you get all
> the advantages you describe. Don't do 'device remove' if you can possibly
> avoid it.

But I had to use remove to do what I originally wanted to do: replace four 6TB drives with two 16TB drives. I could replace two, but I'd still have to remove two more.

I may give up on that latter part for now, but my original hope was to move everything to a smaller and especially quieter box than the 10-year-old 4U server I have now, which is banished to the garage because of the noise. (Working on its console in single-user mode is much less pleasant than retiring to the house and using my laptop.)

I also wanted to retire all four 6TB drives because they have over 35K hours (four years) of continuous run time. They keep passing their SMART checks, but I didn't want to keep pushing my luck.

> If there's data corruption on one disk, btrfs can detect it and replace
> the lost data from the good copy.

That's a very good point I should have remembered. FS-agnostic RAID depends on drive-level error detection, and being an early TCP/IP guy I have always been a fan of end-to-end checks. That said, I can't remember EVER having one of my drives silently corrupt data. When one failed, I knew it. (Boy, did I know it.)

I can detect silent corruption even in my ext4 or xfs file systems because I've been experimenting for years with stashing SHA file hashes in an extended attribute and periodically verifying them. This originated as a simple deduplication tool, with the attributes used only as a cache. But I became intrigued by other uses for file-level hashes, like looking for a file on a heterogeneous collection of machines by multicasting its hash, and the aforementioned check for silent corruption. (Yes, I know btrfs checks automatically, but I won't represent what I'm doing as anything but purely experimental.)

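For anyone curious, the hash-in-an-xattr idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual tool: the attribute name `user.sha256`, the choice of SHA-256, and the function names are all made up for the example, and it relies on Python's Linux-only `os.setxattr`/`os.getxattr`. It also ignores legitimate modifications, so a real checker would need to compare something like the mtime before crying corruption.

```python
import hashlib
import os

# Hypothetical attribute name; the "user." namespace is writable by the file owner.
XATTR_NAME = "user.sha256"


def file_sha256(path, chunk_size=1 << 20):
    """Hash the file contents in chunks so large files need little memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def stash_hash(path):
    """Compute the hash and store it in an extended attribute on the file."""
    digest = file_sha256(path)
    os.setxattr(path, XATTR_NAME, digest.encode("ascii"))
    return digest


def verify_hash(path):
    """Return True/False if a stored hash exists, or None if none can be read."""
    try:
        stored = os.getxattr(path, XATTR_NAME).decode("ascii")
    except OSError:
        # No attribute yet, or the file system does not support user xattrs.
        return None
    return stored == file_sha256(path)
```

Running `stash_hash()` once over a tree and `verify_hash()` from a periodic job would flag files whose contents no longer match the stored hash.
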
I've never seen a btrfs scrub produce errors either, except very quickly on one system with faulty RAM, and I was never going to trust that system with real data anyway. (BTW, I believe strongly in ECC RAM. I can't understand why it isn't universal, given that it costs little more.)

I'm beginning to think I should look at some of the less tightly coupled ways to provide redundant storage, such as gluster.
