I've been running btrfs in RAID1 mode on four 6TB drives for years. They have 35K+ hours (about 4 years) of running time, and while they're still passing SMART scans, I wanted to stop tempting fate. They were also starting to get full (about 92%) and performance was beginning to suffer. My plan: replace them with two new 16TB EXOS (Enterprise) drives from Seagate.

My first false start was a "device add" of one of the new drives followed by a "device remove" on an old one. (It had been a while, and I'd forgotten about "device replace".) This went extremely slowly, and by morning it had bombed with a message in the kernel log about running out of space on (I think) the *old* drive. This seemed odd since the new drive was still mostly empty. The filesystem also refused to remount right away, but given the furious drive activity I decided to be patient. It mounted by itself an hour or so later. There were plenty of "task hung" messages in the kernel log, but they all seemed to be warnings. No lost data. Whew.

By now I remembered "device replace". But I'd already done "device add" on the first new 16TB drive. That gave me 5 drives online and no spare slot for the second new drive, and I didn't want to repeat the "device remove" for fear of another out-of-space failure. So I took a gamble: I pulled one of the old 6TB drives to make room for the second new 16TB drive, brought the array up in degraded mode, and started a "device replace missing" operation onto the second new drive. 'iostat' showed just what I expected: bursts of reads from one or more of the three remaining old drives alternating with big writes to the new drive, at data rates reasonably consistent with the I/O bandwidth limitations of my 10-year-old server. When it finished the next day I pulled the old 6TB drive and replaced it with the second new 16TB drive. So far so good.

I then began another "device replace". Since I wasn't forced to degrade the array this time, I didn't. It's been several days, and it's nowhere near half done. As far as I can tell it's only making headway of maybe 100-200 GB/day, so at this rate it might take several weeks to finish! Moreover, when I run 'iostat' I see lots of writes *to* the drive being replaced, usually in parallel with the same amount of data going to one of the other drives. I'd expect lots of reads *from* the drive being replaced, but why are there any writes to it at all? Is this just to keep the filesystem consistent in case of a crash?

I'd already run data and metadata balance operations up to about 95%, and I hesitate to tempt fate by forcing the system down to do another "device replace missing" operation. Can anyone explain why replacing a missing device is so much faster than replacing an existing device? Is it simply because, with no redundancy left against a drive loss, less work needs to (or can) be done to protect against a crash?

Thanks,

Phil Karn

Here's some current system information, followed by a rough sketch of the commands I used:
Linux homer.ka9q.net 4.19.0-8-rt-amd64 #1 SMP PREEMPT RT Debian 4.19.98-1 (2020-01-26) x86_64 GNU/Linux

btrfs-progs v4.20.1

Label: 'homer-btrfs'  uuid: 0d090428-8af8-4d23-99da-92f7176f82a7
        Total devices 5 FS bytes used 9.89TiB
        devid    1 size 5.46TiB used 3.81TiB path /dev/sdd3
        devid    2 size 0.00B used 2.72TiB path /dev/sde3   [device currently being replaced]
        devid    4 size 5.46TiB used 5.10TiB path /dev/sdc3
        devid    5 size 14.32TiB used 6.08TiB path /dev/sdb4
        devid    6 size 14.32TiB used 2.08TiB path /dev/sda4

Data, RAID1: total=9.84TiB, used=9.84TiB
System, RAID1: total=32.00MiB, used=1.73MiB
Metadata, RAID1: total=52.00GiB, used=48.32GiB
GlobalReserve, single: total=512.00MiB, used=0.00B
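
And the commands, roughly, reconstructed from memory. The device names, devid, and mount point below are placeholders rather than the real ones, and the exact invocations may not be verbatim:

  # 1) False start: grow the array with one new drive, then try to shrink out an old one
  btrfs device add /dev/new1 /pool
  btrfs device remove /dev/old1 /pool            # this is the step that ran out of space overnight

  # 2) The fast path: pull an old drive, remount degraded, rebuild onto the second new drive
  #    (a missing source device is named by its devid rather than by a path)
  mount -o degraded /dev/old2 /pool
  btrfs replace start <missing-devid> /dev/new2 /pool   # finished by the next day
  btrfs replace status /pool

  # 3) The current, very slow step: replace a drive that is still online, array not degraded
  btrfs replace start /dev/old3 /dev/newX /pool
  btrfs replace status /pool
  iostat -x 10                                   # this is where I see writes *to* the drive being replaced

  # The earlier data/metadata balance was along these lines (usage filters at ~95%):
  btrfs balance start -dusage=95 -musage=95 /pool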
