Re: Scrubbing with BTRFS Raid 5

On Wed, 2014-01-22 at 13:06 -0800, ronnie sahlberg wrote:
> On Wed, Jan 22, 2014 at 12:45 PM, Chris Mason <clm@fb.com> wrote:
> > On Tue, 2014-01-21 at 17:08 +0000, Duncan wrote:
> >> Graham Fleming posted on Tue, 21 Jan 2014 01:06:37 -0800 as excerpted:
> >>
> >> > Thanks for all the info guys.
> >> >
> >> > I ran some tests on the latest 3.12.8 kernel. I set up 3 1GB files and
> >> > attached them to /dev/loop{1..3} and created a BTRFS RAID 5 volume with
> >> > them.
> >> >
> >> > I copied some data (from /dev/urandom) into two test files and got their
> >> > MD5 sums and saved them to a text file.
> >> >
> >> > I then unmounted the volume, trashed Disk3 and created a new Disk4 file,
> >> > attached to /dev/loop4.
> >> >
> >> > I mounted the BTRFS RAID 5 volume degraded and the md5 sums were fine. I
> >> > added /dev/loop4 to the volume and then deleted the missing device and
> >> > it rebalanced. I had data spread out on all three devices now. MD5 sums
> >> > unchanged on test files.
> >> >
> >> > This, to me, implies BTRFS RAID 5 is working quite well and I can, in
> >> > fact, replace a dead drive.
> >> >
> >> > Am I missing something?
> >>
> >> What you're missing is that device death and replacement rarely happens
> >> as neatly as your test (clean unmounts and all, no middle-of-process
> >> power-loss, etc).  You tested best-case, not real-life or worst-case.
> >>
> >> Try that again, setting up the raid5, setting up a big write to it,
> >> disconnect one device in the middle of that write (I'm not sure if just
> >> dropping the loop works or if the kernel gracefully shuts down the loop
> >> device), then unplugging the system without unmounting... and /then/ see
> >> what sense btrfs can make of the resulting mess.  In theory, with an
> >> atomic write btree filesystem such as btrfs, even that should work fine,
> >> minus perhaps the last few seconds of file-write activity, but the
> >> filesystem should remain consistent on degraded remount and device add,
> >> device remove, and rebalance, even if another power-pull happens in the
> >> middle of /that/.
> >>
> >> But given btrfs' raid5 incompleteness, I don't expect that will work.
> >>
> >
> > raid5/6 deals with IO errors from one or two drives, and it is able to
> > reconstruct the parity from the remaining drives and give you good data.
> >
> > If we hit a crc error, the raid5/6 code will try a parity reconstruction
> > to make good data, and if we find good data from the other copy, it'll
> > return that up to userland.
> >
> > In other words, for those cases it works just like raid1/10.  What it
> > won't do (yet) is write that good data back to the storage.  It'll stay
> > bad until you remove the device or run balance to rewrite everything.
> >
> > Balance will reconstruct parity to get good data as it balances.  This
> > isn't as useful as scrub, but that work is coming.
> >
>
> That is awesome!
>
> What about online conversion from not-raid5/6 to raid5/6? What is the
> status of that code? For example, what happens if there is a failure
> during the conversion, or a reboot?

The conversion code uses balance, so that works normally.  If there is a
failure during the conversion you'll end up with some things raid5/6 and
some things at whatever other level you used.

The data will still be there, but you are more prone to ENOSPC
problems ;)
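
[Editorial sketch, not part of the original message.] Since the conversion
is a balance, an interrupted one can presumably be finished by running the
balance again. The Python below only builds and prints the command lines,
it does not run them. The mount point /mnt/test and the helper name
convert_commands are hypothetical; the "soft" convert filter, which skips
chunks already at the target profile, is assumed to be available in the
installed btrfs-progs and kernel.

```python
import shlex


def convert_commands(mount_point, profile="raid5", resume=False):
    """Build (but do not run) the btrfs commands for a profile conversion."""
    # "soft" asks balance to leave chunks that already have the target
    # profile untouched, so a re-run only converts what is left over.
    suffix = ",soft" if resume else ""
    return [
        ["btrfs", "balance", "start",
         f"-dconvert={profile}{suffix}",
         f"-mconvert={profile}{suffix}",
         mount_point],
        # Lists the profiles currently in use for data and metadata.
        ["btrfs", "filesystem", "df", mount_point],
    ]


# /mnt/test is a placeholder mount point for illustration only.
for argv in convert_commands("/mnt/test", resume=True):
    print(shlex.join(argv))
```

If "btrfs filesystem df" still lists two profiles for data or metadata
after a failed run, the conversion was only partly done.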

-chris
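
[Editorial sketch, not part of the original message.] The parity
reconstruction described in the quoted text can be illustrated with plain
XOR, which is how single-parity raid5 works in general. This is a toy model
under that assumption, not the btrfs implementation; the block contents and
the helper name xor_blocks are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# One stripe on a three-device raid5: two data blocks plus one parity block.
data_a = b"btrfs raid5 test"
data_b = b"0123456789abcdef"
parity = xor_blocks([data_a, data_b])

# Pretend the device holding data_b is gone (or failed its crc check):
# rebuild its block from the surviving data block and the parity block.
rebuilt_b = xor_blocks([data_a, parity])
print(rebuilt_b == data_b)
```

The same XOR works whichever single block is lost, which is why one missing
or corrupt device per stripe is survivable with raid5.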

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



