Re: Very long raid5 init/rebuild times

On Wed, Jan 22, 2014 at 08:37:49PM -0600, Stan Hoeppner wrote:
> On 1/22/2014 11:48 AM, Marc MERLIN wrote:
> ...
> > If crypt is on top of raid5, it seems (and that makes sense) that no
> > encryption is needed for the rebuild. However in my test I can confirm that
> > the rebuild time is exactly the same. I only get 19MB/s of rebuild bandwidth
> > and I think that's because of the port multiplier.
>
> I didn't address this earlier as I assumed you, and anyone else reading
> this thread, would do a little background reading and realize no SATA
> PMP would behave in this manner.  No SATA PMP, not Silicon Image, not
> Marvell, none of them, will limit host port throughput to 20MB/s.  All
> of them achieve pretty close to wire speed throughput.

I haven't answered your other message, as I'm getting more data to do
so, but I can assure you that this is incorrect :)

I've worked with three different PMP boards and three different SATA cards
over the last six years (SiI3124, SiI3132, and Marvell), and got similarly
slow results with all of them.
The Marvell was faster than the SiI3124, but it stopped being stable in
kernels over the last year and is effectively unmaintained (no one to fix
the bugs), so I went back to the SiI3124.

I'm not saying that they can't go faster somehow, but in my experience
that has not been the case.

In case you don't believe me: I just moved my drives off the PMP and connected
them directly to the motherboard and a Marvell card, and my rebuild speed
jumped from 19MB/s to 99MB/s.
(I made no other configuration changes, though I did try your suggestions,
without keeping them, both before and after dropping the PMP; results below.)

You also said:
> Ok, now I think we're finally getting to the heart of this.  Given the
> fact that you're doing full array encryption, and after reading your bio
> on your website the other day, I think I've been giving you too much
> credit.  So let's get back to md basics.  Have you performed any md
> optimizations?  The default value of

Can't hurt to ask; you never know, I may have forgotten one or not know about it.

> /sys/block/mdX/md/stripe_cache_size
> is 256.  This default is woefully inadequate for modern systems, and
> will yield dreadfully low throughput.  To fix this execute
> ~$ echo 2048 > /sys/block/mdX/md/stripe_cache_size

Thanks for that one.
It made no speed difference either with or without the PMP, but it can't hurt to set anyway.
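For completeness, this is what I ran. The setting is per-array and does not
survive a reboot, so it would need to go in rc.local or a udev rule to persist;
note the cache is counted in 4KiB pages per member device, so memory use is
roughly value * 4KiB * number of drives:

~$ cat /sys/block/md5/md/stripe_cache_size         # default is 256
~$ echo 2048 > /sys/block/md5/md/stripe_cache_size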

> To specifically address slow resync speed try
> ~$ echo 50000 > /proc/sys/dev/raid/speed_limit_min

I had this, but good reminder.
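For anyone finding this in the archives, these knobs are system-wide rather
than per-array, and there is a matching max worth checking too:

~$ echo 50000 > /proc/sys/dev/raid/speed_limit_min
~$ cat /proc/sys/dev/raid/speed_limit_max          # 200000 by default; raise it if resync tops out there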

> And you also likely need to increase readahead from the default 128KB to
> something like 1MB (--setra counts 512-byte units)
>
> ~$ blockdev --setra 2048 /dev/mdX

I already had this set to 8192, but again, thanks for checking.
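For the archives, --setra/--getra count 512-byte sectors, so my 8192 works out
to 8192 * 512 B = 4 MB of readahead, and the suggested 2048 is 1 MB:

~$ blockdev --getra /dev/md5          # current readahead in 512-byte sectors
~$ blockdev --setra 8192 /dev/md5     # 8192 * 512 B = 4 MB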

> Since kernel 2.6.23 Linux does on demand readahead, so small random IO
> won't trigger it.  Thus a large value here will not negatively impact
> random IO.  See:  http://lwn.net/Articles/235181/
>
> Please test and post your results.  I don't think your problems have
> anything to do with crypto.  However, after you get md running at peak
> performance you then may start to see limitations in your crypto setup,
> if you have chosen to switch to dmcrypt above md.

Looks like so far my only problem was the PMP.

Thank you for your suggestions though.

Back to my original questions:
> Question #1:
> Is it better to dmcrypt the 5 drives and then make a raid5 on top, or the opposite
> (raid5 first, and then dmcrypt)
> I used:
> cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64 /dev/sd[mnopq]1
 
As you did point out, with dmcrypt under md the array will be faster in normal
use because the encryption is spread across my CPU cores, but a rebuild then
drives 5 encryption threads; whereas if the raid5 comes first and the encryption
sits on top, rebuilds involve no encryption work on the CPU at all.

So it depends what's more important.
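For anyone else weighing the two layouts, here is a rough sketch of both stack
orders; the device names are just the ones from my setup, adjust as needed.

dmcrypt under md (what I did: crypto spread over the cores in normal use, but 5
crypto threads during a rebuild):

~$ cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64 /dev/sdm1   # repeat for sdn1..sdq1
~$ cryptsetup luksOpen /dev/sdm1 cryptm                                             # repeat for the others
~$ mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/mapper/crypt[m-q]

md under dmcrypt (one crypto device on top; rebuilds touch no crypto at all):

~$ mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd[mnopq]1
~$ cryptsetup luksFormat --align-payload=8192 -s 256 -c aes-xts-plain64 /dev/md5
~$ cryptsetup luksOpen /dev/md5 md5crypt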
 
> Question #2:
> In order to copy data from a working system, I connected the drives via an external
> enclosure which uses a SATA PMP. As a result, things are slow:
> 
> md5 : active raid5 dm-7[5] dm-6[3] dm-5[2] dm-4[1] dm-2[0]
>       15627526144 blocks super 1.2 level 5, 512k chunk, algorithm 2 [5/4] [UUUU_]
>       [>....................]  recovery =  0.9% (35709052/3906881536) finish=3406.6min speed=18939K/sec
>       bitmap: 0/30 pages [0KB], 65536KB chunk
> 
> 2.5 days for an init or rebuild is going to be painful.
> I already checked that I'm not CPU/dmcrypt pegged.
> 
> I read Neil's message why init is still required:
> http://marc.info/?l=linux-raid&m=112044009718483&w=2
> even though on brand new blank drives full of 0s I'm thinking this could be faster
> by just assuming the array is clean (all 0s give a parity of 0).
> Is it really unsafe to do so? (Admittedly, if you build the raid on top of dmcrypt
> like I did here, the members won't read back as 0s, so that way around the init
> is unfortunately necessary.)

Still curious on this one: if the drives are brand new, is it safe to assume
they're full of 0s and tell mdadm to skip the re-init?
(the XOR of all-zero data blocks is 0, so the parity is already correct)
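(For the record, the knob that would skip the initial resync is mdadm's
--assume-clean; whether that's actually safe on drives that aren't guaranteed
to read back as zeros is exactly what I'm asking.)

~$ mdadm --create /dev/md5 --level=5 --raid-devices=5 --assume-clean /dev/sd[mnopq]1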

> Question #3:
> Since I'm going to put btrfs on top, I'm almost tempted to skip the md raid5
> layer and just use the native support, but the raid code in btrfs still
> seems a bit younger than I'm comfortable with.
> Is anyone using it and has done disk failures, replaces, and all?

OK, this is not a btrfs list, so I'll assume no one here has tried that; no biggie.
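In case someone does want to try it and report back, the native route would
look something like the following; I haven't run this myself, so treat it as
an untested sketch (/dev/sdr1 is just a stand-in for the new disk):

~$ mkfs.btrfs -d raid5 -m raid1 /dev/sd[mnopq]1
~$ mount /dev/sdm1 /mnt/array
~$ btrfs replace start /dev/sdq1 /dev/sdr1 /mnt/array    # swapping out a drive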

Cheers,
Marc
-- 
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems ....
                                      .... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/                         | PGP 1024R/763BE901



