[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Configuration Recommendations

On 04/23/2012 09:56 PM, Jan Nielsen wrote:

The new hardware for the 50GB PG 9.0 machine is:
* 24 cores across 2 sockets
* 64 GB RAM
* 10 x 15k SAS drives on SAN
* 1 x 15k SAS drive local
* CentOS 6.2 (2.6.32 kernel)

This is a pretty good build. Nice and middle-of-the-road for current hardware. I think it's probably relevant what your "24 cores across 2 sockets" are, though. Then again, based on the 24-cores, I have to assume you've got hex-core Xeons of some sort, with hyperthreading. That suggests a higher end Sandy Bridge Xeon, like the X5645 or higher. If that's the case, you're in good hands.

As a note, though... make sure you enable Turbo and other performance settings (disable power-down of unused CPUs, etc) in the BIOS when setting this up. We found that the defaults for the CPUs did not allow processor scaling, and it was far too aggressive in cycling down cores, such that cycling them back up had a non-zero cost. We saw roughly a 20% improvement by forcing the CPUs into full online performance mode.

We are considering the following drive allocations:

* 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG data
* 4 x 15k SAS drives, XFS, RAID 10 on SAN for PG indexes
* 2 x 15k SAS drives, XFS, RAID 1 on SAN for PG xlog
* 1 x 15k SAS drive, XFS, on local storage for OS

Please don't do this. If you have the system you just described, give yourself an 8x RAID10, and the 2x RAID1. I've found that your indexes will generally be about 1/3 to 1/2 the total sixe of your database. So, not only does your data partition lose read spindles, but you've wasted 1/2 to 2/3s of your active drive space. This may not be a concern based on your data growth curves, but it could be.

In addition, add another OS drive and put it into a RAID-1. If you have server-class hardware, you'll want that extra drive. I'm frankly surprised you were even able to acquire a dual Xeon class server without a RAID-1 for OS data by default.

I'm not sure if you've done metrics or not, but XFS performance is highly dependent on your init and mount options. I can give you some guidelines there, but one of the major changes is that the Linux 3.X kernels have some impressive performance improvements you won't see using CentOS 6.2. Metadata in particular has undergone a massive upgrade that drastically enhances its parallel scalability on metadata modifications.

If possible, you might consider the new Ubuntu 12.04 LTS that's coming out soon. It should have the newer XFS performance. If not, consider injecting a newer kernel to the CentOS 6.2 install. And again, testing is the only way to know for sure.

And test with pgbench, if possible. I used this to get our XFS init and mount options, along with other OS/kernel settings. You can have very different performance metrics from dd/bonnie than an actual use pattern from real DB usage. As a hint, before you run any of these tests, both write a '3' to /proc/sys/vm/drop_caches, and restart your PG instance. You want to test your drives, not your memory. :)

kernel.shmall = 4,294,967,296 (commas added for clarity)
kernel.shmax = 68,719,476,736 (commas added for clarity)
kernel.sem = 250 32000 32 128
vm.swappiness = 0
dirty_ratio = 10
dirty_background_ratio = 5

Good. Though you might consider lowering dirty_background_ratio. At that setting, it won't even try to write out data until you have about 3GB of dirty pages. Even high-end disk controllers only have 1GB of local capacitor-backed cache. If you really do have a good SAN, it probably has more than that, but try to induce a high-turnover database test to see what happens during heavy IO. Like, a heavy long-running PG-bench should invoke several checkpoints and also flood the local write cache. When that happens, monitor /proc/meminfo. Like this:

grep -A1 Dirty /proc/meminfo

That will tell you how much of your memory is dirty, but the 'Writeback' entry is what you care about. If you see that as a non-zero value for more than one consecutive check, you've saturated your write bandwidth to the point performance will suffer. But the only way you can really know any of this is with testing. Some SANs scale incredibly well to large pool flushes, and others don't.

Also, make iostat your friend. Particularly with the -x option. During your testing, keep one of these running in the background for the devices on your SAN. Watch your %util column in particular. Graph it, if you can. You can almost build a complete performance profile for different workloads before you put a single byte of real data on this hardware.

If there are "obviously correct" choices in PG configuration, this would
be tremendously helpful information to me. I'm planning on using pgbench
to test the configuration options.

You sound like you've read up on this quite a bit. Greg's book is a very good thing to have and learn from. It'll cover all the basics about the postgresql.conf file. I don't see how I could add much to that, so just pay attention to what he says. :)

Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604


See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email

Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx)
To make changes to your subscription:

[Postgresql General]     [Postgresql PHP]     [PHP Users]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Books]     [PHP Databases]     [Home]     [Yosemite]

Powered by Linux