Friday, April 28, 2006

 

T2000 (Niagara) Evaluation

We received a Sun T2000 for evaluation in early January. As I covered in an earlier post, we'd developed a model for determining whether we'd see any cost savings from using this new hardware. We now needed to determine how this server would perform with respect to our current hardware choices.

The first application we tested on the T2000 was our in-house SMTP daemon, which runs on our MX servers. (We're a large ISP, so the fact that we receive a lot of mail every day should surprise no one.) It's a threaded application, so it was an excellent candidate for the Niagara CPU (well, okay, the UltraSPARC T1 CPU). The robust nature of the SMTP transaction was also a plus: we wouldn't lose anything if we happened to push the application beyond its breaking point. (I should note that we were testing with live traffic. Some might consider this foolish, but, as I said, this particular application is robust enough that doing so was safe. We certainly wouldn't have done it with other applications.)

We set up the server and started playing with the weights on the load balancer it sits behind. We safely pushed it up to taking twice the traffic of our current hardware choice (Sun Fire V210s), but the server fell over just below 3x: the thread count exploded, response time quickly became unacceptable, and so on. I was very unhappy with this, because we'd expected to do much better; I'd been hoping for about 6x. But we weren't maxing out the hardware or the network, so the bottleneck had to be in software. I'll save the details for a later post, but DTrace pointed us to libc's malloc() and free(). After some quick QA running the application with libumem, we tried again.
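For the curious, none of this took anything fancy. A DTrace one-liner along these lines was enough to show how hard the threads were hammering libc's allocator, and trying libumem was just a matter of preloading it (the daemon name and path here are made up, not our actual binary):

    # count calls into libc's allocator from the running daemon
    dtrace -n 'pid$target:libc:malloc:entry,
               pid$target:libc:free:entry
               { @[probefunc] = count(); }' -p `pgrep -x smtpd`

    # restart the daemon with libumem preloaded in place of libc's malloc
    LD_PRELOAD=libumem.so.1 /opt/ours/bin/smtpd

The default libc malloc serializes every thread on a single lock, which is just about the worst thing you can do on a 32-thread CPU. libumem allocates from per-CPU magazines, so the threads stop fighting over that one lock.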

This time, we did see the performance we were expecting, and then some. We eventually pushed the T2000 up to taking 8x the traffic of one of our current servers. Of course, this was with the daemon running in four zones, each bound to one of the physical interfaces, so that we wouldn't max out the network. (We only had FastE available at the time; we could have gotten GigE, or aggregated the four interfaces, but using zones was the quickest approach.)
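In case it's useful to anyone, each zone was configured along these lines (zone names, interface names, and addresses here are illustrative, not what we actually used):

    # zonecfg -z mx1
    zonecfg:mx1> create
    zonecfg:mx1> set zonepath=/zones/mx1
    zonecfg:mx1> add net
    zonecfg:mx1:net> set physical=e1000g1
    zonecfg:mx1:net> set address=192.0.2.11/24
    zonecfg:mx1:net> end
    zonecfg:mx1> commit
    zonecfg:mx1> exit
    # zoneadm -z mx1 install
    # zoneadm -z mx1 boot

With each zone's address plumbed on its own FastE port, each copy of the daemon effectively had a dedicated 100Mb pipe, and the load balancer simply saw four more targets.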

Of course, that 8x figure is an apples-to-oranges comparison, since it pits the application running with libumem against the application running without it. On the V210, libumem by itself gave us an 80% performance improvement, so that 8x really works out to about 4.5x (roughly 8 divided by 1.8) in a like-for-like comparison. Again, not what I'd been hoping for, but above the threshold where using the T2000 becomes the cost-effective choice.

