Friday, April 28, 2006

T2000 (Niagara) Evaluation

We received a Sun T2000 for evaluation in early January. As I covered in an earlier post, we'd developed a model for determining whether this new hardware would actually save us money. The remaining question was how the server would perform relative to our current hardware.

The first application we tested on the T2000 was our in-house SMTP daemon, which runs on our MX servers. (We're a large ISP, so it should surprise no one that we receive a lot of mail every day.) It's a threaded application, which made it an excellent candidate for the Niagara CPU (well, okay, the UltraSPARC T1 CPU). The robust nature of the SMTP transaction was a plus: we wouldn't lose anything if we happened to push the application beyond its breaking point. (I should note that we were testing with live traffic. Some might consider this foolish, but, as I said, this particular application is robust enough that doing so was safe. We certainly wouldn't have done it with other applications.)

We set up the server and started playing with the weights on the load balancer it sits behind. We safely pushed it up to taking twice the traffic of our current hardware choice (Sun v210s), but the server fell over just below 3x: the thread count exploded, response time quickly became unacceptable, and so on. I was very unhappy with this, because we'd expected to do much better; I'd been hoping for about 6x. But since we weren't maxing out the hardware or the network, it had to be a software problem. I'll save the details for a later post, but DTrace pointed us to libc's malloc() and free(). After some quick QA running the application with libumem, we tried again.
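
(Since I just name-dropped DTrace: below is a minimal sketch, in D, of the kind of probe that fingered the allocator. It's illustrative only, not the script we actually ran; the real details will go in that later post. It simply measures how much time the daemon's threads spend in libc's malloc() and free().)

    /* Time and call counts for malloc()/free() in one process.
     * Run as: dtrace -s mallocs.d -p <pid of the daemon>
     * Illustrative sketch only.
     */
    pid$target:libc.so.1:malloc:entry,
    pid$target:libc.so.1:free:entry
    {
        self->ts = timestamp;
    }

    pid$target:libc.so.1:malloc:return,
    pid$target:libc.so.1:free:return
    /self->ts/
    {
        @ns[probefunc] = sum(timestamp - self->ts);
        @calls[probefunc] = count();
        self->ts = 0;
    }

And trying libumem was cheap: on Solaris you can interpose it on an existing binary by starting the process with LD_PRELOAD=libumem.so.1, no recompile required.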

This time, we did see the performance we were expecting, and then some. We eventually pushed the T2000 up to taking 8x the traffic of one of our current servers. Of course, this was running in four zones, each bound to one of the physical interfaces, so that we wouldn't max out the network. (We only had FastE available at the time. We could have gotten GigE, or we could have aggregated the four interfaces, but using zones was the quickest approach.)
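
(For anyone who hasn't played with zones: the setup really is quick. Something like the following, repeated for e1000g1 through e1000g3, pins one zone to each physical port. Zone names, interface names, and addresses here are invented for illustration, not our real config.)

    zonecfg -z mx1 'create; set zonepath=/zones/mx1; \
        add net; set physical=e1000g0; set address=192.0.2.11/24; end'
    zoneadm -z mx1 install
    zoneadm -z mx1 boot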

Of course, that 8x figure is an apples-to-oranges comparison, since it pits the app running with libumem against the app running without it. On the v210, libumem gave us an 80% performance improvement, so the like-for-like ratio is really 8 ÷ 1.8, or about 4.5x. Again, not what I'd been hoping for, but comfortably above the threshold at which using the T2000 becomes cost-wise advantageous.


T2000 (Niagara) Evaluation (Prelude)

(I started out to write about my experience evaluating the T2000, but it turns out I had a few things to say first, so I've broken this into multiple posts.)

It appears that at least one person has read my blog, as I've received a request to post details about the T2000 evaluation I recently performed. I won't be able to say as much as I'd like to, since some of the information might be considered "material" for SEC purposes, and I'd rather err on the safe side. For example, I'd love to say how much we'd save in operating costs over three years by using the Niagara servers, but that's probably saying too much. Still, I should be able to say enough to make writing this worthwhile.

Before we received the T2000, there was some discussion of the breakeven ratio, i.e., how many of our current servers (e.g., v210s) we would need to replace with a T2000 for the switch to be worth making. The initial conversations took nothing into account beyond the price of the hardware, but after a quick whiteboard estimate of space and power savings, I worked up a spreadsheet to determine the breakeven ratio (or, depending on how you look at it, to calculate the savings based on the measured ratio). I'll admit that my first attempt at this spreadsheet was a freshman effort; a colleague with more accounting experience reworked it into what it should be.

(I'll add here that I'm a little bit embarrassed to admit that we hadn't been taking space and power costs into consideration for our earlier hardware purchases. OTOH, there still appear to be quite a few people out there who assume that the cheapest white box they can get is the way to go.)

Once we started looking at space and power costs, the breakeven ratios for the T2000 vs. our current servers dropped quite a bit. As a purely theoretical example: if we assume we'll end up paying $16,000 for a T2000, and we're comparing it to an x86 server that costs $2,000, the breakeven ratio based purely on hardware cost is 8:1. But if we factor in space and power, that ratio falls to 4.2:1. (This example uses real space and power costs, but it assumes the application currently running on those x86 servers could be moved to a SPARC server at no cost.)
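
To make the shape of the arithmetic concrete, here's the same calculation with invented space-and-power numbers chosen to land on that same 4.2:1 (the real inputs are exactly what I can't share). Suppose three years of space and power come to $2,900 for the T2000 and $2,500 for the x86 box:

    (16,000 + 2,900) / (2,000 + 2,500) = 18,900 / 4,500 = 4.2

That is, once every displaced x86 server takes its own space and power costs with it, the T2000 only has to do the work of a bit more than four of them, rather than eight, to pay for itself.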

To sum up the above into an obvious statement: it's important to look at more than just the price of the hardware when deciding what hardware to purchase.


Wednesday, April 26, 2006

DTrace

I first heard about DTrace almost two years ago now. I was at a Sun event here in New York, where they'd gotten Jarod Jenson to come in and talk about DTrace. That's the only time I've ever seen him, so I don't know if he's like this all the time, but he was unbelievably energetic; he somehow managed to fit three hours' worth of presentation into an hour and a half or less. He was obviously excited about the technology, and he did an excellent job of demonstrating just how useful DTrace could be.

This was before Solaris 10 was officially released, and even before DTrace was available in Solaris Express, as I discovered to my disappointment the next day when I installed the then-current Solaris Express. I played with DTrace some once it did become available, but I didn't do anything really useful with it until we started using it in production, which we first did while evaluating the T2000. I'll detail my successes with DTrace in later posts, though I'm somewhat hesitant to do so, as the analyses I performed were close to the simplest things one can do with the tool. In the end, however, I think that simplicity will itself speak to the power of DTrace.


Monday, April 24, 2006

First post

In the grand tradition of blogging, I'll state that this is my first blog and then ask myself the question: why have I decided to blog?

So why have I decided to blog? I've considered it before, but I've always decided not to. As I see it, nobody's interested in what I have to say. OTOH, that doesn't seem to have stopped millions of others from blogging, so I figured I'd give it a shot. After all, if Dylan could put out records with a voice like that, why couldn't Hendrix?

What am I likely to blog about, assuming I blog at all after this first entry? Mostly technical stuff. I'm not that terribly interested in talking about my personal life in a public forum, nor is the public interested in hearing about my personal life. No matter how cute the new twins are.

Who am I? Given that I'd prefer to focus on technical stuff, I'll answer the part of that question that asks, "What do I do?" I'm a Unix sysadmin, and have been for over ten years now (not counting the two years I spent at Princeton working on a Ph.D. in computer science before deciding it wasn't for me). I started in the Computer Science Department at the University of Tennessee, where I worked (nominally) part-time while getting my Master's degree, then full-time for a couple of years after that before my essay at an academic career. Since then, I've been working the same job for almost six years: first for Juno Online Services, then for United Online, the company formed by the merger of NetZero and Juno.

That's likely enough for now. If I don't follow up on this, it's likely not a big loss.

