Benchmarking—it just ain’t fair

Posted: 10.29.10

This editorial is dedicated to one of my heroes, Lewis Black. If you don’t know who he is or what he does, start here. Then go to whatever city you have to in order to see him in person, search the TV for old skits, search the web. Then read this.

You can sum up Black’s demeanor and style with three words best abbreviated as WTF?

In all our travels to clients, trade shows, conferences, and various presentations, I find myself saying WTF often. Black’s follow-up line is, “Do they just think we’re stupid?”

This week’s rant is about benchmarking. Actually, it’s more about the AIB suppliers and their whining about benchmarks. No matter which benchmark one uses, you can count on getting a whiny-to-angry email or phone call from one GPU supplier or another complaining that you didn’t test or treat his or her graphics AIB fairly.

Let’s see: we took two AIBs that are priced about the same and have similar specifications, ran exactly the same tests (and lots of them) on both, and one did better than the other because we did or didn’t do what, exactly? That’s like a car company complaining that you drove its car on a road that wasn’t favorable to it. Do they think we’re stupid?

When we run tests we try to show the results to the companies we’re testing before we publish to make sure we didn’t screw up, miss anything, and are being fair. It helps; the GPU suppliers often spot something we could do or shouldn’t do. And sometimes we get the whining.

This came from an email exchange last week: “… you are using games and benchmarks favorable to our competitor.” What the hell does that mean? The competitor’s AIB will give me a better result and so by testing it and finding that out we’re not being fair? Fair to whom? The poor end user who buys an AIB that doesn’t do well in those games?

Most of you are too young to remember the benchmark wars of the late 80s, or the benchmark wars of the mid 90s; some of you may remember the benchmark wars of the early 2000s. In the late 80s, companies were making their AIBs sensitive to certain applications and/or benchmarks. Well, benchmark sensitivity is BS and wrong. Everyone agreed, and the companies that did it got burned publicly and badly, as they deserved to.

But making a driver that makes an application run better on the supplier’s AIB is not cheating. We do this every day in workstation land and get paid extra for it—it’s called tuning and certification.

If I’ve got the choice of buying a graphics AIB that’s tuned for the games I’m interested in, why in the world wouldn’t I buy that AIB? Because it may not do well on Excel? Screw Excel. I didn’t buy the AIB for Excel, I bought it to run “Medal of Honor.”

So if an AIB’s driver is sensitive to the application I care about, and tuned to give max performance to that app, I want to know about it, and I want to reward the supplier for taking the effort to give me a superior experience.

It used to be, in the late 80s, the mid 90s, and rarely in the 2000s, that tuning a driver for an app (or two, or three) would cause the driver to be unstable and contribute to the BSOD, but Microsoft clamped down on that (because they were getting the complaints) and set up WHQL. Over time the GPU suppliers demonstrated their desire and commitment to being good citizens, and Microsoft allowed them to self-certify. And that’s where we’ve been for the past eight years or more.

Here at JPR we go one step further in our testing. We run a lot of games and get the scores for those games, at various resolutions and with AA on and off, and we show that as raw data. Then we take an average of all those scores to come to a “performance” value for a given AIB at a given resolution.

We use that average performance score in our performance-per-dollar calculation and charts, and we use it in our Pmark calculations. Generally speaking, we don’t use Pmark on the high-end Enthusiast AIBs because power savings is not a criterion one uses when buying such an AIB. But for Midrange and Performance it can be a factor.
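
For anyone who wants to see the arithmetic, here is a rough sketch of the averaging and performance-per-dollar steps described above. It is an illustration, not our actual tooling: the game names, scores, and price are made up, the function names are hypothetical, and it assumes the "average" is a plain arithmetic mean over every game and AA setting at one resolution. Pmark is left out because its formula isn't spelled out here.

```python
from statistics import mean

# Hypothetical raw scores in frames per second (illustrative, not measured data).
# Key: (game, resolution, aa_enabled) -> score
raw_scores = {
    ("Game A", "1920x1080", False): 60.0,
    ("Game A", "1920x1080", True): 40.0,
    ("Game B", "1920x1080", False): 90.0,
    ("Game B", "1920x1080", True): 70.0,
    ("Game A", "2560x1600", False): 35.0,
    ("Game A", "2560x1600", True): 25.0,
}


def performance_score(scores, resolution):
    """Average all scores (every game, AA on and off) at one resolution.

    Assumes a simple arithmetic mean; the real weighting may differ.
    """
    selected = [s for (_game, res, _aa), s in scores.items() if res == resolution]
    if not selected:
        raise ValueError(f"no scores recorded at {resolution}")
    return mean(selected)


def performance_per_dollar(score, price_usd):
    """Divide the averaged performance score by the AIB's price."""
    if price_usd <= 0:
        raise ValueError("price must be positive")
    return score / price_usd


score_1080 = performance_score(raw_scores, "1920x1080")
score_1600 = performance_score(raw_scores, "2560x1600")
ppd_1080 = performance_per_dollar(score_1080, 200.0)  # hypothetical $200 AIB

print(f"1920x1080 performance: {score_1080:.1f}")
print(f"1920x1080 performance per dollar: {ppd_1080:.3f}")
```

The point of keeping the raw data and the average separate is the same as in our reports: the average gives one number to compare AIBs at a given resolution, while the per-game scores let you check the titles you actually care about.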

So we encourage tuning and overclocking. Who does it hurt? Only the loser, and who cares about him or her?