The art of testing and other esoteric sidelines

Posted: 07.22.10

We test a lot of things here at JPR, and we did at JPA before that. We’ve been testing stuff officially and unofficially for 30-some years, so you’d think we’d know what we’re doing. Hell, we thought we knew what we were doing.

This last batch of graphics AIBs really threw us. We couldn’t get things to work. Benchmarks wouldn’t bench. Command lines needed to set parameters wouldn’t work. Resolutions would stay stuck, and overclocking tools defied us to make them behave. And all the while the clock was ticking.

No one wants to read about a test done on a product that was released weeks ago, no matter how good your insight or analysis is. At the same time, we won’t just parrot the reviewer’s guide given out with most new products. No one wants to read that either, although sadly there’s a lot of it on the web.

So we try to make our own test scenarios and find new and interesting stuff, and sometimes we succeed. We also want to know how these products work, and what they can and can’t do. We’re a curious lot here, and that’s probably one of the things that keeps us running.

Unigine’s Heaven benchmark. (Source: Unigine)

But this week we were defeated, almost. We couldn’t make a spreadsheet work to save our lives (we’re on rev 10 now, if you can believe that). And so we didn’t get the analysis done till Friday instead of Wednesday as we had hoped. That doesn’t sound like much, but when you’ve got three people working all day on it, the pressure builds up.

So what did we learn, aside from the fact that time pressure makes good people make bad mistakes? Well, we learned (in this particular case) that Nvidia has a pretty damn good board in the GTX 460. We almost never look at other reviews until we’ve done ours. Afterwards, when we did, we found that several web sites agreed with us, only they all said it before us. Sigh.

There’s a lot at stake here for AMD and Nvidia in the case of graphics AIBs. The web sites cater mostly to the game enthusiasts. We are read by investors, OEMs, and maybe a few gamers. It used to be thought that a gamer had an influence span of 10 to 15, maybe as high as 20 in the mid-2000s. But the market has grown and the gamers aren’t as important as they used to be in influencing opinion. So raw performance isn’t the only criterion anymore. Price, of course, power consumption, and video and photo editing capabilities are equally important in all but the enthusiast segment.

The other thing we have to deal with right now (and so does every other AIB evaluator) is the lack of an industry standard. Years ago Nvidia had a mediocre part and it didn’t do well on the industry benchmark 3DMark. So Nvidia, trying to divert attention away from that, went on a crusade to get reviewers to use games rather than synthetic tests. Nvidia was quite successful in that effort, and now if you visit any reviewer site you’ll see lots of different games used for evaluating an AIB. Real-world tests are of course a better indication if the reader happens to have or be interested in the games being used for the testing.

But OEMs, and especially those white box OEMs in the east, rely on a standard test and test number for their evaluations and pricing comparisons. Either you run Vantage well or forgetaboutit. But Vantage stops at DirectX 10, and Futuremark won’t have a DirectX 11 benchmark out till September at the earliest. In its absence we, and others, have been using the Unigine engine benchmarks (first “Sanctuary,” then “Tropics,” and now the current one, “Heaven”). These are good, attractive tests, but they are beyond the demands most games make on the graphics board. You could argue that if you do well in Heaven today you’ll be just fine in the next generation of DirectX 11 games.

So what we did was take a few DX11 games and Heaven and average their scores at various resolutions to arrive at a generalized figure of merit for the performance and capability of a graphics AIB. In effect, we made a synthesized test.
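For the curious, the averaging can be sketched in a few lines of Python. This is only an illustration, not our actual spreadsheet: the game names, resolutions, and frame rates below are made-up placeholders, and we are assuming every test reports a frames-per-second number at every resolution.

```python
from statistics import mean

# Hypothetical frame rates (fps). The test names, resolutions, and
# numbers are placeholders for illustration, not measured results.
SAMPLE_RESULTS = {
    "Heaven": {"1280x1024": 40.0, "1680x1050": 30.0, "1920x1200": 20.0},
    "Game A": {"1280x1024": 80.0, "1680x1050": 60.0, "1920x1200": 40.0},
    "Game B": {"1280x1024": 60.0, "1680x1050": 45.0, "1920x1200": 30.0},
}


def figure_of_merit(results):
    """Average each test across its resolutions, then average the tests."""
    per_test = [mean(by_res.values()) for by_res in results.values()]
    return mean(per_test)


def merit_by_resolution(results):
    """Average all tests at each resolution (assumes identical resolutions)."""
    resolutions = list(next(iter(results.values())))
    return {
        res: mean(by_res[res] for by_res in results.values())
        for res in resolutions
    }


if __name__ == "__main__":
    print(figure_of_merit(SAMPLE_RESULTS))
    print(merit_by_resolution(SAMPLE_RESULTS))
```

Averaging per test first means each test carries the same weight even if one of them was run at more resolutions than the others. A plain average also lets a high-frame-rate game dominate the result, so normalizing each test before averaging would be a reasonable variation.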

A gamer isn’t going to buy an AIB for $200 because it does really well on one game. He or she will want to feel confident that the AIB will perform well in general, not only in today’s games but in tomorrow’s as well. And that’s what a generic synthesized test will provide, or so we hope. The longevity of a graphics board design is what a good reviewer is expected to expose and highlight. It’s what we try to do. But boy, is it hard, thankless work.