I’m walking down the street in Chicago one sunny day with my kids and this guy and his girlfriend come up to me. He looks me up and down and says, “I’ll bet I know where you got those shoes.” I happened to be wearing a pair of shoes I had recently bought in Beijing and I said, “I doubt it.” He said, “Wanna bet five bucks?” I confidently said, “Sure,” thinking I was going to lower the dope kitty for this pair. He said, “You got that one on your left foot, and you got that one on your right foot,” and he held out his hand. I was five dollars poorer, and embarrassed that I had lost so much of my once finely tuned street smarts that I didn’t see that coming.
It gets worse.
About a year later, after a truly fine dinner in San Francisco right around Christmas time, we were walking from the restaurant to the hotel and looking in the festively decorated shop windows, when this skinny guy comes up to me and says, “I’m not looking for a handout, but I’ll bet you I know where you got those shoes.” I was still enjoying the effects of an excellent Cabernet and said, “OK, where?” And, once again, I was five dollars poorer.
“Fool me once, shame on—shame on you. Fool me—you can’t get fooled again.” G.W. Bush, 2002.
Nvidia recently released a new driver, #180.84 by my last count, and with it improved their benchmark scores in certain games. ATI quickly reacted to that, lowered their prices and released their own benchmark results based on their most recent driver #8.12.
We love benchmarks as much as the next dweeb. We have all the right gear to test things, and we have not only a web site to publicize the results but also Tech Watch, which you may be reading right now, and we do it for free (no ads here). We get asked by the financial community and the press for our findings, we’re frequently quoted by those folks, and we’re in there with the testing stuff; we dig it and we get it. Except that we can’t get it; consistency, that is.
We can’t get no, con-sis-ten-cy
We can do a damn fine job with a repeatable, albeit synthetic, benchmark like Vantage or 3DMark 06. And we’ve got a couple of game scenarios that we use (and are happy to disclose). And that’s the rub: GAME TESTS ARE BUNK!
Back when Nvidia introduced the FX30 and it didn’t prove to be too good a part, Nvidia’s strategy was to blame the tool, and it succeeded in convincing the industry to test with real-world applications like the actual games we want to use. Fine idea, very similar to our own early benchmark tests for Windows back in the mid-eighties (called WITS for you history buffs). So we’re totally pro-game or application testing.
The problem is, there aren’t any. There’s nothing consistent or repeatable by anyone else. If Jerry from Timothy’s Benchmarking Fool web site calls 4D Cracker’s web site and says, “Yo Xavier, how about sending me the script you used for testing Zombie Headblasters 101?” Xavier will say, “Kiss my pixels, do your own script, you little maggot.” Remember, these guys are in competition to sell banners to the AIB suppliers; they’re not into sharing.
The game developers have to provide a test scene, as Crytek has done for FarCry and Techland has done for Call of Juarez, that anyone can run with FRAPS or a built-in FPS counter. Until that happens you’ve got Balkanization of scores, with each web site and AIB supplier using their own scene in a game to get test scores, which is what we have now.
A second choice would be for an independent to develop scripts for the games, much the way certain dedicated folks produce walkthroughs, and post them (and ask for a donation).
We need something that will allow true scientific testing—i.e., repeatability—in multiple labs.
So what the hell does benchmarking have to do with where I got my shoes? It’s a con, my friends, plain and simple: make a monkey out of me, fool me twice.
Unless the exact same effects are being invoked every time, no in-game test is valid because it can’t be repeated, scrutinized, or evaluated for relevance. So the in-game scores are not scientific measurements—they are OPINIONS.
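For the dweebs who want to see what "repeatable" could mean in numbers, here is a minimal sketch in Python. It assumes you have run the same scene several times and written down the average fps of each run, and it measures the spread with the coefficient of variation (standard deviation divided by mean). The function names, the 3% tolerance, and the fps figures are all made up for illustration; they are not our test data or anyone else's.

```python
from statistics import mean, stdev


def coefficient_of_variation(fps_runs):
    """Sample standard deviation of the runs divided by their mean."""
    if len(fps_runs) < 2:
        raise ValueError("need at least two runs to measure spread")
    return stdev(fps_runs) / mean(fps_runs)


def is_repeatable(fps_runs, tolerance=0.03):
    """True if run-to-run spread is within the tolerance (3% is an arbitrary pick)."""
    return coefficient_of_variation(fps_runs) <= tolerance


# Hypothetical numbers, invented for this example only.
scripted_runs = [61.8, 62.1, 61.9, 62.0]   # same scripted scene every time
manual_runs = [55.0, 63.5, 58.2, 66.1]     # someone playing through by hand

print(is_repeatable(scripted_runs))  # spread well under 3%
print(is_repeatable(manual_runs))    # spread of roughly 8%
```

The same check would apply across labs: if two sites run the same published scene on the same hardware and their numbers land within the tolerance, you have a measurement. If they cannot, you have an opinion.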
Epilog—Why game testing is bogus
Who is going to invest $200 to $500 or more for a graphics AIB just to play one game? If the graphics AIB from Blasphomere gets 65 fps in Stalker and Dismodyne only gets 55, but Blasphomere gets 45 in FarCry while Dismodyne gets 80, and you want to play Left Behinds 92, how will you choose? Wait for the Left Behinds 92 results before you buy a new AIB? Maybe, and then we see that Blasphomere got 25, and Dismodyne got 27. But all you want to do is play Left Behinds 92 and maybe later WOW 75. What’s going to give you a better sense of overall confidence, a repeatable (by you if you so desired) synthetic benchmark, or a totally unrepeatable game test score done by the Socko500 web page with low-low prices on all AIBs and power supplies?
Yet another solution: a new rating system
How about the game developers or real gamer sites post a score on games that lists the loading on the GPU and the CPU? Then we could purchase a game on the basis of the system we have and/or make the purchase of equipment on the type of games we like to play.
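As a rough sketch of how such a rating might be used, the Python below stores a GPU and CPU load score per game and checks it against what a given system can handle. Everything here is hypothetical: the 1-to-10 scale, the `LoadRating` and `runs_well` names, and the scores attached to the (fictional) games are assumptions for illustration, not a proposal for the actual format.

```python
from dataclasses import dataclass


@dataclass
class LoadRating:
    """Hypothetical per-game rating: 1 (light) to 10 (heavy) on each axis."""
    game: str
    gpu_load: int
    cpu_load: int


def runs_well(rating, gpu_capacity, cpu_capacity):
    """True if the system's (equally hypothetical) capacity covers both loads."""
    return rating.gpu_load <= gpu_capacity and rating.cpu_load <= cpu_capacity


# Invented scores for the fictional games mentioned above.
ratings = [
    LoadRating("Left Behinds 92", gpu_load=8, cpu_load=4),
    LoadRating("WOW 75", gpu_load=5, cpu_load=6),
]

# A system rated 6 on the GPU side and 7 on the CPU side.
playable = [r.game for r in ratings if runs_well(r, gpu_capacity=6, cpu_capacity=7)]
print(playable)
```

Run it the other way around and the same table tells you which axis to spend money on: a shelf full of GPU-heavy games argues for a new AIB, not a new CPU.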