It’s often hard in this business to draw clear lines separating two vendors’ technologies or products, as often they tend to converge on common solutions, the result of tackling the same problem with the same vision and set of priorities. And while it wouldn’t be right to say the latest generations of GPU technology from Nvidia and AMD are apples and oranges — they aren’t — the two companies are both very consciously differentiating themselves, both with respect to the goals that are shaping their technology decisions and in how they’re packaging up that technology to deploy products.
GPU-compute represents different paths and priorities for the two graphics vendors
Ever since the GPU-compute movement began building grassroots momentum in university labs earlier this decade, we’ve tried to both extol its virtues and temper enthusiasm at the same time. The emergence of GPU-compute has absolutely represented an avenue for then-ATI and Nvidia to expand beyond graphics, first into niche, high-demand, stream-oriented floating-point applications and subsequently into broader-based applications with some genuine mainstream consumer appeal. And looking further beyond, the potential is now there to change the face of computing forever, with AMD and Intel both planning their own takes on the heterogeneous platforms of the future, a culmination that will have had its roots in GPU-compute.
But any new avenue for growth presents risk as well, not just in the sense that that hot new market never quite blossoms as hoped, but more so that in pursuing the path to tomorrow’s riches, one neglects the core markets responsible for paying today’s bills. Translation to the graphics industry: make sure your company’s poised to cash in on whatever new opportunities GPU-compute might uncover, but don’t ever forget that it’s 3D gaming that’s filling the coffers. Not an easy line to walk.
To date, the decisions AMD and Nvidia have made in shaping generation after generation of GPU have shown the two vendors understood the challenge. While both have been vocal advocates of GPU-compute technology, Nvidia has been particularly proactive in fostering its development, understanding what the stumbling blocks have been and working to eliminate them. The company’s hired GPU-compute pioneers (e.g. Ian Buck, Mark Harris) and tasked software engineers with developing a programming model, environment and tools to facilitate application development for GPUs in a manner consistent with CPU coding.
All those things cost money, but they don’t necessarily detract from gaming. And when it came to choosing what would be designed into GPUs like the G70 and even the G80 (the first to more formally address GPU-compute needs), Nvidia wasn’t building any big-ticket items that didn’t directly contribute to first-person shooter frame rates. Throw desperate and self-sufficient GPU-compute developers a bone, but not at the expense of gamers. That’s been the modus operandi of the past few generations, and it’s been a very sensible one.
Double-precision arithmetic’s a great example. Critical to address promising GPU-compute applications like financial modeling, 64-bit floating point offers negligible value for gaming, and to be honest, 3D graphics in general. And a high-performance implementation in silicon means substantial incremental cost over single precision. Now, Nvidia at one time hinted its G92 part would have FP64 in hardware. But then it backed off, most likely because the company figured it just wasn’t worth risking any potential delays in schedule to get it in. And when double-precision did finally show up in the GT200, it came at a significant reduction in performance, so as not to chew up too much die area.
AMD still talks and acts gaming-first, GPU-compute second. With its Evergreen generation and the RV870 GPU, the company’s taking that familiar position. Sure, it’s giving GPU-compute the necessary attention, both in its marketing and its architecture, which bends but doesn’t break for anything but 3D rendering. But when push comes to shove, it’s unequivocally shoving the gaming merits of its technology in the faces of anyone who’ll listen.
Nvidia’s walked a similar walk in the past, but this time seems to be taking a very different — consciously different — tone with Fermi. More willing to stick its neck out to aggressively pursue high-demand non-graphics applications, Nvidia has pushed GPU-compute to the forefront of its marketing campaign. And in doing so, it’s kicked gaming down a notch, if not by intent, then by default.
Compare the tone of the Fermi unveiling with that of the previous GT200 generation. When the GT200 was launched, yes, the disclosures incorporated a specific thread explaining and pitching the merits of the architecture for GPU-compute. And even the nomenclature used to describe the architecture was two-pronged; there was one set of terminology to describe how the architecture rendered 3D and another to characterize its behavior for more generalized computing applications. But when it came to demonstrating hardware or quantifying performance, cutting-edge 3D games were the obvious choice.
What a difference this time around. Far from being timed to coincide with, say, the Game Developers Conference, the Fermi unveiling came in the company’s keynote at its inaugural GPU Technology Conference, an event specifically geared to promoting GPU technology to solve problems beyond rendering. Rather than hearing about vertices, pixels or texels, we instead had our attention turned to double-precision FLOPs, ECC and C++. Heck, Fermi’s support for DirectX 11 was almost an afterthought we missed completely, which is particularly telling considering how much we’d think Nvidia would want to take the air out of AMD’s DX 11 superiority. And as a showcase, Nvidia didn’t pick the hottest 3D game du jour; it trotted out Oak Ridge National Laboratory touting a new PetaFLOP supercomputer in the offing planned around Fermi.
And it’s not just in marketing that the marked difference in Nvidia’s posture manifests itself. Where the G80 and GT200 made clear compromises in GPU-compute support, with Fermi the company appears to be much more willing to take on cost and risk to optimize for applications that aren’t simply painting 3D scenes. The laundry list of features included for GPU-compute is long, and we’re not talking just small-ticket items either. Where the GT200’s double-precision support was more of the just-make-it-work-whatever-the-speed variety, Fermi’s runs at just half the speed of single-precision. Where rival AMD chose modest EDC (error detection code) support to flag transmission errors introduced on the GDDR accesses, Nvidia went full bore with across-the-board ECC (single-bit correction) supporting all internal and external memory. Again, there’s stuff in there that’s for the most part meaningless to 3D rendering and games, but it adds cost and complexity. Then throw in a multitude of enhancements, including ISA and address space changes, to support C++, OpenCL and DirectCompute.
It’s hard to quantify, but the incremental risk and cost for GPU-compute in Fermi seemed to cross a line in the sand (silicon?); these are no longer just bones Nvidia’s tossing out to keep GPU-compute advocates just happy enough to keep them in line. There’s some real meat there. And beyond point feature support, it’s the big-picture feel of the technology that hits us; the entire architecture appears shaped with a GPU-compute mindset more than 3D. Before, it felt like we were seeing a 3D rendering architecture that could handle some non-graphics tasks well, but now we’re looking at a general-purpose architecture that still supports 3D. It’s not so subtle a difference.
There are perfectly good explanations for the shift in Nvidia’s posture, particularly in contrast to AMD. As an x86 company with a foot firmly in the CPU business (OK, “firmly” is quite arguable today, but you know what we mean), AMD doesn’t have to stick its neck out on the debate over whether or not the GPU will take big chunks of the CPU’s computing markets. But Nvidia, despite its recent foray into ARM-based Tegra SoCs, isn’t an established CPU vendor and has no x86 IP (at least none we know of and none the company will fess up to). So if it doesn’t want to risk losing significance as an industry player five or ten years from now, it knows it can’t sit on the sidelines to wait and see how this evolving architectural heterogeneity plays out. It’s got to be in the mix, doing whatever it can to ensure a future more in line with its strengths than its weaknesses. And the time to do so is now, considering both Intel and AMD begin integrating GPUs into CPUs next year (Intel first with Westmere’s MCM solution).
We’re not saying Nvidia’s turned its back on gaming; the company isn’t dumb. But it seems as if it realized it had reached the point where, in order to secure a prominent seat at the computing table of tomorrow, it had to make a clear-cut choice; no longer can it afford to sit on the fence. Play it more conservative, and stay satisfied milking the discrete GPU market for as long as it can, even if it means the company might have to live with being a second-tier supplier down the road. Or go all-in, throw down as much money, know-how and persistence as you can muster behind your vision of where computing should be headed. And then make it happen and damn the consequences. Well, it’s never been Jen-Hsun’s style to be wishy-washy, so all-in it is.
AMD may have backed off on mega-chips, but Nvidia hasn’t
With its last two generations of GPU, including the just-unwrapped RV870 “Cypress” (the first of the Evergreen family), AMD has consciously departed from both companies’ past pattern of GPU design by focusing on a more modestly sized (and thereby more cost-effective) die, then using multi-chip designs to scale up to higher performance and price tiers. It’s what AMD calls its “Sweet Spot” strategy.
But while Nvidia could have taken the opportunity to re-think its own historical strategy of building the biggest, baddest chip it could (the GT200 being a great example), it didn’t. It’s gone about designing its next-generation Fermi with one goal in mind: ultimate performance, whatever the cost. Yes, more generalized high-performance processing is certainly on its mind, but as we’ve pointed out, the company knows full well that GeForce and Quadro are paying the bills. It can’t scrimp on GeForce; specifically, it can’t scrimp on GeForce performance.
Nvidia’s GeForce remains the brand of choice for hard-core gamers and computer enthusiasts. Despite the recent inroads AMD has made in price/performance, particularly with the Radeon HD 4800 series, Nvidia has by and large held the performance crown in both single- and dual-GPU models. In its most recent results (as of this writing) from its graphics hardware survey, Steam once again reported Nvidia with a wide lead over AMD. Almost two-thirds of all those surveyed indicated they used an Nvidia card, while less than a quarter stood by AMD. Nvidia recognizes that winning benchmark titles, not price/performance rankings, is the big reason it’s achieved that status. So there was likely little internal debate as to what Fermi had to accomplish when its turn to don the GeForce brand came.
Having been beaten to launch by AMD’s Evergreen family, Nvidia has at least temporarily lost the performance crown it covets (http://www.jonpeddie.com/reviews/comments/testing-the-ati-radeon-hd-5870/). Depending on when Fermi appears on shelves and on the gaming performance it delivers when it arrives, AMD’s going to enjoy touting its performance leadership for at least a good three months or so (and throughout the important holiday 2009 buying season). Nvidia’s going to be anxious to take back the crown, and it likely will, but it’ll have had to sacrifice chip cost (and possibly schedule as well as margins) to do it.
Fermi isn’t going to be small or cheap. How big? Well, Nvidia hasn’t said, but we can take a stab a few ways. For example, where the GT200 implemented 1.4 billion transistors, Fermi pushes the number to in excess of 3 billion. Extrapolating the GT200’s size by transistor growth, and taking into account the process shrink, would bring us to about 475 mm². Or we could extrapolate instead by the number of CUDA cores and again apply the shrink, which would get us to around 500 mm². Or finally, 3 billion is about 50% more than the transistor count AMD has advertised for the RV870. Given that AMD’s RV870 sits at 334 mm² (in a 40 nm process, likely the same that Nvidia is using), we can extrapolate the RV870’s size out to again about 500 mm². By all counts then, we’re talking about 500 mm², which would be 50% larger than the RV870, and at a minimum 50% more expensive. In reality, however, and depending on defect density, the premium will likely be substantially higher than that.
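For readers who want to poke at that back-of-the-envelope math, here is a rough Python sketch of two of the extrapolations and of why defect density inflates the cost premium. Treat it as an illustration under stated assumptions, not as our actual worksheet: the transistor counts, the 334 mm² RV870 die and the 50% ratio come from the paragraph above, but the GT200 die area (576 mm²) and its 65 nm process are commonly cited figures we are assuming here, the shrink is idealized as the square of the node ratio, and the Poisson yield model and the defect densities are purely hypothetical stand-ins.

```python
import math

# Figures quoted in the article.
GT200_TRANSISTORS = 1.4e9
FERMI_TRANSISTORS = 3.0e9          # "in excess of 3 billion"; a lower bound
RV870_AREA_MM2 = 334.0
FERMI_VS_RV870_TRANSISTORS = 1.5   # the article's "about 50% more"

# ASSUMPTIONS, not from the article: GT200 die area and process nodes
# (commonly cited figures), plus ideal area scaling with the node.
GT200_AREA_MM2 = 576.0
GT200_NODE_NM = 65.0
FERMI_NODE_NM = 40.0


def area_from_gt200():
    """Scale GT200's area by transistor growth, then by an ideal shrink."""
    growth = FERMI_TRANSISTORS / GT200_TRANSISTORS
    shrink = (FERMI_NODE_NM / GT200_NODE_NM) ** 2
    return GT200_AREA_MM2 * growth * shrink


def area_from_rv870():
    """Scale RV870's area by the transistor ratio (same process assumed)."""
    return RV870_AREA_MM2 * FERMI_VS_RV870_TRANSISTORS


def relative_cost_per_good_die(area_mm2, defects_per_cm2):
    """Relative cost of one working die under a Poisson yield model.

    Cost is taken as proportional to area / yield; wafer-edge losses,
    test and packaging are ignored.
    """
    area_cm2 = area_mm2 / 100.0
    die_yield = math.exp(-area_cm2 * defects_per_cm2)
    return area_mm2 / die_yield


def cost_premium(big_mm2, small_mm2, defects_per_cm2):
    """How many times more a good big die costs than a good small die."""
    return (relative_cost_per_good_die(big_mm2, defects_per_cm2)
            / relative_cost_per_good_die(small_mm2, defects_per_cm2))


if __name__ == "__main__":
    print(f"From GT200: {area_from_gt200():.0f} mm^2")
    print(f"From RV870: {area_from_rv870():.0f} mm^2")
    for d in (0.0, 0.2, 0.5):  # hypothetical defect densities, defects/cm^2
        print(f"D={d}: premium {cost_premium(500.0, 334.0, d):.2f}x")
```

With zero defects the premium is simply the area ratio, about 1.5x. Any nonzero defect density pushes it higher, because the bigger die not only uses more wafer but is also more likely to catch a defect.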
No, we don’t have two vendors with the same strategy in mind. AMD’s aiming for aggressive price/performance, while Nvidia is making sure it won’t lose in raw performance. AMD continues to position gaming as its focus, addressing other opportunities that don’t stray too far off the path. Nvidia’s throwing more caution to the wind, betting on a future where the GPU won’t just be important, it will be indispensable, and high-performance 3D will end up being just one of its many compelling applications.
No guts, no glory. More so than any previous generation, Nvidia’s future rests with the success of Fermi. It’s the gauntlet being thrown at the feet of both AMD and Intel, preempting all the hybrid computing solutions the CPU vendors promise, including Westmere, Larrabee and Fusion. If Nvidia wants to bet the farm, Fermi’s a fine way to go about it.
It just better get here soon.